Robust improper maximum likelihood: tuning, computation, and a comparison with other methods for robust Gaussian clustering
The two main topics of this paper are the introduction of the "optimally
tuned robust improper maximum likelihood estimator" (OTRIMLE) for robust
clustering based on the multivariate Gaussian model for clusters, and a
comprehensive simulation study comparing the OTRIMLE to maximum likelihood
estimation in Gaussian
mixtures with and without noise component, mixtures of t-distributions, and the
TCLUST approach for trimmed clustering. The OTRIMLE uses an improper constant
density for modelling outliers and noise. This can be chosen optimally so that
the non-noise part of the data looks as close to a Gaussian mixture as
possible. Some deviation from Gaussianity can be traded in for lowering the
estimated noise proportion. Covariance matrix constraints and computation of
the OTRIMLE are also treated. In the simulation study, all methods are
confronted with setups in which their model assumptions are not exactly
fulfilled, and in order to evaluate the experiments in a standardized way by
misclassification rates, a new model-based definition of "true clusters" is
introduced that deviates from the usual identification of mixture components
with clusters. In the study, every method turns out to be superior for one or
more setups, but the OTRIMLE achieves the most satisfactory overall
performance. The methods are also applied to two real datasets, one without and
one with known "true" clusters.
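As a rough illustration of the improper-density idea described above, the following minimal sketch (not the authors' implementation; the names delta, pis, mus and Sigmas are assumptions) evaluates the pseudo-log-likelihood of a Gaussian mixture augmented by a component with constant improper density for noise and outliers.

```python
# Minimal sketch of an OTRIMLE/RIMLE-style pseudo-density: a Gaussian mixture
# plus a component with constant improper density `delta` catching outliers.
# All parameter names here are illustrative assumptions.
import numpy as np
from scipy.stats import multivariate_normal

def improper_pseudo_loglik(X, pis, mus, Sigmas, pi0, delta):
    """sum_i log( pi0*delta + sum_k pi_k * N(x_i; mu_k, Sigma_k) )."""
    dens = pi0 * delta * np.ones(len(X))          # improper constant component
    for pik, mu, Sig in zip(pis, mus, Sigmas):    # proper Gaussian components
        dens += pik * multivariate_normal.pdf(X, mean=mu, cov=Sig)
    return np.log(dens).sum()

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2)),
               rng.uniform(-10, 15, (10, 2))])    # two clusters plus scattered noise
print(improper_pseudo_loglik(X, [0.45, 0.45], [np.zeros(2), np.full(2, 5.0)],
                             [np.eye(2), np.eye(2)], 0.1, 1e-3))
```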
Consistency, breakdown robustness, and algorithms for robust improper maximum likelihood clustering
The robust improper maximum likelihood estimator (RIMLE) is a new method for
robust multivariate clustering that finds approximately Gaussian clusters. It
maximizes a pseudo-likelihood defined by adding a component with improper
constant density for accommodating outliers to a Gaussian mixture. A special
case of the RIMLE is MLE for multivariate finite Gaussian mixture models. In
this paper we treat existence, consistency, and breakdown theory for the RIMLE
comprehensively. RIMLE's existence is proved under non-smooth covariance matrix
constraints. It is shown that these can be implemented via a computationally
feasible Expectation-Conditional Maximization algorithm.
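One iteration of an EM-type algorithm of this kind can be sketched as below; this is a simplified stand-in, not the paper's Expectation-Conditional Maximization algorithm, and the eigenratio bound gamma is an assumed, simplified version of the covariance matrix constraints treated in the paper.

```python
# Sketch (assumed, simplified) of one EM-type iteration for a Gaussian mixture
# with an improper constant noise density; covariance eigenvalues are clipped
# to satisfy an eigenratio constraint so that no component degenerates.
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, pis, mus, Sigmas, pi0, delta, gamma=10.0):
    n, p = X.shape
    K = len(pis)
    # E-step: posterior weights for the noise component (column 0) and the
    # K Gaussian components.
    tau = np.empty((n, K + 1))
    tau[:, 0] = pi0 * delta
    for k in range(K):
        tau[:, k + 1] = pis[k] * multivariate_normal.pdf(X, mus[k], Sigmas[k])
    tau /= tau.sum(axis=1, keepdims=True)
    # CM-steps: update proportions, means and covariances block by block.
    pi0 = tau[:, 0].mean()
    for k in range(K):
        w = tau[:, k + 1]
        pis[k] = w.mean()
        mus[k] = (w[:, None] * X).sum(0) / w.sum()
        R = X - mus[k]
        S = (w[:, None, None] * np.einsum('ni,nj->nij', R, R)).sum(0) / w.sum()
        # Clip eigenvalues into [lmax/gamma, lmax]: a crude version of a
        # non-smooth eigenratio constraint.
        lam, V = np.linalg.eigh(S)
        lam = np.clip(lam, lam.max() / gamma, None)
        Sigmas[k] = V @ np.diag(lam) @ V.T
    return pis, mus, Sigmas, pi0
```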
A theory of decidability: entropy and choices under uncertainty
This work presents a new model of choice under uncertainty. After introducing a characterization of the concept of uncertainty, it is shown on an axiomatic basis how the entropy function can be interpreted as a weak measure of uncertainty. The concepts of entropy and expected utility are then used to construct, again axiomatically, a new function: the decidability function, which is able to order preferences over the space of lotteries. Finally, it is shown that this model can rationalize both the Allais paradox and the Ellsberg paradox.
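As a toy numerical illustration of the two ingredients (not the paper's axiomatic construction, and without reproducing the decidability function itself): the Shannon entropy of a lottery as an uncertainty measure, next to its expected utility.

```python
# Toy illustration: entropy and expected utility of a lottery, the two
# building blocks the abstract combines into a "decidability" function
# whose exact form is not reproduced here.
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def expected_utility(probs, utils):
    return sum(p * u for p, u in zip(probs, utils))

lottery = [0.5, 0.5]        # a fair coin between two prizes
utils = [0.0, 1.0]
print(entropy(lottery), expected_utility(lottery, utils))
```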
Cluster validation by measurement of clustering characteristics relevant to the user
There are many cluster analysis methods that can produce quite different
clusterings on the same dataset. Cluster validation is about the evaluation of
the quality of a clustering; "relative cluster validation" is about using such
criteria to compare clusterings. This can be used to select one of a set of
clusterings from different methods, or from the same method run with different
parameters such as different numbers of clusters.
There are many cluster validation indexes in the literature. Most of them
attempt to measure the overall quality of a clustering by a single number, but
this can be inappropriate. There are various different characteristics of a
clustering that can be relevant in practice, depending on the aim of
clustering, such as low within-cluster distances and high between-cluster
separation.
In this paper, a number of validation criteria will be introduced that refer
to different desirable characteristics of a clustering, and that characterise a
clustering in a multidimensional way. In specific applications the user may be
interested in some of these criteria rather than others. A focus of the paper
is on methodology to standardise the different characteristics so that users
can aggregate them in a suitable way, specifying weights for the various
criteria that are relevant in the clustering application at hand.
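A minimal sketch of the aggregation idea follows; the z-score standardisation and all names are assumptions for illustration, not the paper's exact methodology.

```python
# Sketch: standardise several validation statistics to a common scale,
# orient them so that larger means better, and combine with user weights.
import numpy as np

def aggregate_validation(values, weights, larger_is_better):
    """values: dict criterion -> array of values over candidate clusterings."""
    total = np.zeros(len(next(iter(values.values()))))
    for name, v in values.items():
        v = np.asarray(v, float)
        z = (v - v.mean()) / v.std()          # crude z-score standardisation
        if not larger_is_better[name]:
            z = -z                             # orient so larger = better
        total += weights[name] * z
    return total                               # argmax picks a clustering

scores = aggregate_validation(
    {"avg_within_dist": [2.1, 1.7, 1.9], "separation": [0.8, 1.2, 0.9]},
    {"avg_within_dist": 0.5, "separation": 1.0},
    {"avg_within_dist": False, "separation": True})
print(scores.argmax())  # index of the preferred clustering
```

The clustering with the largest aggregated score would then be selected.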
Nonparametric consistency for maximum likelihood estimation and clustering based on mixtures of elliptically-symmetric distributions
The consistency of the maximum likelihood estimator for mixtures of
elliptically-symmetric distributions for estimating its population version is
shown, where the underlying distribution P is nonparametric and does not
necessarily belong to the class of mixtures on which the estimator is based. In
a situation where P is a mixture of well enough separated but nonparametric
distributions, it is shown that the components of the population version of the
estimator correspond to the well separated components of P. This provides
some theoretical justification for the use of such estimators for cluster
analysis in case that P has well separated subpopulations, even if these
subpopulations differ from what the mixture model assumes.
An adequacy approach for deciding the number of clusters for OTRIMLE robust Gaussian mixture-based clustering
We introduce a new approach to deciding the number of clusters. The approach is applied to Optimally Tuned Robust Improper Maximum Likelihood Estimation (OTRIMLE; Coretto & Hennig, Journal of the American Statistical Association, 111, 1648-1659) of a Gaussian mixture model allowing for observations to be classified as 'noise', but it can be applied to other clustering methods as well. The quality of a clustering is assessed by a statistic Q that measures how close the within-cluster distributions are to elliptical unimodal distributions whose only mode is at the mean. This non-parametric measure allows for non-Gaussian clusters as long as they have a good quality according to Q. The simplicity of a model is assessed by a measure S that prefers a smaller number of clusters unless additional clusters can reduce the estimated noise proportion substantially. The simplest model is then chosen that is adequate for the data in the sense that its observed value of Q is not significantly larger than what is expected for data truly generated from the fitted model, as can be assessed by parametric bootstrap. The approach is compared with model-based clustering using the Bayesian information criterion (BIC) and the integrated complete likelihood (ICL) in a simulation study and on two real data sets.
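The bootstrap logic can be sketched generically as follows (an illustrative stand-in, not the OTRIMLE implementation; fit, simulate and Q are user-supplied placeholders). Starting from the smallest number of clusters, one would return the first model judged adequate.

```python
# Sketch of a parametric-bootstrap adequacy check: the fitted model is
# adequate if the observed statistic Q is not significantly larger than Q
# on data simulated from that fitted model.
import numpy as np

def is_adequate(X, fit, simulate, Q, B=200, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    model = fit(X)
    q_obs = Q(X, model)
    q_boot = []
    for _ in range(B):
        Xb = simulate(model, len(X), rng)   # data from the fitted model
        q_boot.append(Q(Xb, fit(Xb)))       # re-fit on each bootstrap sample
    # one-sided p-value: fraction of bootstrap Q at least as large as q_obs
    pval = (1 + sum(q >= q_obs for q in q_boot)) / (B + 1)
    return pval >= alpha                    # adequate if not significantly larger
```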
Selecting the number of clusters, clustering models, and algorithms. A unifying approach based on the quadratic discriminant score
Cluster analysis requires many decisions: the clustering method and the
implied reference model, the number of clusters and, often, several
hyperparameters and algorithm tunings. In practice, one produces several
partitions, and a final one is chosen based on validation or selection
criteria. There exists an abundance of validation methods that, implicitly or
explicitly, assume a certain clustering notion. Moreover, they are often
restricted to operate on partitions obtained from a specific method. In this
paper, we focus on groups that can be well separated by quadratic or linear
boundaries. The reference cluster concept is defined through the quadratic
discriminant score function and parameters describing clusters' size, center
and scatter. We develop two cluster-quality criteria called quadratic scores.
We show that these criteria are consistent with groups generated from a general
class of elliptically-symmetric distributions. The quest for this type of
grouping is common in applications. The connection with likelihood theory for
mixture models and model-based clustering is investigated. Based on bootstrap
resampling of the quadratic scores, we propose a selection rule that allows
choosing among many clustering solutions. The proposed method has the
distinctive advantage that it can compare partitions that cannot be compared
with other state-of-the-art methods. Extensive numerical experiments and the
analysis of real data show that, even if some competing methods turn out to be
superior in some setups, the proposed methodology achieves a better overall
performance.
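For illustration, a standard quadratic discriminant score of a point under a cluster with given size, center and scatter can be written as below; this is the textbook form of the score, not necessarily the exact criteria developed in the paper.

```python
# Quadratic discriminant score of point x under a cluster with proportion
# pi_k, center mu_k and scatter Sigma_k:
#   log(pi_k) - 0.5*log|Sigma_k| - 0.5*(x-mu_k)' Sigma_k^{-1} (x-mu_k)
import numpy as np

def quadratic_score(x, pi_k, mu_k, Sigma_k):
    d = x - mu_k
    _, logdet = np.linalg.slogdet(Sigma_k)
    return (np.log(pi_k) - 0.5 * logdet
            - 0.5 * d @ np.linalg.solve(Sigma_k, d))

# A point is assigned to the cluster whose score is largest; equal
# covariances across clusters make the resulting boundaries linear.
x = np.array([0.5, 0.2])
print(quadratic_score(x, 0.5, np.zeros(2), np.eye(2)))
```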
A simulation study to compare robust clustering methods based on mixtures
The following mixture model-based clustering methods are compared in a simulation study with one-dimensional data, a fixed number of clusters, and a focus on outliers and uniform "noise": an ML-estimator (MLE) for Gaussian mixtures, an MLE for a mixture of Gaussians and a uniform distribution (interpreted as a "noise component" to catch outliers), an MLE for a mixture of Gaussian distributions where a uniform distribution over the range of the data is fixed (Fraley and Raftery in Comput J 41:578-588, 1998), …
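A small sketch of the "noise component" likelihood compared in the study (simplified and assumed, for one-dimensional data): a Gaussian mixture plus a uniform density fixed over the range of the data to absorb outliers.

```python
# Sketch: log-likelihood of a 1-d Gaussian mixture plus a uniform noise
# component fixed over the observed data range.
import numpy as np
from scipy.stats import norm

def loglik_noise_component(x, pis, mus, sigmas, pi0):
    unif = 1.0 / (x.max() - x.min())          # uniform over the data range
    dens = pi0 * unif
    for pik, mu, sd in zip(pis, mus, sigmas):
        dens = dens + pik * norm.pdf(x, mu, sd)
    return np.log(dens).sum()

x = np.concatenate([np.random.default_rng(1).normal(0, 1, 100),
                    np.random.default_rng(2).uniform(-8, 8, 10)])
print(loglik_noise_component(x, [0.9], [0.0], [1.0], 0.1))
```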
