1,374 research outputs found

Some discussions on the Read Paper "Beyond subjective and objective in statistics" by A. Gelman and C. Hennig

Author: Celeux Gilles
Jewson Jack
Josse Julie
Marin Jean-Michel
Robert Christian
Robert Christian P.
Publication date: 01/01/2017

This note collects several discussions of the paper "Beyond subjective and objective in statistics", read by A. Gelman and C. Hennig to the Royal Statistical Society on April 12, 2017, and to appear in the Journal of the Royal Statistical Society, Series A.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

HAL-Polytechnique

Identifiability of a Switching Markov State-Space Model

Author: Celeux Gilles
Kalawoun Jana
Pamphile Patrick
Publication venue: HAL CCSD
Publication date: 08/09/2015

While switching Markov state-space models arise in many applied fields such as signal processing and bioinformatics, it is often difficult to establish their identifiability, which is essential for parameter estimation. This paper discusses the simple case in which the unknown continuous state and the observations are scalars. We demonstrate that if prior information relating the observations to the unknown continuous state at a time t0 is available, and if the Markov chain is irreducible and aperiodic, the set of model parameters is "globally structurally identifiable". In addition, we show that under these constraints the model parameters can be efficiently estimated by an EM algorithm.

INRIA a CCSD electronic archive server

HAL-CEA

Mixtures of Regression Models for Time-Course Gene Expression Data: Evaluation of Initialization and Random Effects

Author: Bettina Grün
Friedrich Leisch
Theresa Scharl
Publication date: 01/01/2009

Finite mixture models are routinely applied to time-course microarray data. Because of the complexity and size of this type of data, the choice of good starting values plays an important role. So far, initialization strategies have only been investigated for data from a mixture of multivariate normal distributions. In this work, several initialization procedures are evaluated for mixtures of regression models with and without random effects in an extensive simulation study on different artificial datasets. Finally, these procedures are also applied to a real dataset from E. coli.

Crossref

Open Access LMU ( Ludwig-Maximilians-Univ. München)

Open Access Research from University of Wollongong

Regularization in regression: comparing Bayesian and frequentist methods in a poorly informative situation

Author: Anbari Mohammed El
Celeux Gilles
Marin Jean-Michel
Robert Christian P.
Publication date: 15/11/2011

Using a collection of simulated and real benchmarks, we compare Bayesian and frequentist regularization approaches under a poorly informative constraint, when the number of variables is almost equal to the number of observations. This comparison includes new global noninformative approaches for Bayesian variable selection built on Zellner's g-priors that are similar to Liang et al. (2008). The interest of those calibration-free proposals is discussed. The numerical experiments we present highlight the appeal of Bayesian regularization methods when compared with non-Bayesian alternatives. They dominate frequentist methods in the sense that they provide smaller prediction errors while selecting the most relevant variables in a parsimonious way.

arXiv.org e-Print Archive

Base de publications de l'université Paris-Dauphine

Crossref

INRIA a CCSD electronic archive server

Portail HAL Um (Université de Montpellier)

HAL: Hyper Article en Ligne

Enhancing the selection of a model-based clustering with external qualitative variables

Author: Amorim Maria José
Baudry Jean-Patrick
Cardoso Margarida
Celeux Gilles
Ferreira Ana Sousa
Publication date: 31/10/2012

In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which were not directly involved in clustering the data. An approach is proposed in the model-based clustering context to select a model and a number of clusters which both fit the data well and take advantage of the potential illustrative ability of the external variables. This approach makes use of the integrated joint likelihood of the data and the partitions at hand, namely the model-based partition and the partitions associated with the external variables. It is noteworthy that each mixture model is fitted to the data by maximum likelihood, excluding the external variables, which are used only to select a relevant mixture model. Numerical experiments illustrate the promising behaviour of the derived criterion.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Hal-Diderot

Parameter Setting for Evolutionary Latent Class Clustering

Author: Biernacki Christophe
Celeux Gilles
Govaert Gérard
Schoenauer Marc
Tessier Damien
Publication venue: Springer Verlag
Publication date: 21/09/2007

The latent class model, or multivariate multinomial mixture, is a powerful model for clustering discrete data. This model is expected to be useful for representing non-homogeneous populations. It assumes conditional independence of the variables given the latent class to which a statistical unit belongs. However, it leads to a criterion that proves difficult to optimise by the standard approach based on the EM algorithm. An Evolutionary Algorithm is designed to tackle this discrete optimisation problem, and an extensive parameter study on a large artificial dataset allows stable parameters to be derived. Those parameters are then validated on other artificial datasets, as well as on some well-known real data: the Evolutionary Algorithm repeatedly performs better than other standard clustering techniques on the same data.

CiteSeerX

INRIA a CCSD electronic archive server

HAL-Polytechnique

Portail HAL UNIV-RENNES

The MIXMOD mixture analysis software for clustering and discriminant analysis (original title: Le logiciel MIXMOD d'analyse de mélange pour la classification et l'analyse discriminante)

Author: Biernacki Christophe
Celeux Gilles
Echenim Anwuli
Govaert Gérard
Langrognet Florent
Publication venue: Modulad
Publication date: 01/12/2006

The mixmod software is dedicated to the analysis of mixtures of probability distributions on multivariate data, for the purposes of density estimation, clustering, or discriminant analysis. It offers a wide choice of algorithms for estimating the parameters of a mixture (EM, Classification EM, Stochastic EM). These algorithms can be combined in many ways to obtain a relevant local maximum of the likelihood or of the completed likelihood of a model. For quantitative variables, mixmod uses mixtures of multivariate normal distributions. It thus offers fourteen different Gaussian models, according to assumptions made on the spectral elements of the component variance matrices. For qualitative variables, mixmod uses mixtures of multivariate multinomial distributions under the assumption that the variables are conditionally independent given the mixture component. Through a reparameterization of the multinomial probabilities, it offers five different models. In addition, several information criteria are provided for choosing a parsimonious model, and in particular for choosing a relevant number of components. Which criterion to use depends on the objective pursued (density estimation, supervised or unsupervised classification). Written in C++, mixmod has interfaces to Scilab and Matlab. The software, its statistical documentation, and its user guide are available at the following address: http://www-math.univ-fcomte.fr/mixmod/index.php

HAL-uB

HAL - Université de Franche-Comté

INRIA a CCSD electronic archive server

Clustering high-throughput sequencing data with Poisson mixture models

Author: Celeux Gilles
Martin-Magniette Marie-Laure
Maugis-Rabusseau Cathy
Rau Andrea
Publication venue: HAL CCSD
Publication date: 01/01/2011

In recent years, gene expression studies have increasingly made use of next-generation sequencing technology. In turn, research concerning the appropriate statistical methods for the analysis of digital gene expression has flourished, primarily in the context of normalization and differential analysis. In this work, we focus on the question of clustering digital gene expression profiles as a means to discover groups of co-expressed genes. We propose two parameterizations of a Poisson mixture model to cluster expression profiles of high-throughput sequencing data. A set of simulation studies compares the performance of the proposed models with that of an approach developed for a similar type of data, namely serial analysis of gene expression (SAGE). We also study the performance of these approaches on two real high-throughput sequencing data sets. The R package HTSCluster used to implement the proposed Poisson mixture models is available on CRAN.
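For readers unfamiliar with the general idea, the following is a minimal sketch of EM for a plain univariate Poisson mixture. It is not the method of the paper: the abstract does not describe its two parameterizations, and this is not the HTSCluster API. The function names (`poisson_logpmf`, `fit_poisson_mixture`), the quantile-based initialisation, and the fixed iteration count are all assumptions chosen for illustration.

```python
import math


def poisson_logpmf(k, lam):
    """Log of the Poisson probability mass function at count k with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def fit_poisson_mixture(counts, n_components, n_iter=200):
    """Fit a univariate Poisson mixture by EM (illustrative sketch only).

    Returns (weights, means, labels), where labels are the most probable
    component for each count under the final E-step.
    """
    n = len(counts)
    ordered = sorted(counts)
    # Assumed initialisation: evenly spaced sample quantiles, shifted by 0.5
    # so that every starting mean is strictly positive.
    means = [ordered[int((k + 0.5) * n / n_components)] + 0.5
             for k in range(n_components)]
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each count,
        # computed in log space for numerical stability.
        resp = []
        for x in counts:
            logp = [math.log(w) + poisson_logpmf(x, m)
                    for w, m in zip(weights, means)]
            top = max(logp)
            unnorm = [math.exp(v - top) for v in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])

        # M-step: weighted proportions and weighted means, floored away from 0.
        for k in range(n_components):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = max(nk / n, 1e-12)
            means[k] = max(sum(r[k] * x for r, x in zip(resp, counts)) / nk, 1e-8)

    labels = [max(range(n_components), key=lambda k: r[k]) for r in resp]
    return weights, means, labels
```

On well-separated counts, such as a group near 1 and a group near 20, the fitted means should settle close to the two group averages. Real sequencing data would additionally call for per-sample normalization, which this sketch omits.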

HAL Evry

Scientific Publications of the University of Toulouse II Le Mirail

INRIA a CCSD electronic archive server

HAL-INSA Toulouse

ProdInra

Hal-Diderot