
    SymScal: symbolic multidimensional scaling of interval dissimilarities

    Multidimensional scaling aims at reconstructing dissimilarities between pairs of objects by distances in a low-dimensional space. However, in some cases the dissimilarity itself is unknown, but the range of the dissimilarity is given. Such fuzzy data fall in the wider class of symbolic data (Bock and Diday, 2000). Denoeux and Masson (2000) have proposed to model an interval dissimilarity by a range of the distance, defined as the minimum and maximum distance between two rectangles representing the objects. In this paper, we provide a new algorithm called SymScal that is based on iterative majorization. The advantage is that each iteration is guaranteed to improve the solution until no improvement is possible. In a simulation study, we investigate the quality of this algorithm. We discuss the use of SymScal on empirical dissimilarity intervals of sounds.
    Keywords: iterative majorization; multidimensional scaling; symbolic data analysis; distance smoothing
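    SymScal's own code is not reproduced in the abstract, but the monotone-improvement property of iterative majorization is easy to illustrate on ordinary (point) multidimensional scaling with the SMACOF/Guttman-transform update; the function name, unit weights, and random initialization below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def smacof_mds(delta, dim=2, n_iter=100, seed=0):
    """Majorization (SMACOF) iteration for metric MDS with unit weights.
    Each Guttman-transform step is guaranteed not to increase the raw
    stress -- the property SymScal inherits for interval dissimilarities."""
    n = delta.shape[0]
    X = np.random.default_rng(seed).standard_normal((n, dim))
    stresses = []
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
        stresses.append(np.sum((delta - d) ** 2) / 2.0)   # raw stress over pairs
        ratio = np.divide(delta, d, out=np.zeros_like(delta), where=d > 0)
        B = -ratio
        np.fill_diagonal(B, 0.0)
        np.fill_diagonal(B, -B.sum(axis=1))               # B_ii = sum_{j!=i} delta_ij/d_ij
        X = B @ X / n                                     # Guttman transform: X <- n^{-1} B(X) X
    return X, stresses
```

    Running this on any dissimilarity matrix produces a non-increasing stress sequence, which is what "each iteration is guaranteed to improve the solution" means in practice.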

    Dynamic Clustering of Histogram Data Based on Adaptive Squared Wasserstein Distances

    This paper deals with clustering methods based on adaptive distances for histogram data, using a dynamic clustering algorithm. Histogram data describe individuals in terms of empirical distributions. These kinds of data can be considered complex descriptions of phenomena observed on complex objects: images, groups of individuals, spatially or temporally varying data, results of queries, environmental data, and so on. The Wasserstein distance is used to compare two histograms. The Wasserstein distance between histograms consists of two components: the first based on the means of the histograms, and the second on their internal dispersions (standard deviation, skewness, kurtosis, and so on). To cluster sets of histogram data, we propose a Dynamic Clustering Algorithm based on adaptive squared Wasserstein distances, a k-means-like algorithm for clustering a set of individuals into K classes that are fixed a priori. The main aim of this research is to provide a tool for clustering histograms, emphasizing the different contributions of the histogram variables, and of their components, to the definition of the clusters. We demonstrate that this can be achieved using adaptive distances. Two kinds of adaptive distances are considered: the first takes into account the variability of each component of each descriptor over the whole set of individuals; the second takes into account the variability of each component of each descriptor within each cluster. We furnish interpretative tools for the obtained partition, based on an extension of the classical measures (indexes) to the use of adaptive distances in the clustering criterion function. Applications on synthetic and real-world data corroborate the proposed procedure.
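    The two-component structure of the squared Wasserstein distance mentioned above can be sketched for one-dimensional histograms via their quantile (inverse CDF) functions; the grid size and function name are illustrative assumptions.

```python
import numpy as np

def wasserstein2_sq(p, q, edges, n_q=2000):
    """Squared 2-Wasserstein distance between two histograms sharing the
    same bin edges, computed from inverse CDFs on a quantile grid, and
    split into a mean (location) component and a dispersion component."""
    t = (np.arange(n_q) + 0.5) / n_q
    def quantile(w):
        cdf = np.concatenate([[0.0], np.cumsum(w) / np.sum(w)])
        return np.interp(t, cdf, edges)          # inverse CDF, linear within bins
    qp, qq = quantile(p), quantile(q)
    total = np.mean((qp - qq) ** 2)
    mean_part = (qp.mean() - qq.mean()) ** 2     # component due to the means
    disp_part = total - mean_part                # internal-dispersion remainder
    return total, mean_part, disp_part
```

    The mean component never exceeds the total (by Jensen's inequality), so the dispersion remainder is always non-negative, matching the two-part decomposition used in the clustering criterion.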

    On central tendency and dispersion measures for intervals and hypercubes

    The uncertainty or the variability of the data may be treated by considering, rather than a single value for each observation, the interval of values in which it may fall. This paper studies the derivation of basic descriptive statistics for interval-valued datasets. We propose a geometrical approach to the determination of summary statistics (central tendency and dispersion measures) for interval-valued variables.
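    As one concrete and deliberately simple convention for such summary statistics, intervals can be summarized through their midpoints and half-ranges; this is an illustrative sketch, not the geometrical definitions derived in the paper.

```python
import numpy as np

def interval_summary(lo, hi):
    """Central tendency and dispersion for interval-valued data using the
    midpoint/half-range representation (an illustrative convention, not
    the paper's exact geometric definitions).
    lo, hi: arrays of lower and upper interval bounds."""
    mid = (lo + hi) / 2.0                   # interval midpoints
    half = (hi - lo) / 2.0                  # interval half-ranges
    mean_interval = (lo.mean(), hi.mean())  # interval-valued mean
    # Dispersion: spread of the midpoints plus average squared half-range,
    # so wider (more uncertain) intervals contribute more.
    dispersion = mid.var() + np.mean(half ** 2)
    return mean_interval, dispersion
```

    For degenerate intervals (lower bound equal to upper bound) these definitions collapse to the classical mean and variance, a sanity check any interval statistic should pass.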

    Linear regression for numeric symbolic variables: an ordinary least squares approach based on Wasserstein Distance

    In this paper we present a linear regression model for modal symbolic data. The observed variables are histogram variables, according to the definition given in the framework of Symbolic Data Analysis, and the parameters of the model are estimated using the classic least squares method. An appropriate metric is introduced in order to measure the error between the observed and the predicted distributions; in particular, the Wasserstein distance is proposed. Some properties of this metric are exploited to predict the response variable as a direct linear combination of the other independent histogram variables. Measures of goodness of fit are discussed. An application on real data corroborates the proposed method.
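    Because the 2-Wasserstein distance between distributions is an L2 distance between their quantile functions, least squares over distributions reduces to ordinary least squares over sampled quantile functions. The sketch below illustrates that reduction; it is a simplified stand-in for the paper's estimator, which additionally ensures that predictions are valid (non-decreasing) quantile functions.

```python
import numpy as np

def fit_quantile_ols(Xq, yq):
    """Ordinary least squares on quantile functions: predict the response
    quantile function as a linear combination of predictor quantile
    functions (simplified sketch of the Wasserstein-based OLS idea).
    Xq: (n, p, m) predictor quantiles; yq: (n, m) response quantiles."""
    n, p, m = Xq.shape
    A = Xq.transpose(0, 2, 1).reshape(n * m, p)   # stack units x quantile levels
    b = yq.reshape(n * m)
    beta, *_ = np.linalg.lstsq(A, b, rcond=None)  # minimize sum of squared quantile errors
    return beta
```

    When the response is an exact linear combination of the predictors' quantile functions, the fit recovers the coefficients.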

    Pyramidal Clustering Algorithms in ISO-3D Project

    The pyramidal clustering method generalizes hierarchies by allowing non-disjoint classes at a given level instead of a partition. Moreover, the clusters of the pyramid are intervals of a total order on the set being clustered. [Diday 1984], [Bertrand, Diday 1990] and [Mfoumoune 1998] proposed algorithms to build a pyramid starting from an arbitrary order of the individuals. In this paper we present two new algorithms, named {\tt CAPS} and {\tt CAPSO}. {\tt CAPSO} builds a pyramid starting from an order given on the set of individuals (or symbolic objects), while {\tt CAPS} finds this order. Moreover, these two algorithms can cluster data more complex than the tabular model can process, by considering variation in the values taken by the variables; in this way, our method produces a symbolic pyramid. Each cluster thus formed is defined not only by the set of its elements (i.e. its extent) but also by a symbolic object, which describes its properties (i.e. its intent). These two algorithms were implemented in C++ and Java for the ISO-3D project.

    Representing complex data using localized principal components with application to astronomical data

    Often the relation between the variables constituting a multivariate data space might be characterized by one or more of the terms: ``nonlinear'', ``branched'', ``disconnected'', ``bended'', ``curved'', ``heterogeneous'', or, more generally, ``complex''. In these cases, simple principal component analysis (PCA) as a tool for dimension reduction can fail badly. Of the many alternative approaches proposed so far, local approximations of PCA are among the most promising. This paper gives a short review of localized versions of PCA, focusing on local principal curves and local partitioning algorithms. Furthermore, we discuss projections other than the local principal components. When performing local dimension reduction for regression or classification problems, it is important to focus not only on the manifold structure of the covariates, but also on the response variable(s). Local principal components only achieve the former, whereas localized regression approaches concentrate on the latter. Local projection directions derived from the partial least squares (PLS) algorithm offer an interesting trade-off between these two objectives. We apply these methods to several real data sets. In particular, we consider simulated astrophysical data from the future Galactic survey mission Gaia. Published in "Principal Manifolds for Data Visualization and Dimension Reduction", A. Gorban, B. Kegl, D. Wunsch, and A. Zinovyev (eds), Lecture Notes in Computational Science and Engineering, Springer, 2007, pp. 180-204.
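    The local-partitioning flavour of PCA reviewed above can be sketched in a few lines: partition the data with a k-means-style loop, then fit a separate PCA in each cell. The function name and plain Lloyd-style initialization are illustrative assumptions, not the specific algorithms the paper reviews.

```python
import numpy as np

def local_pca(X, n_clusters=2, n_components=1, n_iter=50, seed=0):
    """Localized PCA sketch: cluster the data with a plain k-means loop,
    then compute principal directions separately in each cell. This can
    follow branched or curved structure that a single global PCA misses."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for k in range(n_clusters):
            if np.any(labels == k):               # keep center if cell is empty
                centers[k] = X[labels == k].mean(axis=0)
    components = []
    for k in range(n_clusters):
        Xk = X[labels == k]
        Xk = Xk - Xk.mean(axis=0)
        # leading right-singular vectors = local principal directions
        _, _, Vt = np.linalg.svd(Xk, full_matrices=False)
        components.append(Vt[:n_components])
    return labels, components
```

    Note that this only respects the covariate structure; as the paper stresses, for regression or classification one would also want the projections to account for the response, as localized PLS directions do.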

    On the equivalence between hierarchical segmentations and ultrametric watersheds

    We study hierarchical segmentation in the framework of edge-weighted graphs. We define ultrametric watersheds as topological watersheds that are null on the minima. We prove that there exists a bijection between the set of ultrametric watersheds and the set of hierarchical segmentations. We end this paper by showing how to use the proposed framework in practice, with the example of constrained connectivity; in particular, the framework makes it possible to compute such a hierarchy following a classical watershed-based morphological scheme, which provides an efficient algorithm to compute the whole hierarchy.

    Multidimensional Scaling for Interval Data: INTERSCAL

    Standard multidimensional scaling takes as input a dissimilarity matrix with general term $\delta_{ij}$, which is a numerical value. In this paper the input is $\delta_{ij} = [\underline{\delta_{ij}}, \overline{\delta_{ij}}]$, where $\underline{\delta_{ij}}$ and $\overline{\delta_{ij}}$ are the lower and upper bounds of the ``dissimilarity'' between the stimuli/objects $S_i$ and $S_j$ respectively. As output, instead of representing each stimulus/object on a factorial plane by a point, as in other multidimensional scaling methods, the proposed method visualizes each stimulus/object by a rectangle, in order to represent the dissimilarity variation. We generalize the classical scaling method, looking for a method that produces results similar to those obtained by Tops Principal Components Analysis. Two examples are presented to illustrate the effectiveness of the proposed method.