SymScal: symbolic multidimensional scaling of interval dissimilarities
Multidimensional scaling aims at reconstructing dissimilarities
between pairs of objects by distances in a low-dimensional space.
However, in some cases the dissimilarity itself is unknown, but the
range of the dissimilarity is given. Such fuzzy data fall in the
wider class of symbolic data (Bock and Diday, 2000).
Denoeux and Masson (2000) have proposed to model an interval
dissimilarity by a range of the distance defined as the minimum and
maximum distance between two rectangles representing the objects. In
this paper, we provide a new algorithm called SymScal that is based
on iterative majorization. The advantage is that each iteration is
guaranteed to improve the solution until no improvement is possible.
In a simulation study, we investigate the quality of this
algorithm. We discuss the use of SymScal on empirical dissimilarity
intervals of sounds.
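The Denoeux–Masson interval distance described above has a simple closed form: with each object represented as an axis-aligned hyperrectangle (per-dimension centers and radii), the minimum and maximum Euclidean distances between two rectangles can be computed coordinate by coordinate. A minimal sketch, with function and argument names of our choosing rather than the paper's:

```python
import math

def rect_distance_interval(c1, r1, c2, r2):
    """Minimum and maximum Euclidean distance between two axis-aligned
    hyperrectangles, given per-dimension centers (c1, c2) and radii (r1, r2).
    Illustrative names; not taken from the paper."""
    lo2 = hi2 = 0.0
    for a, ra, b, rb in zip(c1, r1, c2, r2):
        gap = abs(a - b)
        # farthest pair of points along this axis
        hi2 += (gap + ra + rb) ** 2
        # closest pair along this axis: zero if the intervals overlap
        lo2 += max(0.0, gap - ra - rb) ** 2
    return math.sqrt(lo2), math.sqrt(hi2)
```

For two unit squares centered at (0, 0) and (5, 0), this yields a minimum distance of 3 (the horizontal gap between the facing edges) and a maximum of sqrt(53) (between opposite corners).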
Dynamic Clustering of Histogram Data Based on Adaptive Squared Wasserstein Distances
This paper deals with clustering methods based on adaptive distances for
histogram data, using a dynamic clustering algorithm. Histogram data describe
individuals in terms of empirical distributions. This kind of data can be
considered as complex descriptions of phenomena observed on complex objects:
images, groups of individuals, spatially or temporally varying data, results of
queries, environmental data, and so on. The Wasserstein distance is used to
compare two histograms. The Wasserstein distance between histograms consists
of two components: the first based on the means, and the second on the
internal dispersions (standard deviation, skewness, kurtosis, and so on) of the
histograms. To cluster sets of histogram data, we propose a dynamic
clustering algorithm, based on adaptive squared Wasserstein distances, that is
a k-means-like algorithm for clustering a set of individuals into a number of
classes that is fixed a priori.
The main aim of this research is to provide a tool for clustering histograms
that emphasizes the different contributions of the histogram variables, and of
their components, to the definition of the clusters. We demonstrate that this
can be achieved using adaptive distances. Two kinds of adaptive distances are
considered: the first takes into account the variability of each component of
each descriptor over the whole set of individuals; the second takes into
account the variability of each component of each descriptor within each
cluster. We furnish interpretative tools for the obtained partition, based on
an extension of the classical measures (indexes) to the use of adaptive
distances in the clustering criterion function. Applications to synthetic and
real-world data corroborate the proposed procedure.
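For one-dimensional distributions the squared 2-Wasserstein distance reduces to an L2 distance between quantile functions, and the two-component decomposition mentioned above (a mean part plus a dispersion part acting on the centered quantile functions) can be checked numerically. A small sketch for equal-size samples, not the paper's implementation:

```python
import numpy as np

def wasserstein2_sq(x, y):
    """Squared 2-Wasserstein distance between two equal-size 1-D samples,
    computed from sorted values (the quantile-function form)."""
    xs, ys = np.sort(x), np.sort(y)
    return np.mean((xs - ys) ** 2)

def wasserstein2_sq_decomposed(x, y):
    """Decomposition into a mean component and a dispersion component;
    the cross term vanishes because the centered sorted values have mean zero."""
    mean_part = (np.mean(x) - np.mean(y)) ** 2
    xs = np.sort(x) - np.mean(x)
    ys = np.sort(y) - np.mean(y)
    disp_part = np.mean((xs - ys) ** 2)
    return mean_part, disp_part
```

For any pair of samples, `mean_part + disp_part` equals `wasserstein2_sq(x, y)` exactly; two samples that differ only by a shift have a zero dispersion component.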
On central tendency and dispersion measures for intervals and hypercubes
The uncertainty or variability of the data may be treated by considering,
rather than a single value for each observation, the interval of values in
which it may fall. This paper studies the derivation of basic descriptive
statistics for interval-valued datasets. We propose a geometrical approach to
the determination of summary statistics (central tendency and dispersion
measures) for interval-valued variables.
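As a rough illustration of summary statistics for interval data, the sketch below computes a componentwise mean interval and one common midpoint-radius dispersion measure. These are illustrative geometric choices and need not match the paper's exact definitions:

```python
def interval_mean(intervals):
    """Componentwise mean interval: [mean of lower bounds, mean of upper bounds]."""
    lows = [a for a, b in intervals]
    ups = [b for a, b in intervals]
    n = len(intervals)
    return (sum(lows) / n, sum(ups) / n)

def interval_dispersion(intervals):
    """One geometric dispersion measure: variance of the midpoints plus the
    mean squared radius (an illustrative choice, not necessarily the paper's)."""
    mids = [(a + b) / 2 for a, b in intervals]
    rads = [(b - a) / 2 for a, b in intervals]
    n = len(intervals)
    m = sum(mids) / n
    return sum((c - m) ** 2 for c in mids) / n + sum(r * r for r in rads) / n
```

The dispersion term separates between-interval scatter (variance of midpoints) from within-interval width (mean squared radius), which is the kind of decomposition a geometrical treatment of intervals naturally produces.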
Linear regression for numeric symbolic variables: an ordinary least squares approach based on Wasserstein Distance
In this paper we present a linear regression model for modal symbolic data.
The observed variables are histogram variables according to the definition
given in the framework of Symbolic Data Analysis and the parameters of the
model are estimated using the classic Least Squares method. An appropriate
metric is introduced in order to measure the error between the observed and the
predicted distributions. In particular, the Wasserstein distance is proposed.
Some properties of this metric are exploited to predict the response variable
as a direct linear combination of the other independent histogram variables.
Measures of goodness of fit are discussed. An application to real data
corroborates the proposed method.
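The idea of least squares under a Wasserstein error can be sketched by discretizing each histogram into a vector of quantiles and fitting ordinary least squares across all observations and quantile levels. This is a simplified single-predictor sketch, not the authors' estimator:

```python
import numpy as np

LEVELS = np.linspace(0.05, 0.95, 19)  # discretization of the quantile functions

def quantiles(sample):
    """Represent a distribution by its quantile function at fixed levels."""
    return np.quantile(np.asarray(sample, float), LEVELS)

def fit_histogram_ols(x_dists, y_dists):
    """Fit y_quantiles ~ b0 + b1 * x_quantiles by ordinary least squares over
    all observations and levels; this criterion approximates the sum of
    squared 2-Wasserstein errors between observed and predicted distributions."""
    X = np.concatenate([quantiles(d) for d in x_dists])
    Y = np.concatenate([quantiles(d) for d in y_dists])
    A = np.column_stack([np.ones_like(X), X])
    (b0, b1), *_ = np.linalg.lstsq(A, Y, rcond=None)
    return b0, b1
```

Note one caveat this sketch ignores: a negative slope applied to a quantile function does not yield a valid (non-decreasing) quantile function, which is one reason the paper needs a more careful treatment than plain OLS.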
Pyramidal Clustering Algorithms in ISO-3D Project
The pyramidal clustering method generalizes hierarchies by allowing
non-disjoint classes at a given level instead of a partition. Moreover, the
clusters of the pyramid are intervals of a total order on the set being
clustered. [Diday 1984], [Bertrand, Diday 1990] and [Mfoumoune 1998] proposed
algorithms that build a pyramid starting from an arbitrary order of the
individuals. In this paper we present two new algorithms named {\tt CAPS} and
{\tt CAPSO}. {\tt CAPSO} builds a pyramid starting from an order given on the
set of individuals (or symbolic objects), while {\tt CAPS} finds this order
itself. Moreover, these two algorithms can cluster data more complex than the
tabular model allows, by taking into account variation in the values taken by
the variables; in this way, our method produces a symbolic pyramid. Each
cluster thus formed is defined not only by the set of its elements (i.e. its
extent) but also by a symbolic object, which describes its properties (i.e.
its intent). These two algorithms were implemented in C++ and Java for the
ISO-3D project.
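The defining structural property of a pyramid, namely that every cluster is an interval of a total order on the individuals, can be checked directly. A small illustrative helper (names are ours, not from CAPS/CAPSO):

```python
def is_pyramid(clusters, order):
    """Check the structural pyramid condition: every cluster must be a
    contiguous interval of the given total order on the individuals."""
    pos = {x: i for i, x in enumerate(order)}
    for c in clusters:
        idx = sorted(pos[x] for x in c)
        # a contiguous interval has consecutive positions
        if idx != list(range(idx[0], idx[-1] + 1)):
            return False
    return True
```

Clusters such as {a, b} and {b, c, d} are compatible with the order a < b < c < d (and may overlap, unlike in a hierarchy), whereas {a, c} is not, since it skips b.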
Representing complex data using localized principal components with application to astronomical data
Often the relation between the variables constituting a multivariate data
space might be characterized by one or more of the terms: ``nonlinear'',
``branched'', ``disconnected'', ``bended'', ``curved'', ``heterogeneous'', or,
more generally, ``complex''. In these cases, simple principal component analysis
(PCA) as a tool for dimension reduction can fail badly. Of the many alternative
approaches proposed so far, local approximations of PCA are among the most
promising. This paper will give a short review of localized versions of PCA,
focusing on local principal curves and local partitioning algorithms.
Furthermore we discuss projections other than the local principal components.
When performing local dimension reduction for regression or classification
problems it is important to focus not only on the manifold structure of the
covariates, but also on the response variable(s). Local principal components
only achieve the former, whereas localized regression approaches concentrate on
the latter. Local projection directions derived from the partial least squares
(PLS) algorithm offer an interesting trade-off between these two objectives. We
apply these methods to several real data sets. In particular, we consider
simulated astrophysical data from the future Galactic survey mission Gaia.
Comment: 25 pages. In "Principal Manifolds for Data Visualization and
Dimension Reduction", A. Gorban, B. Kegl, D. Wunsch, and A. Zinovyev (eds),
Lecture Notes in Computational Science and Engineering, Springer, 2007, pp.
180-204.
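A minimal version of the local-partitioning approach reviewed here is: partition the data (for instance with k-means) and then run PCA separately inside each cluster. The sketch below assumes that workflow and is illustrative, not the chapter's exact algorithm:

```python
import numpy as np

def local_pca(X, k, n_components=1, iters=50):
    """Local-partitioning PCA sketch: split the data with a plain k-means,
    then compute principal components inside each cluster."""
    # farthest-point initialization keeps the sketch deterministic
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):  # plain Lloyd iterations
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(0) for j in range(k)])
    components = []
    for j in range(k):
        Xj = X[labels == j]
        Xj = Xj - Xj.mean(0)
        # leading right singular vectors = local principal components
        _, _, vt = np.linalg.svd(Xj, full_matrices=False)
        components.append(vt[:n_components])
    return labels, components
```

On data made of two well-separated, differently oriented segments, the two local first components recover the two local directions, which a single global PCA would blur into one.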
On the equivalence between hierarchical segmentations and ultrametric watersheds
We study hierarchical segmentation in the framework of edge-weighted graphs.
We define ultrametric watersheds as topological watersheds null on the minima.
We prove that there exists a bijection between the set of ultrametric
watersheds and the set of hierarchical segmentations. We end this paper by
showing how to use the proposed framework in practice on the example of
constrained connectivity; in particular, it allows computing such a hierarchy
following a classical watershed-based morphological scheme, which provides an
efficient algorithm for computing the whole hierarchy.
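A simpler relative of the ultrametric watershed is the single-linkage (minimax-path) ultrametric of an edge-weighted graph, which already illustrates the correspondence between edge weights, ultrametrics and hierarchies. A small sketch (not the paper's watershed algorithm):

```python
def minimax_ultrametric(n, edges):
    """Single-linkage distances on an edge-weighted graph with n vertices:
    d(u, v) = min over u-v paths of the maximum edge weight along the path.
    The result satisfies the ultrametric inequality
    d(i, j) <= max(d(i, k), d(k, j)) and encodes a hierarchy."""
    INF = float("inf")
    d = [[INF] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0.0
    for u, v, w in edges:
        d[u][v] = d[v][u] = min(d[u][v], w)
    for k in range(n):  # Floyd-Warshall in the (min, max) path algebra
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], max(d[i][k], d[k][j]))
    return d
```

Thresholding this ultrametric at increasing levels yields the nested partitions of the single-linkage hierarchy, the same kind of bijection between ultrametrics and hierarchical segmentations that the paper establishes for watersheds.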
Multidimensional Scaling for Interval Data: INTERSCAL
Standard multidimensional scaling takes as input a dissimilarity matrix whose
general term is a numerical value. In this paper we take as input a matrix of
intervals [a_ij, b_ij], where a_ij and b_ij are the lower bound and the upper
bound of the ``dissimilarity'' between the stimulus/object i and the
stimulus/object j, respectively. As output, instead of representing each
stimulus/object on a factorial plane by a point, as in other multidimensional
scaling methods, in the proposed method each stimulus/object is visualized by
a rectangle, in order to represent the dissimilarity variation. We generalize
the classical scaling method, looking for a method that produces results
similar to those obtained by the Tops Principal Components Analysis. Two
examples are presented to illustrate the effectiveness of the proposed method.
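One way to sketch the idea: place rectangle centers by classical (Torgerson) scaling of the midpoint dissimilarities and size each rectangle from the mean half-range of its intervals. This is an illustrative construction under our own assumptions, not INTERSCAL's actual derivation:

```python
import numpy as np

def classical_mds(D, dim=2):
    """Torgerson classical scaling of a square symmetric dissimilarity matrix."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (np.asarray(D, float) ** 2) @ J  # double centering
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dim]
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0, None))

def interval_mds_rectangles(D_low, D_up, dim=2):
    """Hedged sketch of interval MDS: centers from classical scaling of the
    midpoint dissimilarities; each object's half-side from the mean half-range
    of its intervals (an illustrative rule, not INTERSCAL's exact one)."""
    D_low = np.asarray(D_low, float)
    D_up = np.asarray(D_up, float)
    centers = classical_mds((D_low + D_up) / 2, dim)
    half = ((D_up - D_low) / 2).mean(axis=1)
    return centers, half
```

Each object is then drawn as the rectangle `centers[i] +/- half[i]`, so wider dissimilarity intervals translate into larger rectangles, which is the visual intent described in the abstract.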
