Space and Time Efficient Parallel Graph Decomposition, Clustering, and Diameter Approximation
We develop a novel parallel decomposition strategy for unweighted, undirected
graphs, based on growing disjoint connected clusters from batches of centers
progressively selected from yet uncovered nodes. With respect to similar
previous decompositions, our strategy exercises a tighter control on both the
number of clusters and their maximum radius.
We present two important applications of our parallel graph decomposition:
(1) k-center clustering approximation; and (2) diameter approximation. In
both cases, we obtain algorithms which feature a polylogarithmic approximation
factor and are amenable to a distributed implementation that is geared for
massive (long-diameter) graphs. The total space needed for the computation is
linear in the problem size, and the parallel depth is substantially sublinear
in the diameter for graphs with low doubling dimension. To the best of our
knowledge, ours are the first parallel approximations for these problems which
achieve sub-diameter parallel time, for a relevant class of graphs, using only
linear space. Besides the theoretical guarantees, our algorithms allow for a
very simple implementation on clustered architectures: we report on extensive
experiments which demonstrate their effectiveness and efficiency on large
graphs as compared to alternative known approaches.
Comment: 14 pages
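The cluster-growing idea in this abstract can be illustrated with a small sequential sketch. This is not the authors' algorithm or its parallel implementation: the function name `grow_clusters`, the parameters `batch_size` and `steps_per_round`, and the uniform random choice of centers are all assumptions made for illustration. In each round a batch of new centers is drawn from the still-uncovered nodes, and then every cluster absorbs its uncovered neighbors for a fixed number of steps.

```python
import random


def grow_clusters(adj, batch_size, steps_per_round, seed=0):
    """Partition an undirected graph into disjoint connected clusters.

    adj maps each node to a list of its neighbors. Returns a dict that
    maps every node to the center of the cluster that absorbed it.
    Hypothetical sketch: batch size and growth schedule are assumptions.
    """
    rng = random.Random(seed)
    owner = {}      # node -> center of its cluster
    frontier = []   # nodes added to some cluster in the previous step
    while len(owner) < len(adj):
        # Select a new batch of centers among the uncovered nodes.
        uncovered = sorted(v for v in adj if v not in owner)
        for c in rng.sample(uncovered, min(batch_size, len(uncovered))):
            owner[c] = c
            frontier.append(c)
        # Grow all active clusters by a bounded number of steps.
        for _ in range(steps_per_round):
            next_frontier = []
            for u in frontier:
                for w in adj[u]:
                    if w not in owner:
                        owner[w] = owner[u]  # w joins the cluster of u
                        next_frontier.append(w)
            frontier = next_frontier
    return owner
```

Because a node only ever joins the cluster of an adjacent node, each cluster stays connected, and clusters are disjoint since a node is assigned exactly once.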
A Practical Parallel Algorithm for Diameter Approximation of Massive Weighted Graphs
We present a space and time efficient practical parallel algorithm for
approximating the diameter of massive weighted undirected graphs on distributed
platforms supporting a MapReduce-like abstraction. The core of the algorithm is
a weighted graph decomposition strategy generating disjoint clusters of bounded
weighted radius. Theoretically, our algorithm uses linear space and yields a
polylogarithmic approximation guarantee; moreover, for important practical
classes of graphs, it runs in a number of rounds asymptotically smaller than
those required by the natural approximation provided by the state-of-the-art
Δ-stepping SSSP algorithm, which is its only practical linear-space
competitor in the aforementioned computational scenario. We complement our
theoretical findings with an extensive experimental analysis on large benchmark
graphs, which demonstrates that our algorithm attains substantial improvements
on a number of key performance indicators with respect to the aforementioned
competitor, while featuring a similar approximation ratio (a small constant
less than 1.4, as opposed to the polylogarithmic theoretical bound).
An Efficient Rigorous Approach for Identifying Statistically Significant Frequent Itemsets
As advances in technology allow for the collection, storage, and analysis of
vast amounts of data, the task of screening and assessing the significance of
discovered patterns is becoming a major challenge in data mining applications.
In this work, we address significance in the context of frequent itemset
mining. Specifically, we develop a novel methodology to identify a meaningful
support threshold s* for a dataset, such that the number of itemsets with
support at least s* represents a substantial deviation from what would be
expected in a random dataset with the same number of transactions and the same
individual item frequencies. These itemsets can then be flagged as
statistically significant with a small false discovery rate. We present
extensive experimental results to substantiate the effectiveness of our
methodology.
Comment: A preliminary version of this work was presented in ACM PODS 2009. 20
pages, 0 figures
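To make the notion of "what would be expected in a random dataset" concrete, the sketch below computes the expected number of k-itemsets with support at least s under one simple null model. It assumes that each item appears in each transaction independently with probability equal to its frequency. That independence assumption, and the names `binomial_tail` and `expected_frequent_itemsets`, are illustrative choices here, not a statement of the paper's exact procedure for choosing s*.

```python
from itertools import combinations
from math import comb, prod


def binomial_tail(t, p, s):
    """Return P[Bin(t, p) >= s] by direct summation."""
    return sum(comb(t, j) * p**j * (1 - p) ** (t - j) for j in range(s, t + 1))


def expected_frequent_itemsets(freqs, t, k, s):
    """Expected number of k-itemsets with support >= s in a random dataset.

    freqs: individual item frequencies (probabilities in [0, 1]).
    t:     number of transactions.
    Assumes items are placed independently (an illustrative null model),
    so an itemset occurs in a transaction with the product of its
    item frequencies.
    """
    return sum(
        binomial_tail(t, prod(itemset_freqs), s)
        for itemset_freqs in combinations(freqs, k)
    )
```

Comparing this expectation with the number of itemsets actually observed at support s gives a rough sense of how large a deviation is. The direct enumeration is only practical for small numbers of items.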
MapReduce and Streaming Algorithms for Diversity Maximization in Metric Spaces of Bounded Doubling Dimension
Given a dataset of points in a metric space and an integer k, a diversity
maximization problem requires determining a subset of k points maximizing
some diversity objective measure, e.g., the minimum or the average distance
between two points in the subset. Diversity maximization is computationally
hard, hence only approximate solutions can be hoped for. Although its
applications are mainly in massive data analysis, most of the past research on
diversity maximization focused on the sequential setting. In this work we
present space and pass/round-efficient diversity maximization algorithms for
the Streaming and MapReduce models and analyze their approximation guarantees
for the relevant class of metric spaces of bounded doubling dimension. Like
other approaches in the literature, our algorithms rely on the determination of
high-quality core-sets, i.e., (much) smaller subsets of the input which contain
good approximations to the optimal solution for the whole input. For a variety
of diversity objective functions, our algorithms attain an
(α+ε)-approximation ratio, for any constant ε>0, where
α is the best approximation ratio achieved by a polynomial-time,
linear-space sequential algorithm for the same diversity objective. This
improves substantially over the approximation ratios attainable in Streaming
and MapReduce by state-of-the-art algorithms for general metric spaces. We
provide extensive experimental evidence of the effectiveness of our algorithms
on both real-world and synthetic datasets, scaling up to over a billion points.
Comment: Extended version of
http://www.vldb.org/pvldb/vol10/p469-ceccarello.pdf, PVLDB Volume 10, No. 5,
January 201
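As an illustration of the problem itself, the sketch below applies the classical farthest-first greedy traversal to select k points and evaluates the minimum pairwise distance objective mentioned in the abstract. This is a standard sequential heuristic shown for orientation only, not the core-set construction of the paper. The function names and the default Euclidean distance are assumptions.

```python
import math


def farthest_first(points, k, dist=math.dist):
    """Greedily pick k points, each farthest from those already chosen."""
    chosen = [points[0]]  # arbitrary starting point
    # d[j] = distance from points[j] to its nearest chosen point
    d = [dist(p, chosen[0]) for p in points]
    while len(chosen) < k:
        i = max(range(len(points)), key=d.__getitem__)
        chosen.append(points[i])
        d = [min(d[j], dist(points[j], points[i])) for j in range(len(points))]
    return chosen


def min_pairwise_distance(subset, dist=math.dist):
    """Diversity objective: the smallest distance between two chosen points."""
    return min(
        dist(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]
    )
```

Any metric can be passed as `dist`. The traversal needs only distance evaluations, which is why it applies to general metric spaces and not just Euclidean ones.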
A new polystyrene-based ionomer/MWCNT nanocomposite for wearable skin temperature sensors
The present work outlines the fabrication and testing of a novel skin temperature sensor based on exfoliated
and undamaged multi-walled carbon nanotubes (MWCNTs) dispersed in a poly(vinylbenzyl chloride)
derivative with triethylamine (PVBC_Et3N). The dispersions were prepared by sonicating MWCNT/
PVBC_Et3N mixtures in dimethylformamide for 5 min and the quantification of the MWCNTs dispersed
was evaluated by UV–vis spectroscopy investigations and thermogravimetric analyses.
The investigations demonstrated the realization of MWCNT/PVBC_Et3N sensors with a resistance sensitivity
to temperature close to 0.004 K⁻¹, an absolute value that is comparable to the highest values
found in metals. The temperature dependence of the resistance was also found to be highly reproducible in
the range 20–40 °C, thus suggesting the possibility of using the MWCNT/PVBC_Et3N system for the fabrication
of small wearable temperature sensors for the monitoring of chronic wounds.
Diabetes mellitus and ischemic heart disease: the role of ion channels
Diabetes mellitus is one of the strongest risk factors for cardiovascular disease and, in particular, for ischemic heart disease (IHD). The pathophysiology of myocardial ischemia in diabetic patients is complex and not fully understood: some diabetic patients have mainly coronary stenosis obstructing blood flow to the myocardium; others present with coronary microvascular disease with an absence of plaques in the epicardial vessels. Ion channels acting in the cross-talk between the myocardial energy state and coronary blood flow may play a role in the pathophysiology of IHD in diabetic patients. In particular, some genetic variants for ATP-dependent potassium channels seem to be involved in the determinism of IHD
Accurate MapReduce Algorithms for k-Median and k-Means in General Metric Spaces
Center-based clustering is a fundamental primitive for data analysis and becomes very challenging for large datasets. In this paper, we focus on the popular k-median and k-means variants which, given a set P of points from a metric space and a parameter k<|P|, require identifying a set S of k centers minimizing, respectively, the sum of the distances and the sum of the squared distances of all points in P from their closest centers. Our specific focus is on general metric spaces, for which it is reasonable to require that the centers belong to the input set (i.e., S ⊆ P). We present coreset-based 3-round distributed approximation algorithms for the above problems using the MapReduce computational model. The algorithms are rather simple and obliviously adapt to the intrinsic complexity of the dataset, captured by the doubling dimension D of the metric space. Remarkably, the algorithms attain approximation ratios that can be made arbitrarily close to those achievable by the best known polynomial-time sequential approximations, and they are very space efficient for small D, requiring local memory sizes substantially sublinear in the input size. To the best of our knowledge, no previous distributed approaches were able to attain similar quality-performance guarantees in general metric spaces.
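The two objectives defined in this abstract can be written down directly. The short sketch below evaluates both costs for a given set of centers. The function name `clustering_costs` and the default Euclidean distance are assumptions for illustration, and the code only scores a candidate solution; it does not implement the coreset-based MapReduce algorithms.

```python
import math


def clustering_costs(P, S, dist=math.dist):
    """Return (k-median cost, k-means cost) of centers S for points P.

    k-median: sum of distances from each point to its closest center.
    k-means:  sum of squared distances to the closest center.
    """
    nearest = [min(dist(p, c) for c in S) for p in P]
    return sum(nearest), sum(d * d for d in nearest)
```

For example, with P = [(0, 0), (2, 0), (5, 0)] and the single center S = [(0, 0)], the closest-center distances are 0, 2, and 5, so the k-median cost is 7 and the k-means cost is 29.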
