1,008 research outputs found

    Space and Time Efficient Parallel Graph Decomposition, Clustering, and Diameter Approximation

    We develop a novel parallel decomposition strategy for unweighted, undirected graphs, based on growing disjoint connected clusters from batches of centers progressively selected from yet uncovered nodes. With respect to similar previous decompositions, our strategy exercises a tighter control on both the number of clusters and their maximum radius. We present two important applications of our parallel graph decomposition: (1) $k$-center clustering approximation; and (2) diameter approximation. In both cases, we obtain algorithms which feature a polylogarithmic approximation factor and are amenable to a distributed implementation that is geared for massive (long-diameter) graphs. The total space needed for the computation is linear in the problem size, and the parallel depth is substantially sublinear in the diameter for graphs with low doubling dimension. To the best of our knowledge, ours are the first parallel approximations for these problems which achieve sub-diameter parallel time, for a relevant class of graphs, using only linear space. Besides the theoretical guarantees, our algorithms allow for a very simple implementation on clustered architectures: we report on extensive experiments which demonstrate their effectiveness and efficiency on large graphs as compared to alternative known approaches. Comment: 14 pages
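The cluster-growing idea in this abstract can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the authors' parallel algorithm: the function name `grow_clusters`, the fixed `batch_size`, and the `steps_per_batch` parameter are hypothetical choices made for illustration, whereas the abstract does not say how batch sizes or growth steps are set.

```python
import random

def grow_clusters(adj, batch_size=1, steps_per_batch=1, seed=0):
    """Assign every node of an unweighted, undirected graph to a cluster.

    adj maps each node to a list of its neighbours. Batch size and growth
    steps are illustrative assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)
    owner = {}            # node -> center of the cluster that covers it
    frontier = []         # nodes covered in the most recent growth step
    uncovered = set(adj)
    while uncovered:
        # Select a new batch of centers among the still-uncovered nodes.
        k = min(batch_size, len(uncovered))
        for c in rng.sample(sorted(uncovered), k):
            owner[c] = c
            uncovered.discard(c)
            frontier.append(c)
        # Grow all existing clusters by a few hops; a node joins the first
        # cluster that reaches it, so clusters stay disjoint and connected.
        for _ in range(steps_per_batch):
            next_frontier = []
            for u in frontier:
                for v in adj[u]:
                    if v not in owner:
                        owner[v] = owner[u]
                        uncovered.discard(v)
                        next_frontier.append(v)
            frontier = next_frontier
    return owner

# Example: a path graph on six nodes.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
clusters = grow_clusters(path, batch_size=2)
```

Each pass of the outer loop covers at least one new node, so the simulation always terminates with every node assigned to exactly one center.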

    A Practical Parallel Algorithm for Diameter Approximation of Massive Weighted Graphs

    We present a space and time efficient practical parallel algorithm for approximating the diameter of massive weighted undirected graphs on distributed platforms supporting a MapReduce-like abstraction. The core of the algorithm is a weighted graph decomposition strategy generating disjoint clusters of bounded weighted radius. Theoretically, our algorithm uses linear space and yields a polylogarithmic approximation guarantee; moreover, for important practical classes of graphs, it runs in a number of rounds asymptotically smaller than those required by the natural approximation provided by the state-of-the-art $\Delta$-stepping SSSP algorithm, which is its only practical linear-space competitor in the aforementioned computational scenario. We complement our theoretical findings with an extensive experimental analysis on large benchmark graphs, which demonstrates that our algorithm attains substantial improvements on a number of key performance indicators with respect to the aforementioned competitor, while featuring a similar approximation ratio (a small constant less than 1.4, as opposed to the polylogarithmic theoretical bound).

    An Efficient Rigorous Approach for Identifying Statistically Significant Frequent Itemsets

    As advances in technology allow for the collection, storage, and analysis of vast amounts of data, the task of screening and assessing the significance of discovered patterns is becoming a major challenge in data mining applications. In this work, we address significance in the context of frequent itemset mining. Specifically, we develop a novel methodology to identify a meaningful support threshold s* for a dataset, such that the number of itemsets with support at least s* represents a substantial deviation from what would be expected in a random dataset with the same number of transactions and the same individual item frequencies. These itemsets can then be flagged as statistically significant with a small false discovery rate. We present extensive experimental results to substantiate the effectiveness of our methodology. Comment: A preliminary version of this work was presented in ACM PODS 2009. 20 pages, 0 figures

    MapReduce and Streaming Algorithms for Diversity Maximization in Metric Spaces of Bounded Doubling Dimension

    Given a dataset of points in a metric space and an integer $k$, a diversity maximization problem requires determining a subset of $k$ points maximizing some diversity objective measure, e.g., the minimum or the average distance between two points in the subset. Diversity maximization is computationally hard, hence only approximate solutions can be hoped for. Although its applications are mainly in massive data analysis, most of the past research on diversity maximization focused on the sequential setting. In this work we present space and pass/round-efficient diversity maximization algorithms for the Streaming and MapReduce models and analyze their approximation guarantees for the relevant class of metric spaces of bounded doubling dimension. Like other approaches in the literature, our algorithms rely on the determination of high-quality core-sets, i.e., (much) smaller subsets of the input which contain good approximations to the optimal solution for the whole input. For a variety of diversity objective functions, our algorithms attain an $(\alpha+\epsilon)$-approximation ratio, for any constant $\epsilon>0$, where $\alpha$ is the best approximation ratio achieved by a polynomial-time, linear-space sequential algorithm for the same diversity objective. This improves substantially over the approximation ratios attainable in Streaming and MapReduce by state-of-the-art algorithms for general metric spaces. We provide extensive experimental evidence of the effectiveness of our algorithms on both real world and synthetic datasets, scaling up to over a billion points. Comment: Extended version of http://www.vldb.org/pvldb/vol10/p469-ceccarello.pdf, PVLDB Volume 10, No. 5, January 201
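To make the "minimum distance between two points in the subset" objective concrete, the sketch below uses the classical sequential farthest-first greedy heuristic on Euclidean points. It is not the core-set-based Streaming or MapReduce algorithm the abstract describes; the function names and the choice of the first input point as the starting point are assumptions made for illustration.

```python
import math

def farthest_first(points, k):
    """Greedily pick k points, each as far as possible from those already chosen."""
    chosen = [points[0]]                       # arbitrary start (assumption)
    # dist[i] = distance from points[i] to its nearest chosen point
    dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < k:
        i = max(range(len(points)), key=dist.__getitem__)
        chosen.append(points[i])
        dist = [min(d, math.dist(p, points[i])) for d, p in zip(dist, points)]
    return chosen

def min_pairwise_distance(subset):
    """Diversity objective: the smallest distance between two selected points."""
    return min(math.dist(a, b)
               for i, a in enumerate(subset)
               for b in subset[i + 1:])

pts = [(0, 0), (1, 0), (2, 0), (10, 0)]
picked = farthest_first(pts, 3)
print(picked, min_pairwise_distance(picked))
```

The same objective function can be swapped for the average pairwise distance to explore the other measure mentioned in the abstract.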

    A new polystyrene-based ionomer/MWCNT nanocomposite for wearable skin temperature sensors

    The present work outlines the fabrication and testing of a novel skin temperature sensor based on exfoliated and undamaged multi-walled carbon nanotubes (MWCNTs) dispersed in a poly(vinylbenzyl chloride) derivative with triethylamine (PVBC_Et3N). The dispersions were prepared by sonicating MWCNT/PVBC_Et3N mixtures in dimethylformamide for 5 min, and the quantity of dispersed MWCNTs was evaluated by UV–vis spectroscopy and thermogravimetric analyses. The investigations demonstrated the realization of MWCNT/PVBC_Et3N sensors with a resistance sensitivity to temperature close to 0.004 K⁻¹, an absolute value that is comparable to the highest values found in metals. The temperature dependence of the resistance was also found to be very reproducible in the range 20–40 °C, thus suggesting the possibility of using the MWCNT/PVBC_Et3N system for the fabrication of small wearable temperature sensors for the monitoring of chronic wounds.

    Diabetes mellitus and ischemic heart disease. The role of ion channels

    Diabetes mellitus is one of the strongest risk factors for cardiovascular disease and, in particular, for ischemic heart disease (IHD). The pathophysiology of myocardial ischemia in diabetic patients is complex and not fully understood: some diabetic patients have mainly coronary stenosis obstructing blood flow to the myocardium; others present with coronary microvascular disease with an absence of plaques in the epicardial vessels. Ion channels acting in the cross-talk between the myocardial energy state and coronary blood flow may play a role in the pathophysiology of IHD in diabetic patients. In particular, some genetic variants for ATP-dependent potassium channels seem to be involved in the determinism of IHD.

    Accurate MapReduce Algorithms for k-Median and k-Means in General Metric Spaces

    Center-based clustering is a fundamental primitive for data analysis and becomes very challenging for large datasets. In this paper, we focus on the popular k-median and k-means variants which, given a set P of points from a metric space and a parameter k < |P|, require identifying a set S of k centers minimizing, respectively, the sum of the distances and of the squared distances of all points in P from their closest centers. Our specific focus is on general metric spaces, for which it is reasonable to require that the centers belong to the input set (i.e., $S \subseteq P$). We present coreset-based 3-round distributed approximation algorithms for the above problems using the MapReduce computational model. The algorithms are rather simple and obliviously adapt to the intrinsic complexity of the dataset, captured by the doubling dimension D of the metric space. Remarkably, the algorithms attain approximation ratios that can be made arbitrarily close to those achievable by the best known polynomial-time sequential approximations, and they are very space efficient for small D, requiring local memory sizes substantially sublinear in the input size. To the best of our knowledge, no previous distributed approaches were able to attain similar quality-performance guarantees in general metric spaces.
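The two objectives defined in this abstract can be written down directly. The sketch below only evaluates the cost of a given center set; it does not implement the coreset-based MapReduce algorithms. It assumes Euclidean points for simplicity, although the paper targets general metric spaces, and the function name `clustering_cost` and its `power` parameter are hypothetical.

```python
import math

def clustering_cost(points, centers, power=1):
    """Sum over all points of (distance to the closest center) ** power.

    power=1 gives the k-median objective, power=2 the k-means objective.
    Euclidean distance is an assumption made for this example.
    """
    return sum(min(math.dist(p, c) for c in centers) ** power
               for p in points)

P = [(0,), (2,), (5,)]
S = [(0,), (5,)]            # centers drawn from the input set, S ⊆ P
print(clustering_cost(P, S, power=1))   # k-median cost
print(clustering_cost(P, S, power=2))   # k-means cost
```

Here the point at 2 is served by the center at 0, so it contributes 2 to the k-median cost and 4 to the k-means cost, while the two centers contribute nothing.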