171 research outputs found

Fast and Accurate Mining of Correlated Heavy Hitters

Author: Cafaro Massimo
Epicoco Italo
Pulimeno Marco
Publication date: 06/04/2017

The problem of mining Correlated Heavy Hitters (CHH) from a two-dimensional data stream was introduced recently, and Lahiri et al. proposed a deterministic algorithm based on the Misra-Gries algorithm to solve it. In this paper we present a new counter-based algorithm for tracking CHHs, formally prove its error bounds and correctness, and show through extensive experimental results that our algorithm outperforms the Misra-Gries based algorithm in accuracy and speed whilst requiring asymptotically much less space.
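As background for the baseline named in this abstract, the following is a minimal sketch of the classic one-dimensional Misra-Gries summary. It is not the CHH algorithm of the paper, nor the two-dimensional variant of Lahiri et al.; the function name `misra_gries` and the parameter `k` are chosen here for illustration only.

```python
def misra_gries(stream, k):
    """Summarise a stream with at most k - 1 counters.

    Any item whose true frequency exceeds n / k (n = stream length)
    is guaranteed to survive in the returned dictionary. Stored counts
    underestimate the true frequency by at most n / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            # Monitored item: just count it.
            counters[item] += 1
        elif len(counters) < k - 1:
            # Free counter available: start monitoring the item.
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    print(misra_gries(["a", "b", "a", "c", "a", "d", "a"], k=3))
```

The summary only yields candidates; a second pass over the data (when available) is the usual way to confirm exact frequencies.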

arXiv.org e-Print Archive

Crossref

Archivio Istituzionale della Ricerca- Università del Salento

The Digital Puglia Project: An Active Digital Library of Remote Sensing Data

Author: Aloisio Giovanni
Cafaro Massimo
Williams Roy
Publication venue: 'California Institute of Technology Library'
Publication date: 01/01/1999

The growing need for software infrastructure able to create, maintain and ease the evolution of scientific data promotes the development of digital libraries that provide the user with fast and reliable access to data. In a world that is rapidly changing, the standard view of a digital library as a data repository specialized to a community of users and provided with some search tools is no longer tenable. To be effective, a digital library should be an active digital library, meaning that users can process available data not just to retrieve a particular piece of information, but to infer new knowledge about the data at hand. Digital Puglia is a new project, conceived to emphasize not only retrieval of data to the client's workstation, but also customized processing of the data. Such processing tasks may include data mining, filtering and knowledge discovery in huge databases, compute-intensive image processing (such as principal component analysis, supervised classification, or pattern matching) and on-demand computing sessions. We describe the issues, the requirements and the underlying technologies of the Digital Puglia Project, whose final goal is to build a high-performance, distributed and active digital library of remote sensing data.

CiteSeerX

Caltech Authors

Archivio Istituzionale della Ricerca- Università del Salento

A Parallel Space Saving Algorithm For Frequent Items and the Hurwitz zeta distribution

Author: Cafaro Massimo
Pulimeno Marco
Tempesta Piergiulio
Publication venue: 'Elsevier BV'
Publication date: 11/09/2015

We present a message-passing based parallel version of the Space Saving algorithm designed to solve the k-majority problem. The algorithm determines in parallel frequent items, i.e., those whose frequency is greater than a given threshold, and is therefore useful for iceberg queries and many other contexts. We apply our algorithm to the detection of frequent items in both real and synthetic datasets whose probability distribution functions are a Hurwitz and a Zipf distribution respectively. Also, we compare its parallel performance and accuracy against a parallel algorithm recently proposed for merging summaries derived by the Space Saving or Frequent algorithms.

Comment: Accepted for publication. To appear in Information Sciences, Elsevier. http://www.sciencedirect.com/science/article/pii/S002002551500657
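For readers unfamiliar with the sequential building block, here is a minimal sketch of the standard Space Saving update rule with `k` counters. It covers only the sequential summary; the message-passing merge described in the abstract is not shown, and the name `space_saving` is an assumption made for this example.

```python
def space_saving(stream, k):
    """Maintain at most k counters over a stream (sequential Space Saving).

    Stored counts never underestimate the true frequency of a monitored
    item, and the counters always sum to the number of items processed.
    """
    counters = {}
    for item in stream:
        if item in counters:
            # Monitored item: increment its counter.
            counters[item] += 1
        elif len(counters) < k:
            # Free counter available: start monitoring the item.
            counters[item] = 1
        else:
            # Evict the item with the minimum count; the newcomer
            # inherits that count plus one.
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return counters


if __name__ == "__main__":
    print(space_saving("aababcad", k=2))
```

The linear scan for the minimum keeps the sketch short; practical implementations typically use a stream-summary structure or a heap so that each update takes constant or logarithmic time.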

arXiv.org e-Print Archive

Docta Complutense

Crossref

Archivio Istituzionale della Ricerca- Università del Salento

Parallel and Distributed Frugal Tracking of a Quantile

Author: Cafaro Massimo
Epicoco Italo
Pulimeno Marco
Publication date: 01/01/2024

In this paper, we deal with the problem of monitoring network latency. Latency is a key network metric related to both network performance and quality of service, since it directly affects the overall user experience. High latency leads to unacceptably slow response times of network services, and may increase network congestion and reduce throughput, in turn disrupting communications and the user experience. A common approach to monitoring network latency takes into account the frequently skewed distribution of latency values, and therefore specific quantiles are monitored, such as the 95th, 98th, and 99th percentiles. We present a comparative analysis of the speed of convergence of the sequential FRUGAL-1U, FRUGAL-2U, and EASYQUANTILE algorithms, together with the design and analysis of parallel, message-passing-based versions of these algorithms that can be used for monitoring network latency quickly and accurately. Distributed versions are also discussed. Extensive experimental results are provided and discussed as well.
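To give a feel for how little state a frugal quantile tracker keeps, the sketch below follows the commonly described Frugal-1U update rule: a single integer estimate nudged up or down by one unit with a probability that depends on the target quantile. It is a sequential illustration only, not the parallel or distributed versions analysed in the paper; the function name `frugal_1u` and its parameters `q`, `rng` and `start` are assumptions made for this example.

```python
import random


def frugal_1u(stream, q, rng=None, start=0):
    """Track an estimate of the q-quantile using one unit of memory.

    q is the target quantile in (0, 1), e.g. 0.95 for the 95th percentile.
    The estimate moves by exactly one unit per update, so it suits
    integer-valued data such as latencies measured in milliseconds.
    """
    rng = rng or random.Random()
    m = start
    for s in stream:
        r = rng.random()
        if s > m and r > 1 - q:
            # Item above the estimate: step up with probability q.
            m += 1
        elif s < m and r > q:
            # Item below the estimate: step down with probability 1 - q.
            m -= 1
    return m


if __name__ == "__main__":
    gen = random.Random(1)
    latencies = [gen.randint(0, 1000) for _ in range(50000)]
    print(frugal_1u(latencies, 0.5, rng=random.Random(2)))
```

Because each step is a single unit, convergence from a poor starting value can be slow on wide value ranges, which is one reason the speed of convergence is worth comparing across such algorithms.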

Multidisciplinary Digital Publishing Institute

Archivio Istituzionale della Ricerca- Università del Salento

Distributed mining of time-faded heavy hitters

Author: Italo Epicoco
Marco Pulimeno
Massimo Cafaro
Publication venue: 'Elsevier BV'
Publication date: 01/01/2021

Archivio Istituzionale della Ricerca- Università del Salento

Parallel mining of time-faded heavy hitters

Author: Cafaro Massimo
Epicoco Italo
Pulimeno Marco
Publication venue: 'Elsevier BV'
Publication date: 11/01/2017

In this paper we present PFDCMSS (Parallel Forward Decay Count-Min Space Saving) which, to the best of our knowledge, is the world's first message-passing parallel algorithm for mining time-faded heavy hitters. The algorithm is a parallel version of the recently published FDCMSS (Forward Decay Count-Min Space Saving) sequential algorithm. We formally prove its correctness by showing that the underlying data structure, a sketch augmented with a Space Saving stream summary holding exactly two counters, is mergeable. Whilst mergeability of traditional sketches derives immediately from theory, we show that merging our augmented sketch is, instead, nontrivial. Nonetheless, the resulting parallel algorithm is fast and simple to implement. The very large volumes of modern datasets in the context of Big Data present new challenges that current sequential algorithms cannot cope with; by contrast, parallel computing enables near-real-time processing of very large datasets, which are growing at an unprecedented scale. Our algorithm's implementation, taking advantage of the MPI (Message Passing Interface) library, is portable, reliable and provides cutting-edge performance. Extensive experimental results confirm that PFDCMSS retains the extreme accuracy and error bound provided by FDCMSS whilst providing excellent parallel scalability. Our contributions are three-fold: (i) we prove the nontrivial mergeability of the augmented sketch used in the FDCMSS algorithm; (ii) we derive PFDCMSS, a novel message-passing parallel algorithm; (iii) we show experimentally that PFDCMSS is extremely accurate and scalable, allowing near-real-time processing of large datasets. The result supports both casual users and seasoned, professional scientists working on expert and intelligent systems.
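The remark that traditional sketches merge "immediately" can be illustrated with a plain Count-Min sketch, where merging is cell-wise addition of two tables built with the same hash functions. The sketch below shows only that easy case; it does not implement the augmented sketch of FDCMSS (cells holding a two-counter Space Saving summary) or forward decay weighting. The class name `CountMin` and the SHA-256 based hashing are choices made for this example.

```python
import hashlib


class CountMin:
    """A plain Count-Min sketch with depth rows and width columns."""

    def __init__(self, depth, width):
        self.depth = depth
        self.width = width
        self.table = [[0] * width for _ in range(depth)]

    def _col(self, row, item):
        # Derive one hash function per row by salting with the row index.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, weight=1):
        for row in range(self.depth):
            self.table[row][self._col(row, item)] += weight

    def estimate(self, item):
        # The minimum over rows never underestimates the true total weight.
        return min(self.table[row][self._col(row, item)]
                   for row in range(self.depth))

    def merge(self, other):
        # Sketches are mergeable only if they share shape and hash functions.
        if (self.depth, self.width) != (other.depth, other.width):
            raise ValueError("sketch dimensions differ")
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]


if __name__ == "__main__":
    left, right = CountMin(4, 64), CountMin(4, 64)
    for item in ["x", "y", "x", "z"]:
        left.add(item)
    for item in ["x", "w", "y"]:
        right.add(item)
    left.merge(right)
    print(left.estimate("x"))
```

In a message-passing setting, each process would build its own sketch over a partition of the stream and the tables would then be combined with a reduction; when each cell holds a small stream summary rather than a single number, cell-wise addition no longer suffices, which is the nontrivial case the abstract refers to.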

arXiv.org e-Print Archive

Crossref

Archivio Istituzionale della Ricerca- Università del Salento

The Desktop Grid Environment Enabler

Author: Aloisio Giovanni
Cafaro Massimo
Lezzi Daniele
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 21/02/2012

This paper describes our Desktop Grid Environment Enabler (DEGREE), a set of Web Services that provides advanced capabilities for grid computing. DEGREE services are based both on the Globus Toolkit and on the Grid Resource Broker, a grid portal developed at the University of Lecce. Using our Web Services, trusted users can develop innovative, grid-aware applications that seamlessly access computational resources and services, independently of platform and programming language.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)