A Bulk-Parallel Priority Queue in External Memory with STXXL
We propose the design and an implementation of a bulk-parallel external
memory priority queue to take advantage of both shared-memory parallelism and
high external memory transfer speeds to parallel disks. To achieve higher
performance by decoupling item insertions and extractions, we offer two
parallelization interfaces: one using "bulk" sequences, the other by defining
"limit" items. In the design, we discuss how to parallelize insertions using
multiple heaps, and how to calculate a dynamic prediction sequence to prefetch
blocks and apply parallel multiway merge for extraction. Our experimental
results show that in the selected benchmarks the priority queue reaches 75% of
the full parallel I/O bandwidth of rotational disks and 65% of SSDs, or the
speed of sorting in external memory when bounded by computation.

Comment: extended version of SEA'15 conference paper
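The decoupling of insertions and extractions can be pictured with a toy Python sketch (not the STXXL implementation; all names are illustrative): items go round-robin into several insertion heaps, standing in for one heap per thread, and extraction drains every heap up to a "limit" item, standing in for the prediction-driven parallel multiway merge.

```python
import heapq

class BulkPQ:
    """Toy bulk-parallel priority queue: several insertion heaps, merged on extraction."""

    def __init__(self, num_heaps=4):
        self.heaps = [[] for _ in range(num_heaps)]
        self.turn = 0

    def bulk_push(self, items):
        # Round-robin assignment stands in for per-thread insertion heaps,
        # so bulk insertions need no synchronization on a shared structure.
        for x in items:
            heapq.heappush(self.heaps[self.turn], x)
            self.turn = (self.turn + 1) % len(self.heaps)

    def extract_all_up_to(self, limit):
        # Pop from every heap all items not exceeding the "limit" item,
        # which the interface guarantees no later insertion will precede,
        # then merge the results (a stand-in for parallel multiway merge).
        out = []
        for h in self.heaps:
            while h and h[0] <= limit:
                out.append(heapq.heappop(h))
        return sorted(out)
```

The "limit" item is what makes the decoupling safe: extraction never has to wait on concurrent insertions of larger keys.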
Quality Assessment of Linked Datasets using Probabilistic Approximation
With the increasing application of Linked Open Data, assessing the quality of
datasets by computing quality metrics becomes an issue of crucial importance.
For large and evolving datasets, an exact, deterministic computation of the
quality metrics is too time consuming or expensive. We employ probabilistic
techniques such as Reservoir Sampling, Bloom Filters and Clustering Coefficient
estimation for implementing a broad set of data quality metrics in an
approximate but sufficiently accurate way. Our implementation is integrated in
the comprehensive data quality assessment framework Luzzu. We evaluated its
performance and accuracy on Linked Open Datasets of broad relevance.

Comment: 15 pages, 2 figures, to appear in ESWC 2015 proceedings
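Of the probabilistic techniques listed, reservoir sampling is the easiest to sketch. The following toy version (Algorithm R; identifiers are our own, not Luzzu's API) maintains a uniform fixed-size sample of a stream, on which an approximate quality metric can then be computed instead of scanning the whole dataset.

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, i)        # each item survives with prob k/(i+1)
            if j < k:
                sample[j] = item
    return sample
```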
Low Space External Memory Construction of the Succinct Permuted Longest Common Prefix Array
The longest common prefix (LCP) array is a versatile auxiliary data structure
in indexed string matching. It can be used to speed up searching using the
suffix array (SA) and provides an implicit representation of the topology of an
underlying suffix tree. The LCP array of a string of length n can be
represented as an array of n words, or, in the presence of the SA, as
a bit vector of 2n bits plus asymptotically negligible support data
structures. External memory construction algorithms for the LCP array have been
proposed, but those proposed so far have a space requirement of O(n) words
(i.e. O(n log n) bits) in external memory. This space requirement is in some
practical cases prohibitively expensive. We present an external memory
algorithm for constructing the 2n bit version of the LCP array which uses
O(n log σ) bits of additional space in external memory when given a
(compressed) BWT with alphabet size σ and a sampled inverse suffix array
at sampling rate O(log n). This is often a significant space gain in
practice where σ is usually much smaller than n or even constant. We
also consider the case of computing succinct LCP arrays for circular strings.
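For orientation, the classic internal-memory LCP construction from a suffix array (Kasai et al.) is easy to sketch; the paper's contribution is an external-memory, succinct-space variant, which this toy code does not attempt.

```python
def suffix_array(s):
    # Toy O(n^2 log n) construction, fine for small strings.
    return sorted(range(len(s)), key=lambda i: s[i:])

def lcp_from_sa(s, sa):
    """Kasai et al.: lcp[r] = LCP of suffixes sa[r-1] and sa[r]; lcp[0] = 0."""
    n = len(s)
    rank = [0] * n                      # inverse suffix array
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):                  # visit suffixes in text order
        if rank[i] > 0:
            j = sa[rank[i] - 1]         # lexicographic predecessor of suffix i
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h:
                h -= 1                  # LCP drops by at most 1 from i to i+1
        else:
            h = 0
    return lcp
```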
Optimal Prediction for Prefetching in the Worst Case
This is the published version. Copyright © 1998 Society for Industrial and Applied Mathematics.

Response time delays caused by I/O are a major problem in many systems and database applications. Prefetching and cache replacement methods are attracting renewed attention because of their success in avoiding costly I/Os. Prefetching can be looked upon as a type of online sequential prediction, where the predictions must be accurate as well as made in a computationally efficient way. Unlike other online problems, prefetching cannot admit a competitive analysis, since the optimal offline prefetcher incurs no cost when it knows the future page requests. Previous analytical work on prefetching by Vitter and Krishnan [J. Assoc. Comput. Mach., 43 (1996), pp. 771–793] consisted of modeling the user as a probabilistic Markov source.
In this paper, we look at the much stronger form of worst-case analysis and derive a randomized algorithm for pure prefetching. We compare our algorithm for every page request sequence with the important class of finite state prefetchers, making no assumptions as to how the sequence of page requests is generated. We prove analytically that the fault rate of our online prefetching algorithm converges almost surely for every page request sequence to the fault rate of the optimal finite state prefetcher for the sequence. This analysis model can be looked upon as a generalization of the competitive framework, in that it compares an online algorithm in a worst-case manner over all sequences with a powerful yet nonclairvoyant opponent. We simultaneously achieve the computational goal of implementing our prefetcher in optimal constant expected time per prefetched page using the optimal dynamic discrete random variate generator of Matias, Vitter, and Ni [Proc. 4th Annual SIAM/ACM Symposium on Discrete Algorithms, Austin, TX, January 1993].
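The paper's randomized algorithm is more subtle, but the flavor of "prefetching as sequential prediction" can be illustrated with a toy first-order predictor (our own sketch, not the paper's method): before each request it prefetches the page that most often followed the previous one, and counts a fault whenever the prediction misses.

```python
from collections import Counter, defaultdict

def simulate_prefetch(requests):
    """Count faults of a toy pure prefetcher over a page request sequence."""
    follow = defaultdict(Counter)   # follow[p][q] = times q has followed p so far
    faults = 0
    prev = None
    for page in requests:
        guess = None
        if prev is not None and follow[prev]:
            guess = follow[prev].most_common(1)[0][0]   # most frequent successor
        if guess != page:
            faults += 1                                 # prefetched the wrong page
        if prev is not None:
            follow[prev][page] += 1                     # update statistics online
        prev = page
    return faults
```

On a periodic sequence such a predictor eventually stops faulting, which mirrors the convergence-to-the-best-finite-state-prefetcher guarantee the paper proves in far greater generality.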
Optimal Prediction for Prefetching in the Worst Case
AMS subject classifications. 68Q25, 68T05, 68P20, 68N25, 60J20
PII. S0097539794261817
Adaptive Disk Spindown via Optimal Rent-to-Buy in Probabilistic Environments
The original publication is available at www.springerlink.com.

In the single rent-to-buy decision problem, without a priori knowledge of the amount of time
a resource will be used we need to decide when to buy the resource, given that we can rent the
resource at unit cost per unit time or buy it outright for cost c. In this paper we study algorithms
that make a sequence of single rent-to-buy decisions, using the assumption that the resource use
times are independently drawn from an unknown probability distribution. Our study of this rent-to-buy
problem is motivated by important systems applications, specifically, problems arising
from deciding when to spin down disks to conserve energy in mobile computers [DKM, LKH,
MDK], thread blocking decisions during lock acquisition in multiprocessor applications [KLM],
and virtual circuit holding times in IP-over-ATM networks [KLP, SaK].
We develop a provably optimal and computationally efficient algorithm for the rent-to-buy
problem. Our algorithm uses O(√t) time and space, and its expected cost for the t-th resource use
converges to optimal as O(√(log t / t)), for any bounded probability distribution on the resource
use times. Alternatively, using O(1) time and space, the algorithm almost converges to optimal.
We describe experimental results for the application of our algorithm to one of the
motivating systems problems: the question of when to spin down a disk to save power in a mobile
computer. Simulations using disk access traces obtained from an HP workstation environment
suggest that our algorithm yields significantly improved power/response time performance over
the non-adaptive 2-competitive algorithm that is optimal in the worst-case competitive analysis
model.
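The rent-to-buy trade-off behind the spindown policy can be sketched as follows. This is a simplification with rent normalized to 1 per time unit; the empirical threshold search only captures the spirit of the paper's adaptive algorithm, without its O(√t) machinery or convergence guarantee.

```python
def threshold_cost(use_time, T, c):
    # Cost of the strategy "rent until time T, then buy" on one use:
    # pure rent if the use ends by T, otherwise T of rent plus the buy cost c.
    return use_time if use_time <= T else T + c

def best_threshold(samples, c, candidates):
    # Pick the candidate threshold with least total cost on past use times.
    return min(candidates,
               key=lambda T: sum(threshold_cost(u, T, c) for u in samples))
```

The classic non-adaptive choice T = c is 2-competitive in the worst case; an adaptive strategy like the one sketched buys early when observed idle periods are long (spin the disk down quickly) and late when they are short.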
Lavoisier: A Low Altitude Balloon Network for Probing the Deep Atmosphere and Surface of Venus
The in-situ exploration of the low atmosphere and surface of Venus is clearly the next step of Venus exploration. Understanding the geochemistry of the low atmosphere, interacting with rocks, and the way the integrated Venus system evolved, under the combined effects of inner planet cooling and intense atmospheric greenhouse, is a major challenge of modern planetology. Because of the dense atmosphere (95 bars at the surface), balloon platforms offer an interesting means to transport and land in-situ measurement instruments. Owing to the large Archimedes (buoyancy) force, a 2 cubic meter He-pressurized balloon floating at 10 km altitude may carry up to 60 kg of payload. LAVOISIER is a project submitted to ESA in 2000, as a follow-up, in the spirit of the balloon deployed at cloud level by the Russian Vega mission in 1986. It is composed of a descent probe, for detailed noble gas and atmosphere composition analysis, and of a network of 3 balloons for geochemical and geophysical investigations at local, regional and global scales.
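A back-of-the-envelope buoyancy check of the 2 cubic meter / 60 kg figure. The densities below are rough assumed values for Venus conditions at ~10 km altitude, not figures from the mission study.

```python
# Assumed values, not from the LAVOISIER proposal:
RHO_ATM = 37.0   # kg/m^3, approximate Venus atmosphere density at ~10 km
RHO_HE = 3.5     # kg/m^3, approximate helium density at the same P and T
VOLUME = 2.0     # m^3, balloon volume quoted in the abstract

# Net lift (Archimedes principle): displaced atmosphere minus lifting gas.
lift = VOLUME * (RHO_ATM - RHO_HE)   # kg available for envelope + payload
```

The result, about 67 kg, is of the same order as the quoted 60 kg payload, which is why a balloon only 2 m^3 in volume suffices in Venus's dense atmosphere.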
Online Perfect Matching and Mobile Computing
The original publication is available at www.springerlink.com.

We present a natural online perfect matching problem motivated by problems
in mobile computing. A total of n customers connect and disconnect
sequentially, and each customer has an associated set of stations to which
it may connect. Each station has a capacity limit. We allow the network to
preemptively switch a customer between allowed stations to make room for a
new arrival. We wish to minimize the total number of switches required to
provide service to every customer. Equivalently, we wish to maintain a
perfect matching between customers and stations and minimize the lengths of
the augmenting paths. We measure performance by the worst case ratio of the
number of switches made to the minimum number required.
When each customer can be connected to at most two stations:
– Some intuitive algorithms have lower bounds of Ω(n) and Ω(n / log n).
– When the station capacities are 1, there is an upper bound of O(√n).
– When customers do not disconnect and the station capacity is 1, we
achieve a competitive ratio of O(log n).
– There is a lower bound of Ω(√n) when the station capacities are 2.
– We present optimal algorithms when the station capacity is arbitrary
in special cases.
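The "switch along an augmenting path" operation can be sketched for capacity-1 stations (a toy model, not any of the paper's competitive algorithms): a new customer is connected via a shortest augmenting path found by BFS, preemptively moving already-connected customers, and the number of moves is the switch count.

```python
from collections import deque

def connect(adj, match, c):
    """Connect customer c, switching existing customers along a shortest
    augmenting path. adj maps customer -> allowed stations; match maps
    station -> customer (capacity 1). Returns the number of switches made,
    or None if c cannot be served."""
    parent = {}   # station -> (customer that would move there, its old station)
    q = deque()
    for s in adj[c]:
        if s not in parent:
            parent[s] = (c, None)       # the new customer has no old station
            q.append(s)
    while q:
        s = q.popleft()
        if s not in match:              # free station found: apply the path
            switches = 0
            while s is not None:
                cust, prev = parent[s]
                if prev is not None:
                    switches += 1       # an existing customer moves prev -> s
                match[s] = cust
                s = prev
            return switches
        displaced = match[s]            # occupant may be pushed to its other stations
        for t in adj[displaced]:
            if t not in parent:
                parent[t] = (displaced, s)
                q.append(t)
    return None                          # no augmenting path exists
```

BFS gives a shortest augmenting path, matching the abstract's goal of minimizing augmenting path lengths and hence the number of switches per arrival.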
- …
