194 research outputs found

    A Bulk-Parallel Priority Queue in External Memory with STXXL

    Get PDF
    We propose the design and an implementation of a bulk-parallel external memory priority queue to take advantage of both shared-memory parallelism and high external memory transfer speeds to parallel disks. To achieve higher performance by decoupling item insertions and extractions, we offer two parallelization interfaces: one using "bulk" sequences, the other by defining "limit" items. In the design, we discuss how to parallelize insertions using multiple heaps, and how to calculate a dynamic prediction sequence to prefetch blocks and apply parallel multiway merge for extraction. Our experimental results show that in the selected benchmarks the priority queue reaches 75% of the full parallel I/O bandwidth of rotational disks and and 65% of SSDs, or the speed of sorting in external memory when bounded by computation.Comment: extended version of SEA'15 conference pape

    Quality Assessment of Linked Datasets using Probabilistic Approximation

    Full text link
    With the increasing application of Linked Open Data, assessing the quality of datasets by computing quality metrics becomes an issue of crucial importance. For large and evolving datasets, an exact, deterministic computation of the quality metrics is too time consuming or expensive. We employ probabilistic techniques such as Reservoir Sampling, Bloom Filters and Clustering Coefficient estimation for implementing a broad set of data quality metrics in an approximate but sufficiently accurate way. Our implementation is integrated in the comprehensive data quality assessment framework Luzzu. We evaluated its performance and accuracy on Linked Open Datasets of broad relevance.Comment: 15 pages, 2 figures, To appear in ESWC 2015 proceeding

    Low Space External Memory Construction of the Succinct Permuted Longest Common Prefix Array

    Full text link
    The longest common prefix (LCP) array is a versatile auxiliary data structure in indexed string matching. It can be used to speed up searching using the suffix array (SA) and provides an implicit representation of the topology of an underlying suffix tree. The LCP array of a string of length nn can be represented as an array of length nn words, or, in the presence of the SA, as a bit vector of 2n2n bits plus asymptotically negligible support data structures. External memory construction algorithms for the LCP array have been proposed, but those proposed so far have a space requirement of O(n)O(n) words (i.e. O(nlogn)O(n \log n) bits) in external memory. This space requirement is in some practical cases prohibitively expensive. We present an external memory algorithm for constructing the 2n2n bit version of the LCP array which uses O(nlogσ)O(n \log \sigma) bits of additional space in external memory when given a (compressed) BWT with alphabet size σ\sigma and a sampled inverse suffix array at sampling rate O(logn)O(\log n). This is often a significant space gain in practice where σ\sigma is usually much smaller than nn or even constant. We also consider the case of computing succinct LCP arrays for circular strings

    Optimal Prediction for Prefetching in the Worst Case

    Get PDF
    This is the published version. Copyright © 1998 Society for Industrial and Applied MathematicsResponse time delays caused by I/O are a major problem in many systems and database applications. Prefetching and cache replacement methods are attracting renewed attention because of their success in avoiding costly I/Os. Prefetching can be looked upon as a type of online sequential prediction, where the predictions must be accurate as well as made in a computationally efficient way. Unlike other online problems, prefetching cannot admit a competitive analysis, since the optimal offline prefetcher incurs no cost when it knows the future page requests. Previous analytical work on prefetching [. Vitter Krishnan 1991.] [J. Assoc. Comput. Mach., 143 (1996), pp. 771--793] consisted of modeling the user as a probabilistic Markov source. In this paper, we look at the much stronger form of worst-case analysis and derive a randomized algorithm for pure prefetching. We compare our algorithm for every page request sequence with the important class of finite state prefetchers, making no assumptions as to how the sequence of page requests is generated. We prove analytically that the fault rate of our online prefetching algorithm converges almost surely for every page request sequence to the fault rate of the optimal finite state prefetcher for the sequence. This analysis model can be looked upon as a generalization of the competitive framework, in that it compares an online algorithm in a worst-case manner over all sequences with a powerful yet nonclairvoyant opponent. We simultaneously achieve the computational goal of implementing our prefetcher in optimal constant expected time per prefetched page using the optimal dynamic discrete random variate generator of [. Matias Matias, Vitter, and Ni [Proc. 4th Annual SIAM/ACM Symposium on Discrete Algorithms, Austin, TX, January 1993]

    Optimal Prediction for Prefetching in the Worst Case

    Get PDF
    AMS subject classi cations. 68Q25, 68T05, 68P20, 68N25, 60J20 PII. S0097539794261817Response time delays caused by I/O are a major problem in many systems and database applications. Prefetching and cache replacement methods are attracting renewed attention because of their success in avoiding costly I/Os. Prefetching can be looked upon as a type of online sequential prediction, where the predictions must be accurate as well as made in a computationally e cient way. Unlike other online problems, prefetching cannot admit a competitive analysis, since the optimal o ine prefetcher incurs no cost when it knows the future page requests. Previous analytical work on prefetching [J. Assoc. Comput. Mach., 143 (1996), pp. 771{793] consisted of modeling the user as a probabilistic Markov source. In this paper, we look at the much stronger form of worst-case analysis and derive a randomized algorithm for pure prefetching. We compare our algorithm for every page request sequence with the important class of nite state prefetchers, making no assumptions as to how the sequence of page requests is generated. We prove analytically that the fault rate of our online prefetching algorithm converges almost surely for every page request sequence to the fault rate of the optimal nite state prefetcher for the sequence. This analysis model can be looked upon as a generalization of the com- petitive framework, in that it compares an online algorithm in a worst-case manner over all sequences with a powerful yet nonclairvoyant opponent. We simultaneously achieve the computational goal of implementing our prefetcher in optimal constant expected time per prefetched page using the optimal dynamic discrete random variate generator of Matias, Vitter, and Ni [Proc. 4th Annual SIAM/ACM Symposium on Discrete Algorithms, Austin, TX, January 1993]

    Adaptive Disk Spindown via Optimal Rent-to-Buy in Probabilistic Environments

    Get PDF
    The original publication is available at www.springerlink.comIn the single rent-to-buy decision problem, without a priori knowledge of the amount of time a resource will be used we need to decide when to buy the resource, given that we can rent the resource for 1perunittimeorbuyitonceandforallfor1 per unit time or buy it once and for all for c. In this paper we study algorithms that make a sequence of single rent-to-buy decisions, using the assumption that the resource use times are independently drawn from an unknown probability distribution. Our study of this rent- to-buy problem is motivated by important systems applications, speci cally, problems arising from deciding when to spindown disks to conserve energy in mobile computers [DKM, LKH, MDK], thread blocking decisions during lock acquisition in multiprocessor applications [KLM], and virtual circuit holding times in IP-over-ATM networks [KLP, SaK]. We develop a provably optimal and computationally e cient algorithm for the rent-to-buy problem. Our algorithm uses O(pt) time and space, and its expected cost for the tth resource use converges to optimal as O(plog t=t), for any bounded probability distribution on the resource use times. Alternatively, using O(1) time and space, the algorithm almost converges to optimal. We describe the experimental results for the application of our algorithm to one of the motivating systems problems: the question of when to spindown a disk to save power in a mobile computer. Simulations using disk access traces obtained from an HP workstation environment suggest that our algorithm yields signi cantly improved power/response time performance over the non-adaptive 2-competitive algorithm which is optimal in the worst-case competitive analysis model

    Lavoisier: A Low Altitude Balloon Network for Probing the Deep Atmosphere and Surface of Venus

    Get PDF
    The in-situ exploration of the low atmosphere and surface of Venus is clearly the next step of Venus exploration. Understanding the geochemistry of the low atmosphere, interacting with rocks, and the way the integrated Venus system evolved, under the combined effects of inner planet cooling and intense atmospheric greenhouse, is a major challenge of modern planetology. Due to the dense atmosphere (95 bars at the surface), balloon platforms offer an interesting means to transport and land in-situ measurement instruments. Due to the large Archimede force, a 2 cubic meter He-pressurized balloon floating at 10 km altitude may carry up to 60 kg of payload. LAVOISIER is a project submitted to ESA in 2000, in the follow up and spirit of the balloon deployed at cloud level by the Russian Vega mission in 1986. It is composed of a descent probe, for detailed noble gas and atmosphere composition analysis, and of a network of 3 balloons for geochemical and geophysical investigations at local, regional and global scales

    Online Perfect Matching and Mobile Computing

    Get PDF
    The original publication is available at www.springerlink.comWe present a natural online perfect matching problem moti- vated by problems in mobile computing. A total of n customers connect and disconnect sequentially, and each customer has an associated set of stations to which it may connect. Each station has a capacity limit. We allow the network to preemptively switch a customer between allowed stations to make room for a new arrival. We wish to minimize the total number of switches required to provide service to every customer. Equiv- alently, we wish to maintain a perfect matching between customers and stations and minimize the lengths of the augmenting paths. We measure performance by the worst case ratio of the number of switches made to the minimum number required. When each customer can be connected to at most two stations: { Some intuitive algorithms have lower bounds of (n) and (n= log n). { When the station capacities are 1, there is an upper bound of O(pn). { When customers do not disconnect and the station capacity is 1, we achieve a competitive ratio of O(log n). { There is a lower bound of (pn) when the station capacities are 2. { We present optimal algorithms when the station capacity is arbitrary in special cases
    corecore