On the Importance of Registers for Computability
All consensus hierarchies in the literature assume that we have, in addition
to copies of a given object, an unbounded number of registers. But why do we
really need these registers?
This paper considers what would happen if one attempts to solve consensus
using various objects but without any registers. We show that under a
reasonable assumption, objects like queues and stacks cannot emulate the
missing registers. We also show that, perhaps surprisingly, initialization,
shown to have no computational consequences when registers are readily
available, is crucial in determining the synchronization power of objects when
no registers are allowed. Finally, we show that without registers, the number
of available objects affects the level of consensus that can be solved.
Our work thus raises the question of whether consensus hierarchies which
assume an unbounded number of registers truly capture synchronization power,
and begins a line of research aimed at better understanding the interaction
between read-write memory and the powerful synchronization operations available
on modern architectures.
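To see why registers appear in such constructions, consider the classical wait-free two-process consensus algorithm built from a single pre-initialized queue plus two read-write registers. The Python sketch below is our own illustration of that folklore construction (the names and encoding are not from the paper): the registers carry the proposals, and the queue decides the winner.

    from queue import Queue, Empty

    class TwoProcessConsensus:
        """Folklore two-process consensus from one initialized queue plus two
        single-writer registers (illustrative sketch, not the paper's code)."""

        def __init__(self):
            self.q = Queue()
            self.q.put("WINNER")            # the queue must start non-empty (initialized)
            self.registers = [None, None]   # register i holds process i's proposal

        def decide(self, pid, value):
            self.registers[pid] = value     # announce my proposal first
            try:
                token = self.q.get_nowait() # then race on the shared queue
            except Empty:
                token = None
            if token == "WINNER":
                return value                # I dequeued the marker: my value is decided
            return self.registers[1 - pid]  # I lost: adopt the winner's announced value

Dropping the two registers breaks this particular algorithm, since the losing process has no other way to learn the winner's proposal; whether queue-like objects can somehow emulate that missing read-write channel is exactly the kind of question the paper studies.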
How Many Cooks Spoil the Soup?
In this work, we study the following basic question: "How much parallelism
does a distributed task permit?" Our definition of parallelism (or symmetry)
here is not in terms of speed, but in terms of identical roles that processes
have at the same time in the execution. We initiate this study in population
protocols, a very simple model that not only allows for a straightforward
definition of what a role is, but also encloses the challenge of isolating the
properties that are due to the protocol from those that are due to the
adversary scheduler, who controls the interactions between the processes. We
(i) give a partial characterization of the set of predicates on input
assignments that can be stably computed with maximum symmetry, i.e., Θ(N_min), where N_min is the minimum multiplicity of a state in the initial configuration, and (ii) we turn our attention to the remaining
predicates and prove a strong impossibility result for the parity predicate:
the inherent symmetry of any protocol that stably computes it is upper bounded
by a constant that depends on the size of the protocol.
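For concreteness, the parity predicate itself does admit a simple stable protocol in the standard model, with no symmetry requirement. The Python sketch below simulates the textbook construction, in which "token" agents accumulate the XOR of the inputs and broadcast the running parity; the state encoding and the scheduler loop are our own illustrative choices, unrelated to the symmetry bounds discussed above.

    import random

    def parity_protocol(inputs, interactions=200_000, seed=0):
        """Textbook population protocol that stably computes the parity of the
        number of 1-inputs (illustrative sketch). Each state is (token, bit)."""
        rng = random.Random(seed)
        states = [(1, b) for b in inputs]          # every agent starts as a token
        n = len(states)
        for _ in range(interactions):
            i, j = rng.sample(range(n), 2)         # uniform random scheduler picks a pair
            (ti, bi), (tj, bj) = states[i], states[j]
            if ti and tj:                          # two tokens merge and XOR their bits
                states[i], states[j] = (1, bi ^ bj), (0, bi ^ bj)
            elif ti:                               # a token overwrites a non-token's output
                states[j] = (0, bi)
            elif tj:
                states[i] = (0, bj)
        return [bit for _, bit in states]          # each agent's current output bit

    if __name__ == "__main__":
        rng = random.Random(1)
        inputs = [rng.randint(0, 1) for _ in range(50)]
        print("true parity:", sum(inputs) % 2, "agent outputs:", set(parity_protocol(inputs)))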
Of Choices, Failures and Asynchrony: The Many Faces of Set Agreement
Set agreement is a fundamental problem in distributed computing in which processes collectively choose a small subset of values from a larger set of proposals. The impossibility of fault-tolerant set agreement in asynchronous networks is one of the seminal results in distributed computing. The complexity of set agreement in synchronous networks has also been a significant research challenge. Real systems, however, are neither purely synchronous nor purely asynchronous. Rather, they tend to alternate between periods of synchrony and periods of asynchrony. In this paper, we analyze the complexity of set agreement in such a "partially synchronous" setting, presenting the first (asymptotically) tight bound on the complexity of set agreement in such systems. We introduce a novel technique for simulating, in fault-prone asynchronous shared memory, executions of an asynchronous and failure-prone message-passing system in which some fragments appear synchronous to some processes. We use this technique to derive a lower bound on the round complexity of set agreement in a partially synchronous system by a reduction from asynchronous wait-free set agreement. We also present an asymptotically matching algorithm that relies on a distributed asynchrony detection mechanism to decide as soon as possible during periods of synchrony. By relating environments with differing degrees of synchrony, our simulation technique is of independent interest. In particular, it allows us to obtain a new lower bound on the complexity of early-deciding k-set agreement complementary to that of [12], and to re-derive the combinatorial topology lower bound of [13] in an algorithmic way.
Wait-free Trees with Asymptotically-Efficient Range Queries
Tree data structures, such as red-black trees, quad trees, treaps, or tries, are fundamental tools in computer science. A classical problem in concurrency is to obtain expressive, efficient, and scalable versions of practical tree data structures. We are interested in concurrent trees supporting range queries, i.e., queries that involve multiple consecutive data items. Existing implementations with this capability can list keys in a specific range, but do not support aggregate range queries: for instance, if we want to calculate the number of keys in a range, the only choice is to retrieve a whole list and return its size. This is suboptimal: in the sequential setting, one can augment a balanced search tree with counters and, consequently, perform these aggregate requests in logarithmic rather than linear time. In this paper, we propose a generic approach to implement a broad class of range queries on concurrent trees in a way that is wait-free, asymptotically efficient, and practically scalable. The key idea is a new mechanism for maintaining metadata concurrently at tree nodes, which can be seen as a wait-free variant of hand-over-hand locking (which we call hand-over-hand helping). We built a preliminary implementation of the wait-free binary search tree, and initial experiments indicate the soundness of our approach.
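The sequential baseline mentioned above is easy to make concrete. The sketch below is a minimal, single-threaded Python illustration (ours, not the paper's data structure): a search tree whose nodes carry subtree-size counters, so counting the keys in a range costs time proportional to the tree height. Balancing and all of the concurrency machinery, including the hand-over-hand helping the paper introduces, are deliberately omitted.

    def _size(node):
        return node.size if node else 0

    class Node:
        __slots__ = ("key", "left", "right", "size")
        def __init__(self, key):
            self.key, self.left, self.right, self.size = key, None, None, 1

    class AugmentedBST:
        """Unbalanced BST augmented with subtree sizes: count_range runs in
        O(height) time instead of time linear in the size of the answer."""
        def __init__(self):
            self.root = None

        def insert(self, key):
            def rec(node):
                if node is None:
                    return Node(key)
                if key < node.key:
                    node.left = rec(node.left)
                elif key > node.key:
                    node.right = rec(node.right)
                node.size = 1 + _size(node.left) + _size(node.right)
                return node
            self.root = rec(self.root)

        def _count_below(self, bound, inclusive):
            """Number of keys < bound (or <= bound if inclusive)."""
            node, count = self.root, 0
            while node:
                if node.key < bound or (inclusive and node.key == bound):
                    count += 1 + _size(node.left)  # this node and its left subtree qualify
                    node = node.right
                else:
                    node = node.left
            return count

        def count_range(self, lo, hi):
            """Number of keys k with lo <= k <= hi, via two root-to-leaf descents."""
            return self._count_below(hi, True) - self._count_below(lo, False)

    tree = AugmentedBST()
    for k in (5, 12, 17, 3, 20, 42, 11):
        tree.insert(k)
    print(tree.count_range(10, 20))   # -> 4 (the keys 11, 12, 17, 20)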
Fast Approximate Counting and Leader Election in Populations
We study the problems of leader election and population size counting for population protocols: networks of finite-state anonymous agents that interact randomly under a uniform random scheduler. We show a protocol for leader election that terminates in O(log_m n · log n) parallel time, where m is a parameter, using O(max{m, log n}) states. By adjusting the parameter m between a constant and n, we obtain a single leader election protocol whose time and space can be smoothly traded off between O(log² n) to O(log n) time and O(log n) to O(n) states. Finally, we give a protocol which provides an upper bound n̂ of the size n of the population, where n̂ is at most n^a for some constant a > 1. This protocol assumes the existence of a unique leader in the population and stabilizes in Θ(log n) parallel time, using a constant number of states in every node, except the unique leader, which is required to use Θ(log² n) states.
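To make the model concrete, the Python sketch below simulates the folklore two-state leader-election protocol (every agent starts as a leader; when two leaders meet, one survives) under a uniform random scheduler, measuring parallel time as interactions divided by n. This is only the classical Θ(n)-parallel-time baseline shown for illustration, not the parameterized protocol described above.

    import random

    def folklore_leader_election(n, seed=0):
        """Two-state baseline: all agents start as leaders; when two leaders
        interact, the responder drops out. Returns the parallel time
        (interactions / n) until a single leader remains. Illustrative sketch."""
        rng = random.Random(seed)
        leader = [True] * n
        remaining, interactions = n, 0
        while remaining > 1:
            i, j = rng.sample(range(n), 2)   # uniform random scheduler picks a pair
            interactions += 1
            if leader[i] and leader[j]:
                leader[j] = False            # the responder stops being a leader
                remaining -= 1
        return interactions / n

    if __name__ == "__main__":
        for n in (64, 256, 1024):
            print(n, round(folklore_leader_election(n), 1))   # grows roughly linearly with n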
Lock-Free Algorithms under Stochastic Schedulers
In this work, we consider the following random process, motivated by the analysis of lock-free concurrent algorithms under high memory contention. In each round, a new scheduling step is allocated to one of n threads, according to a distribution p = (p₁, p₂, ..., pₙ), where thread i is scheduled with probability pᵢ. When some thread first reaches a set threshold of executed steps, it registers a win, completing its current operation, and resets its step count to 1. At the same time, threads whose step count was close to the threshold also get reset because of the win, but to 0 steps, being penalized for almost winning. We are interested in two questions: how often does some thread complete an operation (system latency), and how often does a specific thread complete an operation (individual latency)? We provide asymptotically tight bounds for the system and individual latency of this general concurrency pattern, for arbitrary scheduling distributions p. Surprisingly, a simple characterization exists: in expectation, the system will complete a new operation every Θ(1/‖p‖₂) steps, while thread i will complete a new operation every Θ(‖p‖₂/pᵢ²) steps. The proof is interesting in its own right, as it requires a careful analysis of how the higher norms of the vector p influence the thread step counts and latencies in this random process. Our result offers a simple connection between the scheduling distribution and the average performance of concurrent algorithms, which has several applications.
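The process above is simple to simulate directly. In the Python sketch below, `threshold` stands in for the step threshold and `window` models the unspecified notion of being "close to the threshold"; both parameter names and the default window of one step are our own reading of the description, not the paper's formalization.

    import random
    from collections import Counter

    def simulate(p, threshold=8, window=1, rounds=1_000_000, seed=0):
        """Simulate the scheduling process: each round one thread receives a step;
        a thread reaching `threshold` steps registers a win and resets to 1, and
        any other thread within `window` of the threshold is reset to 0."""
        rng = random.Random(seed)
        n = len(p)
        steps = [0] * n
        wins = Counter()
        for _ in range(rounds):
            i = rng.choices(range(n), weights=p)[0]    # thread i is scheduled this round
            steps[i] += 1
            if steps[i] >= threshold:                  # thread i completes an operation
                wins[i] += 1
                for j in range(n):
                    if j != i and steps[j] >= threshold - window:
                        steps[j] = 0                   # near-winners are penalized
                steps[i] = 1
        return wins

    if __name__ == "__main__":
        p = [0.4, 0.3, 0.2, 0.1]
        wins = simulate(p)
        print("system latency :", round(1_000_000 / sum(wins.values()), 1), "steps/operation")
        for i, w in sorted(wins.items()):
            print(f"thread {i} latency: {1_000_000 / w:.1f} steps/operation")

Sweeping over different distributions p and comparing the measured latencies against the Θ(1/‖p‖₂) and Θ(‖p‖₂/pᵢ²) expressions above gives a quick empirical sanity check of the characterization.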
Quantized stochastic gradient descent: communication versus convergence
Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to the excellent scalability properties of this algorithm and to its efficiency in the context of training deep neural networks. A fundamental barrier to parallelizing large-scale SGD is the fact that the cost of communicating the gradient updates between nodes can be very large. Consequently, lossy compression heuristics have been proposed, by which nodes only communicate quantized gradients. Although effective in practice, these heuristics do not always provably converge, and it is not clear whether they are optimal. In this paper, we propose Quantized SGD (QSGD), a family of compression schemes which allow the compression of gradient updates at each node, while guaranteeing convergence under standard assumptions. QSGD allows the user to trade off compression and convergence time: it can communicate a sublinear number of bits per iteration in the model dimension, and can achieve asymptotically optimal communication cost. We complement our theoretical results with empirical data, showing that QSGD can significantly reduce communication cost while remaining competitive with standard uncompressed techniques on a variety of real tasks.
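As a rough illustration of the kind of scheme involved, the Python sketch below implements an unbiased stochastic quantizer in the spirit of QSGD: each coordinate is rounded at random to one of s+1 levels relative to the vector's Euclidean norm, so the quantized gradient equals the original in expectation. The function name and parameters are our own, and the lossless encoding/bit-packing step that produces the actual communication savings is omitted.

    import numpy as np

    def quantize(v, s, rng=None):
        """Stochastic quantization sketch in the spirit of QSGD: map each
        coordinate of v to a level in {0, 1/s, ..., 1} scaled by ||v||_2,
        rounding up with probability equal to the fractional part so that
        the result is an unbiased estimator of v."""
        if rng is None:
            rng = np.random.default_rng()
        norm = np.linalg.norm(v)
        if norm == 0.0:
            return np.zeros_like(v)
        scaled = np.abs(v) / norm * s                 # coordinate positions in [0, s]
        lower = np.floor(scaled)
        levels = lower + (rng.random(v.shape) < (scaled - lower))
        return np.sign(v) * norm * levels / s

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        g = rng.normal(size=1_000)                    # stand-in for a gradient vector
        avg = np.mean([quantize(g, s=4, rng=rng) for _ in range(2_000)], axis=0)
        print("mean |bias|:", float(np.abs(avg - g).mean()))   # close to 0: unbiased

Unbiasedness, together with a bound on the variance the rounding adds, is the property that convergence guarantees for such schemes rely on.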
