Distribution & Parallelism

Thu 10:30-12:00 pm - Regency CD
Chair: Andrew Black
Multiverse: Efficiently supporting distributed high-level speculation
Kaushik Ravichandran, Georgia Tech, USA
Santosh Pande, Georgia Tech, USA

Algorithmic speculation or high-level speculation is a promising programming paradigm which allows programmers to speculatively branch an execution into multiple independent parallel sections and then choose the best (perhaps fastest) amongst them. The continuing execution after the speculatively branched section sees only the modifications made by the best one. This programming paradigm allows programmers to harness parallelism and can provide dramatic performance improvements.

In this paper we present the Multiverse speculative programming model. Multiverse allows programmers to exploit parallelism through high-level speculation. It can effectively harness large amounts of parallelism by speculating across an entire cluster and is not bound by the parallelism available in a single machine. We present abstractions and a runtime which allow programmers to introduce large scale high-level speculative parallelism into applications with minimal effort. We introduce a novel on-demand address space sharing mechanism which provide speculations efficient transparent access to the original address space of the application (including the use of pointers) across machine boundaries. Multiverse provides single commit semantics across speculations while guaranteeing isolation between them. We also introduce novel mechanisms to deal with scalability bottlenecks when there are a large number of speculations.

We demonstrate that for several benchmarks, Multiverse achieves impressive speedups and good scalability across an entire cluster. We study the overheads of the runtime and demonstrate how our special scalability mechanisms are crucial in scaling cluster wide.

Fully Concurrent Garbage Collection of Actors in Many-Core Machines
Sylvan Clebsch, Imperial College London, UK
Sophia Drossopoulou, Imperial College London, UK

Disposal of dead actors in actor-model languages is as important as disposal of unreachable objects in object-oriented languages. In current practice, programmers are required to either manually terminate actors, or they have to rely on garbage collection systems that monitor actor mutation through write barriers, thread coordination through locks etc. These techniques, however, prevent the collector from being fully concurrent.

We developed a protocol that allows garbage collection to run fully concurrently with all actors. The main challenges in concurrent garbage collection is the detection of cycles of sleeping actors in the actors graph, in the presence of concurrent mutation of this graph. Our protocol is solely built on message passing: it uses deferred direct reference counting, a dedicated actor for the detection of (cyclic) garbage, and a confirmation protocol (to deal with the mutation of the actor graph).

We present our ideas informally through an example, and then present a formal model, prove soundness and argue completeness. We have implemented the protocol as part of a runtime library. We present two benchmarks which indicate that the overhead is small.

Isolation for Nested Task-Parallelism
Jisheng Zhao, Rice U, USA
Roberto Lublinerman, Google, USA
Zoran Budimlic, Rice U, USA
Swarat Chaudhuri, Rice U, USA
Vivek Sarkar, Rice U, USA

Isolation---the property that a task can access shared data without interference from other tasks---is one of the most basic concerns in parallel programming. While there is a large literature on isolated task-parallelism, the integration of isolation, task-parallelism, and nesting of tasks is a difficult and unresolved challenge. In this paper, we present a programming and execution model called Otello where isolation is extended to arbitrarily nested parallel tasks that make irregular heap accesses. At the same time, no additional burden is imposed on the programmer, who only exposes parallelism by creating and synchronizing parallel tasks, leaving the underlying compiler and runtime system to ensure isolation.

Otello extends Lublinerman et al's delegated isolation mechanism to the setting of nested parallelism. The basic runtime construct here is an assembly: a task equipped with a region in the shared heap that it owns. When an assembly A conflicts with an assembly B, A transfers---or delegates---its code and owned region to a specifically selected assembly C in a way that will ensure isolation with B, leaving the responsibility of re-executing task A to C. The choice of C depends on the nesting relationship between A and B.

We have implemented Otello on top of the Habanero Java (HJ) parallel programming language, and used this implementation to evaluate Otello on collections of nested-parallel benchmarks and transactional benchmarks (without nested tasks). On the nested-parallel benchmarks, Otello achieves scalability comparable to HJ programs without built-in isolation, and the relative overhead of Otello is lower than that of many published data-race detection algorithms. For the transactional benchmarks, Otello incurs lower overhead than a state-of-the-art software transactional memory system (Deuce).

Turning Nondeterminism into Parallelism
Omer Tripp, Tel Aviv U, Israel
Eric Koskinen, New York U, USA
Mooly Sagiv, Tel Aviv U, Israel

Nondeterminism is a useful and prevalent concept in the design and implementation of software systems. An important property of nondeterminism is its latent parallelism: A nondeterministic action can evaluate to multiple behaviors. If at least one of these behaviors does not conflict with concurrent tasks, then there is an admissible execution of the action in parallel with these tasks. Unfortunately, existing implementations of the atomic paradigm --- optimistic as well as pessimistic --- are unable to fully exhaust the parallelism potential of nondeterministic actions, lacking the means to guide concurrent tasks toward nondeterministic choices that minimize interference.

This paper investigates the problem of utilizing parallelism due to nondeterminism. We observe that nondeterminism occurs in many real-world codes. We motivate the need for devising coordination mechanisms that can utilize available nondeterminism. We have developed a system featuring such mechanisms, which leverages nondeterminism in a wide class of query operations, allowing a task to look into the future of concurrent tasks that mutate the shared state during query evaluation and reduce conflict accordingly. We evaluate our system on a suite of 12 algorithmic benchmarks of wide applicability, as well as an industrial application. The results are encouraging.


Antony Hosking
Patrick Eugster
Purdue U


Cristina V. Lopes
UC Irvine


Robert Hirschfeld


Carl Friedrich Bolz

Sponsored by ACM SIGPLAN


Plaxo Facebook Twitter