Speculative parallelization of sequential algorithms
The problem
Nowadays, computer systems with several processors on the same motherboard are increasingly common. Ideally, all applications would benefit from these advances in computer architecture. Unfortunately, this is not the case for most existing sequential applications unless parallel versions are developed. Much effort is now devoted to parallelizing CPU-intensive legacy code, particularly in the fields of simulation and bioinformatics. However, implementing a parallel version of an application is a difficult task that requires in-depth knowledge of the underlying hardware architecture, compiler optimizations, and the problem itself.
How can we automatically extract some of the inherent parallelism of an application? Through the use of parallelizing compilers. Most of them are able to extract loop-level parallelism, executing different iterations in parallel on different processors. To do so, however, the compiler must be sure that all iterations can be executed independently, without affecting the sequential semantics of the application. Unfortunately, on many occasions the compiler cannot tell at compile time whether the loop has dependences among iterations, for example, when the work carried out in each iteration depends on the input data.
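As a minimal illustration (the function and variable names here are hypothetical), consider a loop whose cross-iteration dependences hinge on data known only at run time:

    /* Whether two iterations collide on the same element of a[] depends
     * entirely on the contents of idx[], which are unknown until run time.
     * A parallelizing compiler must therefore conservatively keep this
     * loop sequential, even if idx[] never repeats a value in practice. */
    void update(double *a, const int *idx, const double *v, int n)
    {
        for (int i = 0; i < n; i++)
            a[idx[i]] += v[i];
    }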
A solution: Speculative Parallelization
Speculative parallelization (SP) is a technique developed for shared-memory systems that allows the parallel execution of loops that cannot be classified as "parallel" at compile time. The general idea is to optimistically execute the loop in parallel, hoping that sequential semantics will not be violated. At the same time, hardware or software mechanisms track the activity of each iteration executed in parallel, to detect any dependences among iterations that may arise. If such a dependence is found, the offending iterations are stopped and re-executed in order, allowing the program to reach the correct result.
Of course, the speedups obtained depend on the inherent parallelism of the loop: if the loop is truly sequential, there is no way to execute it in parallel. The big advantage of this technique, however, is that it allows the automatic extraction of parallelism from many loops, as the sketch below illustrates.
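The following is a minimal, self-contained C/OpenMP sketch of the idea, and only a coarse model of the technique (a real speculative runtime detects violations and restarts individual iterations on the fly). The loop runs optimistically in parallel on a private copy while the earliest writer of each element is logged; a validation pass then looks for read-after-write violations, as well as multiple writers whose ordering the parallel pass did not preserve, and on failure the speculative results are squashed and the loop is re-executed in order:

    #include <stdio.h>
    #include <string.h>

    #define N 1000

    int main(void)
    {
        static int a[N], copy[N], idx[N];
        static int writer[N];    /* earliest iteration that wrote a[k] */
        static int nwrites[N];   /* how many iterations wrote a[k]     */
        int violated = 0;

        for (int i = 0; i < N; i++) {
            a[i] = i;
            idx[i] = (i * 7) % N;       /* runtime-dependent indices   */
            writer[i] = N;              /* N means "never written"     */
        }
        memcpy(copy, a, sizeof a);

        /* Optimistic pass: each iteration reads a[i] and writes
         * copy[idx[i]], logging who wrote what. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            copy[idx[i]] = a[i] + 1;
            #pragma omp critical
            {
                if (i < writer[idx[i]]) writer[idx[i]] = i;
                nwrites[idx[i]]++;
            }
        }

        /* Validation: sequentially, iteration i would read a[i] *after*
         * any write made by an earlier iteration w < i, but the
         * optimistic pass read the original value instead; multiple
         * writers are also unsafe, since their order was not preserved. */
        for (int i = 0; i < N; i++)
            if (writer[i] < i || nwrites[i] > 1) { violated = 1; break; }

        if (violated) {
            /* Squash: discard the copy and redo the loop in order. */
            for (int i = 0; i < N; i++)
                a[idx[i]] = a[i] + 1;
        } else {
            memcpy(a, copy, sizeof a);  /* commit speculative state    */
        }
        printf("dependence violation detected: %s\n",
               violated ? "yes" : "no");
        return 0;
    }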
The ATLaS framework
We have developed ATLaS, a state-of-the-art, software-only framework for the speculative parallelization of loops. This mechanism, which extends OpenMP and is extremely easy to use, is able to obtain speedups even in the presence of dependences among iterations. The details of the framework are described in the following publication (a usage sketch follows the reference):
- An OpenMP Extension that Supports Thread-Level Speculation. Sergio Aldea, Alvaro Estebanez, Diego R. Llanos, Arturo González Escribano. IEEE Transactions on Parallel and Distributed Systems. ISSN 1045-9219, IEEE Press, 2015. [Early access version]
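To give a flavor of such an extension (the clause name and syntax below are hypothetical; the actual directive defined by ATLaS is specified in the paper above), annotating a possibly-dependent loop could look like this:

    /* Hypothetical syntax, for illustration only: a speculative clause
     * telling the runtime that accesses to a[] may carry cross-iteration
     * dependences, to be detected and repaired at run time. */
    #pragma omp parallel for speculative(a)
    for (int i = 0; i < n; i++)
        a[idx[i]] += v[i];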
Additional information
For more detailed information, please consult the list of publications. If you have an application where our technique may be useful, we will be happy to study the problem, so please contact us by e-mail.
This research is being partially sponsored by the following projects:
- TRACS (Training and Support for Advanced Computing Systems), European Union (code HPRI-CT-1999-00026).
- HPC-Europa (High Performance Computing - Europa), European Union (code RII3-CT-2003-506079).