Although they are more difficult to program, distributed-memory parallel machines provide greater scalability than their shared-memory counterparts. Software Distributed Shared Memory (SDSM) systems provide the abstraction of shared memory on a distributed machine. While SDSMs offer an attractive programming model, they currently cannot efficiently support all classes of scientific applications. One such class consists of applications with recurrences that cause dependencies across processors or nodes. A popular solution to such problems is pipelining, which breaks the computation into blocks; each processor computes a block, which in turn enables the next processor in the pipeline to compute its corresponding block. Once the pipeline is filled, blocks are computed in parallel. While pipelining is useful, it is not efficiently supported by current SDSM systems.
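The pipelining scheme described above can be illustrated with a minimal sketch. It is not the system presented in this paper: it assumes a simple two-dimensional recurrence chosen only for illustration, uses Python threads in place of processors, and uses a FIFO queue as a stand-in for the one-way notification that a block is complete. All names (`pipelined`, `sequential`, `ready`, `block_cols`) are hypothetical.

```python
import queue
import threading


def sequential(grid):
    """Reference result: a[i][j] += a[i-1][j] + a[i][j-1] for i, j >= 1."""
    a = [row[:] for row in grid]
    for i in range(1, len(a)):
        for j in range(1, len(a[0])):
            a[i][j] += a[i - 1][j] + a[i][j - 1]
    return a


def pipelined(grid, num_workers, block_cols):
    """Same recurrence, with rows split into bands and columns into blocks.

    Worker w may compute column block b only after worker w-1 has finished
    block b, because its first row depends on the last row of the band above.
    """
    a = [row[:] for row in grid]
    n_rows, n_cols = len(a), len(a[0])
    interior = n_rows - 1  # row 0 is input only
    # bounds[w]..bounds[w+1] is the half-open row band owned by worker w.
    bounds = [1 + (w * interior) // num_workers for w in range(num_workers + 1)]
    # Column blocks start at column 1; column 0 is input only.
    blocks = [(c, min(c + block_cols, n_cols)) for c in range(1, n_cols, block_cols)]
    # ready[w] carries "block done" tokens from worker w-1 to worker w.
    ready = [queue.Queue() for _ in range(num_workers + 1)]

    def worker(w):
        lo, hi = bounds[w], bounds[w + 1]
        for b, (c0, c1) in enumerate(blocks):
            if w > 0:
                ready[w].get()  # wait until the upstream block is complete
            for i in range(lo, hi):
                for j in range(c0, c1):
                    a[i][j] += a[i - 1][j] + a[i][j - 1]
            ready[w + 1].put(b)  # one-way signal to the downstream worker

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a
```

Once the first worker has finished its first block, the second worker can start on that block while the first moves on to the next one, so the workers overlap after the pipeline fills. A smaller `block_cols` fills the pipeline sooner at the cost of more synchronization. In this sketch the tokens flow in one direction only, from each worker to its successor.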
This paper presents an approach to integrating pipelining into SDSM systems. We describe our design and implementation of one-way pipelining in an SDSM. The key idea is to retain the shared-memory model but to design the extensions so that execution mimics what an explicit message-passing program would do. We show that one-way pipelining is superior to the two most common ways of programming pipelined applications: distributed locks and explicit matrix transposition. Finally, we show that one-way pipelining is competitive with a hand-coded, explicit message-passing program.