Overlap of Computation and Communication on Shared-Memory

T. S. Abdelrahman; G. Liu

Published: Mar 1, 2001

Abstract

This paper describes and evaluates a compiler transformation that improves the performance of parallel programs on Network-of-Workstation (NOW) shared-memory multiprocessors. The transformation overlaps the communication time resulting from non-local memory accesses with the computation time in parallel loops, effectively hiding the latency of the remote accesses. It peels the iterations that access remote data from a parallel loop and reschedules them after the iterations that access only local data (local-only iterations). Asynchronous prefetching of remote data is used to overlap non-local access latency with the execution of the local-only iterations. Experimental evaluation of the transformation on a NOW multiprocessor indicates that it is generally effective in improving parallel execution time (by a factor of up to 1.9). The extent of the benefit is determined by three factors: the size of the local-only computations, the significance of remote memory access latency, and the position of the iterations that access remote data in a parallel loop.
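
As a rough illustration of the idea in the abstract, the sketch below applies loop peeling and asynchronous prefetching by hand to a three-point stencil over one processor's block of an array. It is not the authors' compiler output. The names `fetch_remote`, `stencil_original`, and `stencil_overlapped` are hypothetical, the remote access is simulated with a short sleep, and the block is assumed to hold at least two elements with a valid neighbour on each side.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fetch_remote(data, index, latency=0.01):
    """Hypothetical stand-in for a non-local memory access (simulated latency)."""
    time.sleep(latency)
    return data[index]


def stencil_original(data, lo, hi):
    """Untransformed loop over the block [lo, hi).

    The first and last iterations read data[lo - 1] and data[hi],
    which are assumed to live on other processors.
    """
    return [data[i - 1] + data[i] + data[i + 1] for i in range(lo, hi)]


def stencil_overlapped(data, lo, hi, pool):
    """Same result, with the remote-access iterations peeled and run last.

    Assumes hi - lo >= 2 and 0 < lo < hi < len(data).
    """
    out = [None] * (hi - lo)

    # Step 1: issue asynchronous prefetches for the two remote elements.
    left = pool.submit(fetch_remote, data, lo - 1)
    right = pool.submit(fetch_remote, data, hi)

    # Step 2: run the local-only iterations while the prefetches are in flight.
    for i in range(lo + 1, hi - 1):
        out[i - lo] = data[i - 1] + data[i] + data[i + 1]

    # Step 3: run the peeled iterations once the remote data has arrived.
    out[0] = left.result() + data[lo] + data[lo + 1]
    out[hi - 1 - lo] = data[hi - 2] + data[hi - 1] + right.result()
    return out


if __name__ == "__main__":
    data = list(range(10))
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(stencil_overlapped(data, 3, 7, pool))
```

In this sketch the benefit depends on how much local-only work is available in step 2 relative to the simulated latency, which mirrors the first two factors named in the abstract.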

Issue

Vol. 2 No. 2 (1999)

Section

Proposal for Special Issue Papers
