Data management is a key aspect of any distributed system. This paper surveys data management techniques in various distributed systems, ranging from Distributed Shared Memory (DSM) systems to Peer-to-Peer (P2P) systems. The central focus is on scalability, an important non-functional property of distributed systems. A scalability taxonomy of data management techniques is presented, together with a detailed discussion of how these techniques have evolved in the different categories and of the state of the art. From this survey, several open issues are inferred, including the use of P2P techniques in data grids and distributed mobile systems, and the use of optimal data placement heuristics from Content Distribution Networks (CDNs) for P2P grids.
The efficient use of distributed-memory parallel systems requires the load on each processor to be well balanced. In cases where the load changes unpredictably during the computation, a dynamic load balancing strategy is needed. Load balancing problems have been studied extensively in recent years, particularly in the context of unstructured mesh-based applications. Static load balancing can be approximated by a graph partitioning problem, for which many efficient algorithms have been developed. Significant progress has also been made in the development of dynamic load balancing algorithms. This paper looks at the history and the state of the art of both classes of algorithms, with a particular emphasis on mesh-based applications. However, the underlying algorithms, including those for graph partitioning and flow calculation, are sufficiently generic to be applicable to other applications.
The paper presents an overview of parallel computing models, architectures, and research projects that are based on asynchronous instruction scheduling. It starts with pure dataflow computing models and presents a historical development of several ideas (namely single-token-per-arc dataflow, tagged-token dataflow, explicit token store, threaded dataflow, large-grain dataflow, RISC dataflow, cycle-by-cycle interleaved multithreading, block-interleaved multithreading, and simultaneous multithreading) that resulted in modern multithreaded superscalar processors. The paper shows that unification of the von Neumann and dataflow models is possible and preferable to treating them as two unrelated, orthogonal computing paradigms. Today's dataflow research incorporates more explicit notions of state into the architecture, and von Neumann models use many dataflow techniques to improve the latency-hiding aspects of modern multithreaded systems.