Abstracts Volume 4, No. 1, March 2001

SPECIAL ISSUE PAPERS

The Development of Conservative Superstep Protocols for Shared Memory Multiprocessor Systems
Boon-Ping Gan, Yoke-Hean Low, Wentong Cai, Stephen J. Turner, Sanjay Jain, Wen Jing Hsu, and Shell Ying Huang

 This paper summarizes our work involving the successive refinements of a conservative superstep protocol. The refined protocols are known as the SST, SLS, and FET protocols. Each of the refinements improves the safetime bound, which in turn reduces the number of supersteps. In general, this helps to reduce the synchronization overhead and thus the execution time. However, a change in the number of supersteps can also introduce load imbalances within supersteps into the simulation. This observation becomes apparent with the FET protocol in particular. The performance of the refined protocols is compared through several semiconductor supply-chain simulation models. The results from the experiments strongly indicate the feasibility of using parallel discrete-event simulation techniques in the simulation of large scale systems such as a supply-chain.


Implementation of a Virtual Time Synchronizer for Distributed Databases on a Cluster of Workstations
A. Boukerche, S. K. Das, A. Datta, and T. E. LeMaster

 The availability of high speed networks and improved microprocessor performances has made it possible to build inexpensive cluster (or network) of workstations as an appealing vehicle for parallel and distributed computing. In this paper, we study the performance of a distributed database system synchronized, by virtual time (VT) mechanism by experimenting on a LAN connected collection of Sun SPARC workstations. To the best of our knowledge, ours is the first attempt to report the virtual time performance in distributed databases using a cluster of workstations. The experimental results demonstrate that the VT synchronization approach is a viable concurrency control method. Compared to a widely used multiversion (MV) concurrency control scheme based on timestamps order, the VT synchronization scheme is shown to yield about 25-30% reduction in the response time. The reason for choosing multiversion concurrency control as the basis for comparison is that both VT and MV protocols are based upon the concept of timestamps, and attain high performance by using multiple version of data objects. (The term "version" refers to the value of data item produced by a transaction that's either active or committed.)


Applying Multilevel Partitioning to Parallel Logic Simulation
Swaminathan Subramanian, Dhananjai M. Rao, and Philip A. Wilsey

 The size and complexity of hardware systems motivates the use of simulation for their study and analysis. Parallelization techniques are often employed to meet the memory and computational requirements for simulating large hardware designs. Furthermore, partitioning the design for parallel simulation is vital for achieving acceptable simulation throughput. This paper presents the design and implementation of a new partitioning algorithm based on a multilevel partitioning heuristic. The multilevel algorithm attempts to balance the load, maximize concurrency, and reduce inter-processor communication in three phases to improve performance of the parallel simulator. The paper presents the algorithm and reports results from of our empirical studies to compare performance of the multilevel algorithm with other partitioning techniques. A generic partitioning infrastructure was developed and integrated into the WARPED parallel simulation framework to ease design, implementation, and study of different partitioning algorithms. The design issues involved in the development of the generic partitioning infrastructure are discussed and the techniques employed to integrate several partitioning algorithms into this framework are also presented in the paper. The experimental results obtained from our benchmarks indicate that the multilevel algorithm yields better partitions than other partitioning algorithms included in the study.


Self-Organized Criticality in Optimistic Simulation of Correlated Systems
B. J. Overeinder, A. Schoneveld, and P. M. A. Sloot

A variety of slowly driven diffusive systems have been shown to self organize into a critical state. In this paper we study the dynamic runtime behavior of the optimistic parallel Time Warp simulation method. The method is based on an asynchronous execution of timed events. The basic problem is the out of order execution of events. A causality error results in so-called rollback of processed events until the causality error is resolved. By using the Ising spin model we show experimentally that the distribution of number of rolled back events behaves as a power-law distribution over a large range of sub-critical Ising temperatures and decays exponentially above for super-critical Ising temperatures. For critical Ising temperatures, the computational complexity of Time Warp and physical complexity of the Ising spin model are entangled and contribute both to the runtime behavior in a non-linear way.


Some Ownership Management Issues in Distributed Simulation using HLA/RTI
Farshad Moradi, Rassul Ayani, and Gary Tan

 To study the High Level Architecture (HLA) and the services that are provided by the Runtime Infrastructure (RTI), in particular Object Management and Ownership Management, we have developed a distributed air traffic controller simulator. In our simulation model, each airport is represented by a federate and controls a number of aircraft. The control of the aircraft is transferred among airports as the aircraft fly to different airports. In this paper we discuss two different approaches that we have used to facilitate the exchange of ownership of aircraft attributes among federates, namely the pull and the negotiated push method. We present a comparison of the two methods and also discuss the problems associated with each method and our approach to resolving them. These problems include the oscillation effect, which causes aircraft attributes to be pulled back and forth between federates, and the pending attribute acquisition requests, which result in loss of aircraft (or unattended aircraft attributes). We have experienced that in such scenarios, the push method is more efficient and accurate. We also present our experiences and observations from our experimentation with the RTI. We have noticed some shortcomings in the current RTI interface specification that we will discuss in the paper.


RESEARCH PAPERS

Communication Overhead on Distributed Memory Machines
Shiping Chen and Jingling Xue

 Distributed memory machines (DMMs) can provide scalable and flexible high performance if the relatively high startup overhead on these machines can be amortised. This paper studies communication overhead on DMMs. Based on LogP, a communication cost model is developed to quantify separately the effects of send, receive and network contention on overall communication overhead. This cost model is applied to estimate communication overhead for four commercial multicomputer systems. Two observations for send and receive overheads on DMMs are discussed.

Rank Order Filtering on Massively-Parallel Single-Bit Mesh Processor Arrays
Hongchi Shi and Hongzheng Li

 Rank order filters have a wide variety of applications in image processing as an effective tool for removing noise in images without severely distorting abrupt changes in the images. The k-th rank filter with an m x m window sets each pixel the k-th smallest value of the m2 pixels in its m x m neighborhood. The median filter is the most commonly used special case of rank order filters. Rank order filtering requires intensive computation. In this paper, we consider implementation of rank order filters on massively-parallel single-bit mesh-connected computers such as the Lockheed Martin Parallel Algebraic Logic (PAL) computer.

We design rank filtering algorithms using the threshold decomposition and radix splitting techniques. We map an n x n image onto an n x n single-bit mesh array with one pixel per processing element (PE). Suppose that the maximum pixel value is L. Using threshold decomposition technique, we can implement any rank filter in O((m log m + log L)L) time on a mesh processor array with O(log L + log m) bits of space on each PE. Using the radix splitting technique, we can implement any rank order filter in O(m2 log m log L ) time on a mesh processor array with O(log L + m2) bits of space on each PE. If the local space in each PE is limited to O(log L + log m) bits, we can have an O(m2 (log L + log m) log L) implementation. We also present some experimental results of the implementation of those algorithms on the PAL computer.