PDF, 1.3 MB
Zipped PostScript, 1.4 MB
HTML
PDF, 313 KB
Zipped PostScript, 511 KB
We show in this paper how to evaluate the performance of pipeline-structured parallel programs with skeletons and process algebra. Since many applications follow commonly used algorithmic skeletons, we identify such skeletons and model them with process algebra in order to obtain relevant information about the performance of the application and to be able to make good scheduling decisions. This concept is illustrated through a case study of the pipeline skeleton, and a tool which automatically generates a set of models and solves them is presented. Some numerical results are provided, demonstrating the efficacy of this approach.
PDF, 263 KB
Zipped PostScript, 505 KB
Camelot is a resource-bounded functional programming language which compiles to Java byte code to run on the Java Virtual Machine. We extend Camelot to include language support for Camelot-level threads which are compiled to native Java threads. We extend the existing Camelot resource-bounded type system to provide safety guarantees about the heap usage of Camelot threads. We demonstrate the usefulness of our concurrency extensions to the language by implementing a multi-threaded graphical network chat application which could not have been expressed as naturally in the sequential, object-free sublanguage of Camelot which was previously available.
PDF, 246 KB
Zipped PostScript, 595 KB
This paper describes the Expressive Velocity Engine library, an object-oriented C++ library designed to ease the process of writing efficient numerical applications using AltiVec, the SIMD extension designed by Apple, Motorola and IBM. AltiVec-powered applications typically show a relative speedup of 4 to 16 but require a complex and awkward programming style. By using various template metaprogramming techniques, E.V.E. provides an easy-to-use, STL-like interface that allows developers to quickly write efficient and easy-to-read code. Typical applications written with E.V.E. can achieve a large fraction of the theoretical maximum speedup while being written as simple C++ arithmetic code.
PDF, 544 KB
Zipped PostScript, 607 KB
A functional data-parallel language called BSML was designed for programming Bulk-Synchronous Parallel algorithms, a model of computing which allows parallel programs to be ported to a wide range of architectures. BSML is based on an extension of the ML language with parallel operations on a parallel data structure called a parallel vector. The execution time of programs can be estimated, and deadlocks and nondeterminism are avoided. For large-scale applications where parallel processing is helpful and where the total amount of data often exceeds the total main memory available, parallel disk I/O becomes a necessity. In this paper, we present a library of I/O features for BSML and its formal semantics. A cost model is also given and some preliminary performance results are shown for a commodity cluster.
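As an illustration of why execution time can be estimated in the Bulk-Synchronous Parallel model, the following sketch computes the classical BSP cost of a program, where each superstep costs w + h·g + l. It is not the I/O-extended cost model of the paper, which is not reproduced in this abstract. The names `BspMachine`, `superstep_cost` and `program_cost` are hypothetical, and h is taken here as the largest number of words any single processor sends or receives, which is one common convention.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class BspMachine:
    """Hypothetical container for the classical BSP machine parameters."""
    p: int      # number of processors
    g: float    # time to deliver one word of an h-relation
    l: float    # cost of one barrier synchronisation


def superstep_cost(machine: BspMachine,
                   work: Sequence[float],
                   sent: Sequence[int],
                   received: Sequence[int]) -> float:
    """Classical BSP cost of one superstep: w + h * g + l."""
    if not (len(work) == len(sent) == len(received) == machine.p):
        raise ValueError("expected one entry per processor")
    w = max(work)  # the slowest local computation bounds the superstep
    # h-relation: largest number of words sent or received by any processor
    h = max(max(s, r) for s, r in zip(sent, received))
    return w + h * machine.g + machine.l


Superstep = Tuple[Sequence[float], Sequence[int], Sequence[int]]


def program_cost(machine: BspMachine, supersteps: Sequence[Superstep]) -> float:
    """A BSP program is a sequence of supersteps, so their costs add up."""
    return sum(superstep_cost(machine, w, s, r) for w, s, r in supersteps)
```

Because the estimate depends only on the three machine parameters and on per-superstep maxima, the same program can be costed for different architectures by changing `p`, `g` and `l`.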
PDF, 266 KB
Zipped PostScript, 614 KB
We propose to use high-level Petri nets as a model for the semantics of high-level parallel systems. This model is known to be useful for the purpose of verification and we show that it is also executable in a parallel way. Executing a Petri net is not difficult in general but is more complicated in a timed context, which makes it necessary to synchronise the internal time of the Petri net with the real time of its environment. Another problem is to relate the execution of a Petri net, which has its own semantics, to that of its environment; i.e., to properly handle input/output.
This paper presents a parallel algorithm to execute Petri nets with time, enforcing the even progression of internal time with respect to real time and allowing the exchange of information with the environment. We define a class of Petri nets suitable for a parallel execution machine which preserves the step sequence semantics of the nets and ensures time-consistent executions while taking into account the solicitations of its environment. The question of the efficient verification of such nets has been addressed in a separate paper (see [Causal Time Calculus, FORMATS'03, LNCS 2791, Springer, 2004]); the present one focuses on the practical aspects involved in the execution of systems modelled in this way.
PDF, 245 KB
Zipped PostScript, 525 KB
The use of agent-based services in a Computational Grid is outlined, along with the particular roles that these agents undertake. Reasons why agents provide the most natural abstraction for managing and supporting Grid services are also discussed. Agent services are divided into two broad categories: (1) infrastructure services, and (2) application services. Infrastructure services are provided by existing Grid management systems, such as Globus and Legion, and application services by intelligent agents. Usage scenarios are provided to demonstrate the concepts involved.
PDF, 266 KB
Zipped PostScript, 722 KB
One common assumption of existing models of load balancing is that the weights of resources and the I/O buffer size are statically configured and cannot be adjusted based on a dynamic workload. Though the static configuration of these parameters performs well in a cluster where the workload can be modeled and predicted, its performance is poor in dynamic systems in which the workload is unknown. In this paper, a new feedback control mechanism is proposed to improve the overall performance of a cluster with a general and practical workload including I/O-intensive and memory-intensive load. This mechanism is also shown to be effective in complementing and enhancing the performance of a number of existing dynamic load-balancing schemes. To capture the current and past workload characteristics, the primary objectives of the feedback mechanism are: (1) dynamically adjusting the resource weights, which indicate the significance of the resources, and (2) minimizing the number of page faults for memory-intensive jobs while increasing the utilization of the I/O buffers for I/O-intensive jobs by manipulating the I/O buffer size. Results from extensive trace-driven simulation experiments show that, compared with a number of schemes with fixed resource weights and buffer sizes, the feedback control mechanism improves the mean slowdown by up to 282% (with an average of 125%).
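The idea of adjusting resource weights from feedback can be sketched as follows. This is only an illustrative assumption about how such a controller might look, not the mechanism of the paper: the update rule (moving each weight a fraction `gain` toward the share of demand observed in the last interval, then renormalising), the function names and the resource names are all hypothetical.

```python
from typing import Dict


def adjust_weights(weights: Dict[str, float],
                   observed_share: Dict[str, float],
                   gain: float = 0.5) -> Dict[str, float]:
    """Hypothetical feedback step: nudge weights toward the observed demand.

    `observed_share` gives the fraction of recent demand attributed to each
    resource; `gain` in (0, 1] controls how strongly the controller reacts.
    """
    if not 0.0 < gain <= 1.0:
        raise ValueError("gain must be in (0, 1]")
    updated = {r: w + gain * (observed_share[r] - w) for r, w in weights.items()}
    total = sum(updated.values())
    # Renormalise so the weights keep summing to one.
    return {r: v / total for r, v in updated.items()}


def load_index(weights: Dict[str, float], utilisation: Dict[str, float]) -> float:
    """Weighted load of a node; a balancer would prefer the lowest index."""
    return sum(weights[r] * utilisation[r] for r in weights)


# Example: the workload turns I/O-intensive, so the I/O weight grows.
weights = {"cpu": 0.5, "memory": 0.25, "io": 0.25}
observed = {"cpu": 0.25, "memory": 0.25, "io": 0.5}
new_weights = adjust_weights(weights, observed)
```

With statically configured weights, the load index would keep ranking nodes by the original priorities even after the workload mix changes; a feedback step of this kind lets the ranking follow the observed workload instead.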