Empirical Parallel Performance Prediction From Semantics-Based Profiling
Abstract
The PMLS parallelizing compiler for Standard ML is based upon the automatic instantiation of algorithmic skeletons at sites of higher order function (HOF) use. Rather than mechanically replacing HOFs with skeletons, which in general leads to poor parallel performance, PMLS seeks to predict run-time parallel behaviour to optimise skeleton use.Static extraction of analytic cost models from programs is undecidable, and practical heuristic approaches are intractable. In contrast, PMLS utilises a hybrid approach by combining static analytic cost models for skeletons with dynamic information gathered from the sequential instrumentation of HOF argument functions. Such instrumentation is provided by an implementation independent SML interpreter, based on the language's Structural Operational Semantics (SOS), in the form of SOS rule counts. PMLS then tries to relate the rule counts to program execution times through numerical techniques.
This paper considers the design and implementation of the PMLS approach to parallel performance prediction. The formulation of a general rule count cost model as a set of over-determined linear equations is discussed, and their solution by single value decomposition, and by a genetic algorithm, are presented.