Empirical Parallel Performance Prediction From Semantics-Based Profiling


Norman Scaife
Greg Michaelson
Susumu Horiguchi


The PMLS parallelizing compiler for Standard ML is based upon the automatic
instantiation of algorithmic skeletons at sites of higher order function (HOF)
use. Rather than mechanically replacing HOFs with skeletons, which in general
leads to poor parallel performance, PMLS seeks to predict run-time parallel
behaviour to optimise skeleton use.

Static extraction of analytic cost models from programs is undecidable,
and practical heuristic approaches are intractable. In contrast,
PMLS utilises a hybrid approach by combining static analytic cost models for
skeletons with dynamic information gathered from the sequential
instrumentation of HOF argument functions. Such instrumentation is provided
by an implementation independent SML interpreter, based on the language's
Structural Operational Semantics (SOS), in the form of SOS rule counts.
PMLS then tries to relate the rule counts to program execution times through
numerical techniques.

This paper considers the design and implementation of the PMLS approach
to parallel performance prediction. The formulation of a general rule count
cost model as a set of over-determined linear equations is discussed, and
their solution by single value decomposition, and by a genetic algorithm, are


Special Issue