In this paper we study the problem of optimizing such applications under a broad model that incorporates not only communication overheads but also local data caches that may exist as a result of previous queries. We study both 1-port and N-port communication setups. Our analytical approach is complemented by a theorem that shows how to arrange the sequence of operations in order to minimize the overall cost, and it yields closed-form solutions to the partitioning problem.
For the case where large load imbalances (due to big differences in cache sizes) prevent the calculation of a closed-form solution, we propose an algorithm for optimizing load redistribution.
The paper concludes with a simulation study that evaluates the impact of our analytical approach. The simulation, which assumes a homogeneous parallel platform for ease of interpretation, compares the characteristics of the 1-port and N-port setups.