Exact and Heuristic Data Workflow Placement Algorithms for Big Data Computing in Cloud Datacenters

Sonia Ikken; Eric Renault; Abdelkamel Tari; Tahar Kechadi

doi:10.12694/scpe.v19i3.1365

Authors

Sonia Ikken: Faculty of Exact Sciences, University of Bejaia, 06000 Bejaia, Algeria, and Telecom SudParis, Samovar-UMR 5157 CNRS, University of Paris-Saclay, France
Eric Renault: Telecom SudParis, Samovar-UMR 5157 CNRS, University of Paris-Saclay, France
Abdelkamel Tari: Faculty of Exact Sciences, University of Bejaia, 06000 Bejaia, Algeria
Tahar Kechadi: UCD School of Computer Science and Informatics, Dublin, Ireland

Abstract

Several big data-driven applications are currently carried out collaboratively on distributed infrastructure. These applications usually involve experiments at massive scale, and the data they generate are huge and stored at multiple geographic locations for reuse. Workflow systems, composed of jobs using collaborative task-based models, present new dependency and data exchange needs. This raises new issues when selecting distributed data and storage resources so that applications execute on time and resource usage is cost-efficient. In this paper, we present an efficient data placement approach to improve the performance of workflow processing in distributed data centres. The proposed approach involves two types of data: splittable and unsplittable intermediate data. Moreover, we place intermediate data by considering not only their source location but also their dependencies. The main objective is to minimise the total storage cost, including the effort for transferring, storing, and moving the data according to the applications' needs. We first propose an exact algorithm which takes into account the intra-job dependencies, and we show that the optimal fractional intermediate data placement problem is NP-hard. To solve the problem of unsplittable intermediate data placement, we propose a greedy heuristic algorithm based on a network flow optimisation framework. The experimental results show that the performance of our approach is very promising. We also show that, even under divergent conditions, the cost of the heuristic solution remains close to that of the optimal solution.
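To make the unsplittable placement objective concrete, the following is a minimal illustrative sketch, not the authors' algorithm: the abstract does not give the details of the network flow formulation, so a plain capacity-aware greedy rule is substituted here. All names (`Item`, `placement_cost`, `greedy_place`), the linear cost model, and the largest-first ordering are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A hypothetical unsplittable intermediate data item."""
    name: str
    size_gb: float
    source: str        # datacenter where the item is produced
    consumers: tuple   # datacenters hosting jobs that depend on the item


def link_price(transfer_price, a, b):
    """Per-GB transfer price between two datacenters (zero when a == b)."""
    return 0.0 if a == b else transfer_price[(a, b)]


def placement_cost(item, dc, storage_price, transfer_price):
    """Assumed cost model: storage at `dc`, plus moving the item in from its
    source and out to every dependent job's datacenter."""
    store = storage_price[dc] * item.size_gb
    move_in = link_price(transfer_price, item.source, dc) * item.size_gb
    move_out = sum(link_price(transfer_price, dc, c) * item.size_gb
                   for c in item.consumers)
    return store + move_in + move_out


def greedy_place(items, capacity, storage_price, transfer_price):
    """Place each item whole at the cheapest datacenter that still has room.

    Items are handled largest first (an assumed ordering). Returns the
    placement map and the total cost under the assumed cost model.
    """
    free = dict(capacity)
    placement, total = {}, 0.0
    for item in sorted(items, key=lambda i: i.size_gb, reverse=True):
        feasible = [dc for dc in free if free[dc] >= item.size_gb]
        if not feasible:
            raise ValueError(f"no datacenter can hold {item.name}")
        best = min(feasible, key=lambda dc: placement_cost(
            item, dc, storage_price, transfer_price))
        placement[item.name] = best
        free[best] -= item.size_gb
        total += placement_cost(item, best, storage_price, transfer_price)
    return placement, total
```

The sketch reflects two points from the abstract: an item's cost depends on both its source location and its dependencies, and an unsplittable item must be stored whole at one site. A greedy rule of this kind carries no optimality guarantee, so its result would need to be compared against an exact solution.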
