Communication-aware Approaches for Transparent Checkpointing in Cloud Computing

Samy Sadi; Belabbas Yagoubi

doi:10.12694/scpe.v17i3.1184


Published: Aug 1, 2016

Abstract

Checkpoint/Restart, or checkpointing, is a fault tolerance technique that consists of periodically taking snapshots of an application so that, in the event of a failure, the application's state can be restored and its execution resumed without necessarily restarting it from the beginning. The advent of Cloud Computing has brought new challenges for this technique, as fault tolerance must be provided transparently in environments running highly heterogeneous applications. In this context, we propose two new fully transparent checkpointing approaches. Both approaches use communication-induced checkpointing and guarantee a consistent view of the applications from the perspective of the outside world process. The first approach is uncoordinated and checkpoints each application independently. The second approach is coordinated: applications are first grouped into clusters before the checkpointing process starts. We have compared the proposed approaches with state-of-the-art approaches. The results show that our approaches perform better in terms of communication latency and of the overhead imposed on the execution of the Virtual Machines.
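The abstract names communication-induced checkpointing (CIC) without spelling out a forcing rule. For orientation only, here is a minimal Python sketch of a classic index-based CIC rule in the style of the Briatico, Ciuffoletti and Simoncini (BCS) protocol; it is not the authors' approach. Assumptions: the `Process`, `Message` and `recovery_line` names are hypothetical, a VM snapshot is stood in for by a deep copy of a dict, and message delivery is simulated by direct method calls. Each process takes basic checkpoints on its own and piggybacks its checkpoint index on every message; a receiver that is "behind" takes a forced checkpoint before delivering the message.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Message:
    src: str
    dst: str
    payload: object
    sn: int  # sender's checkpoint index, piggybacked on every message


@dataclass
class Process:
    """A hypothetical application/VM following a BCS-style index-based CIC rule."""
    name: str
    sn: int = 0                                       # current checkpoint index
    state: dict = field(default_factory=dict)         # stand-in for VM state
    checkpoints: list = field(default_factory=list)   # (index, kind, snapshot)

    def __post_init__(self) -> None:
        self._checkpoint("initial")                   # every process starts at index 0

    def _checkpoint(self, kind: str) -> None:
        # A real system would snapshot the VM; here we deep-copy a dict.
        self.checkpoints.append((self.sn, kind, copy.deepcopy(self.state)))

    def basic_checkpoint(self) -> None:
        # Taken independently, e.g. on a local timer: advance the index.
        self.sn += 1
        self._checkpoint("basic")

    def send(self, dst: str, payload: object) -> Message:
        return Message(self.name, dst, payload, self.sn)

    def receive(self, msg: Message) -> None:
        # Communication-induced rule: if the sender is ahead, take a forced
        # checkpoint with the sender's index BEFORE delivering the message.
        # The receiver's index at delivery is thus always >= the sender's index
        # at send time, so no message becomes an orphan for any recovery line.
        if msg.sn > self.sn:
            self.sn = msg.sn
            self._checkpoint("forced")
        self.state.setdefault("inbox", []).append(msg.payload)


def recovery_line(procs: list, k: int) -> dict:
    """For each process, its first checkpoint with index >= k (None if not yet taken)."""
    return {p.name: next((c for c in p.checkpoints if c[0] >= k), None) for p in procs}


if __name__ == "__main__":
    a, b = Process("A"), Process("B")
    a.basic_checkpoint()             # A moves to index 1 on its own
    b.receive(a.send("B", "hello"))  # B is at index 0 < 1: forced checkpoint first
    print([c[:2] for c in b.checkpoints])  # [(0, 'initial'), (1, 'forced')]
    print({n: c[:2] for n, c in recovery_line([a, b], 1).items()})
    # {'A': (1, 'basic'), 'B': (1, 'forced')}
```

The forced checkpoint is what lets each process checkpoint independently while still avoiding the domino effect on rollback; the price is extra checkpoints whenever indices diverge. The sketch does not model the outside world process or the grouping of applications into clusters described above.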

Issue: Vol. 17 No. 3 (2016)

Section: Research Papers