Checkpoint and Replication Oriented Fault Tolerant Mechanism for MapReduce Framework

Bibliographic Details
Main Authors: Liu, Yang (Author), Wei, Wei (Author), Zhang, Yuhong (Author)
Format: EJournal Article
Published: Institute of Advanced Engineering and Science, 2014-02-01.
Description
Summary: MapReduce is an emerging programming paradigm, with an associated implementation, for processing and generating big data, and it has been widely applied in data-intensive systems. In cloud environments, node and task failures are no longer accidental but a common feature of large-scale systems. In the MapReduce framework, the rescheduling-based fault-tolerance method is simple to implement, but it fails to fully consider the location of distributed data and the computation and storage overhead. As a result, a single node failure can increase the completion time dramatically. In this paper, a Checkpoint and Replication Oriented Fault Tolerant scheduling algorithm (CROFT) is proposed, which takes both task and node failures into consideration. Preliminary experiments show that, with less storage and network overhead, CROFT significantly reduces the completion time when failures occur, and that the overall performance of MapReduce can be improved by more than 30% compared with the original mechanism in Hadoop. DOI: http://dx.doi.org/10.11591/telkomnika.v12i2.4324