Journal: International Journal of Electronics, Communication and Soft Computing Science and Engineering
Print ISSN: 2277-9477
Year of publication: 2014
Volume: 3
Issue: 7
Publisher: IJECSCSE
Abstract: Distributed environments increasingly demand high-performance computer systems. As these systems grow in scale, their mean time before failure drops correspondingly, so applications must checkpoint frequently to make progress. At scale, however, the cost of checkpointing becomes prohibitive. One solution is multilevel checkpointing, which uses several types of checkpoints in a single run: lightweight checkpoints handle the most common failure modes, while more expensive checkpoints handle severe failures. The work also uses the Scalable Checkpoint/Restart (SCR) library [1], a multilevel checkpointing library that writes lightweight checkpoints to node-local storage in addition to the parallel file system, and draws on probabilistic Markov models of SCR's performance. The proposed work focuses on evaluating multilevel checkpointing in a distributed environment with multiple senders and multiple receivers.
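The cost argument behind multilevel checkpointing can be illustrated with a minimal sketch. It assumes a hypothetical two-level configuration in which most checkpoints go to cheap node-local storage and every tenth checkpoint goes to the more expensive parallel file system instead. The names `Level`, `schedule`, and `total_cost` and the write costs (2 s and 60 s) are illustrative assumptions, not SCR's API or figures from this paper, and the sketch does not model failures or the Markov analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    name: str
    cost_s: float  # hypothetical time (seconds) to write one checkpoint at this level
    every: int     # this level is used on every `every`-th checkpoint


def schedule(levels, n_checkpoints):
    """For checkpoints 1..n, return the name of the level used at each step.

    `levels` is ordered from cheapest/least resilient to most expensive/most
    resilient; at each step the last level whose period divides the step wins.
    """
    plan = []
    for i in range(1, n_checkpoints + 1):
        chosen = levels[0]
        for lvl in levels:
            if i % lvl.every == 0:
                chosen = lvl
        plan.append(chosen.name)
    return plan


def total_cost(levels, n_checkpoints):
    """Sum the (hypothetical) write cost of every checkpoint in the schedule."""
    by_name = {lvl.name: lvl for lvl in levels}
    return sum(by_name[name].cost_s for name in schedule(levels, n_checkpoints))


if __name__ == "__main__":
    # Hypothetical two-level setup: node-local every time, parallel FS every 10th.
    multilevel = [Level("node-local", 2.0, 1), Level("parallel-fs", 60.0, 10)]
    pfs_only = [Level("parallel-fs", 60.0, 1)]
    print(total_cost(multilevel, 20), total_cost(pfs_only, 20))
```

With these assumed costs, 20 checkpoints take 18 × 2 s + 2 × 60 s = 156 s under the two-level schedule, versus 20 × 60 s = 1200 s if every checkpoint went to the parallel file system. This is the saving that motivates writing lightweight checkpoints to node-local storage.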