Journal: International Journal of Electronics, Communication and Soft Computing Science and Engineering
Print ISSN: 2277-9477
Year: 2015
Volume: 4
Issue: Special 2
Publisher: IJECSCSE
Abstract: Distributed environments now demand high-performance computer systems. As these systems grow larger, their mean time before failure correspondingly drops, so applications must checkpoint frequently to make progress. At scale, however, the cost of checkpointing becomes prohibitive. One solution to this problem is multilevel checkpointing, which employs multiple types of checkpoints in a single run: lightweight checkpoints handle the most common failure modes, while more expensive checkpoints handle severe failures. The work also builds on the design of a multilevel checkpointing library, the Scalable Checkpoint/Restart (SCR) library [1], which writes lightweight checkpoints to node-local storage in addition to the parallel file system; probabilistic Markov models of SCR's performance are also presented. The proposed work focuses on evaluating multilevel checkpointing in a distributed environment with multiple senders and multiple receivers.
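As a rough illustration of why mixing cheap and expensive checkpoints lowers overhead, the following is a minimal sketch under stated assumptions. It assumes a hypothetical two-level policy in which every `pfs_every`-th checkpoint is written to the parallel file system and all others go to node-local storage; the function names, the policy, and the per-checkpoint costs are invented for illustration and are not SCR's actual API or configuration.

```python
# Minimal two-level checkpoint schedule sketch.
# ASSUMPTION: the "every k-th checkpoint goes to the PFS" policy, the function
# names, and the costs are hypothetical; this is not SCR's API or configuration.


def checkpoint_level(index: int, pfs_every: int) -> str:
    """Return the storage level for checkpoint number `index` (1-based)."""
    if pfs_every <= 0:
        raise ValueError("pfs_every must be a positive integer")
    if index % pfs_every == 0:
        return "pfs"    # expensive: parallel file system, survives severe failures
    return "local"      # lightweight: node-local storage, covers common failures


def total_checkpoint_cost(num_checkpoints: int, local_cost: float,
                          pfs_cost: float, pfs_every: int) -> float:
    """Sum the write cost of all checkpoints under the two-level schedule."""
    total = 0.0
    for i in range(1, num_checkpoints + 1):
        if checkpoint_level(i, pfs_every) == "pfs":
            total += pfs_cost
        else:
            total += local_cost
    return total


if __name__ == "__main__":
    # Hypothetical costs in seconds per checkpoint.
    multilevel = total_checkpoint_cost(10, local_cost=1.0, pfs_cost=20.0, pfs_every=5)
    pfs_only = total_checkpoint_cost(10, local_cost=1.0, pfs_cost=20.0, pfs_every=1)
    print(f"multilevel: {multilevel} s, PFS only: {pfs_only} s")
```

With these made-up numbers, ten checkpoints cost 48 s when only every fifth one reaches the parallel file system, versus 200 s when all of them do. The trade-off is that a failure severe enough to lose node-local data can only be recovered from the most recent parallel-file-system checkpoint.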