Basic Article Information

  • Title: Containment Domains: A Scalable, Efficient and Flexible Resilience Scheme for Exascale Systems
  • Authors: Jinsuk Chung, Ikhwan Lee, Michael Sullivan, Jee Ho Ryoo, Dong Wan Kim, Doe Hyun Yoon, Larry Kaplan, Mattan Erez
  • Journal: Scientific Programming
  • Print ISSN: 1058-9244
  • Year of publication: 2013
  • Volume: 21
  • DOI: 10.3233/SPR-130374
  • Publisher: Hindawi Publishing Corporation
  • Abstract: This paper describes and evaluates a scalable and efficient resilience scheme based on the concept of containment domains. Containment domains are a programming construct that enable applications to express resilience needs and to interact with the system to tune and specialize error detection, state preservation and restoration, and recovery schemes. Containment domains have weak transactional semantics and are nested to take advantage of the machine and application hierarchies and to enable hierarchical state preservation, restoration and recovery. We evaluate the scalability and efficiency of containment domains using generalized trace-driven simulation and analytical analysis and show that containment domains are superior to both checkpoint restart and redundant execution approaches.