首页    期刊浏览 2024年12月02日 星期一
登录注册

文章基本信息

  • 标题:Checkpointing Strategies for Scheduling Computational Workflows
  • 本地全文:下载
  • 作者:Guillaume Aupy ; Anne Benoit ; Henri Casanova
  • 期刊名称:International Journal of Networking and Computing
  • 印刷版ISSN:2185-2847
  • 出版年度:2016
  • 卷号:6
  • 期号:1
  • 页码:2-26
  • 语种:English
  • 出版社:International Journal of Networking and Computing
  • 摘要:We study the scheduling of computational workflows on compute resources that experience exponentially distributed failures. When a failure occurs, rollback and recovery is used to resume the execution from the last checkpointed state. The scheduling problem is to minimize the expected execution time by deciding in which order to execute the tasks in the workflow and deciding for each task whether to checkpoint it or not after it completes. We give a polynomial-time optimal algorithm for fork DAGs (Directed Acyclic Graphs) and show that the problem is NP-complete with join DAGs. We also investigate the complexity of the simple case in which no task is checkpointed. Our main result is a polynomial-time algorithm to compute the expected execution time of a workflow, with a given task execution order and specified to-be-checkpointed tasks. Using this algorithm as a basis, we propose several heuristics for solving the scheduling problem. We evaluate these heuristics for representative workflow configurations.Â
  • 其他摘要:We study the scheduling of computational workflows on compute resources that experience exponentially distributed failures. When a failure occurs, rollback and recovery is used to resume the execution from the last checkpointed state. The scheduling problem is to minimize the expected execution time by deciding in which order to execute the tasks in the workflow and deciding for each task whether to checkpoint it or not after it completes. We give a polynomial-time optimal algorithm for fork DAGs (Directed Acyclic Graphs) and show that the problem is NP-complete with join DAGs. We also investigate the complexity of the simple case in which no task is checkpointed. Our main result is a polynomial-time algorithm to compute the expected execution time of a workflow, with a given task execution order and specified to-be-checkpointed tasks. Using this algorithm as a basis, we propose several heuristics for solving the scheduling problem. We evaluate these heuristics for representative workflow configurations.Â
  • 关键词:checkpointing;scheduling;workflow;fault-tolerance;reliability
国家哲学社会科学文献中心版权所有