Article Information

  • Title: Policy Gradients with Memory-Augmented Critic (Japanese title: 微分可能メモリを用いた方策オフ型方策勾配法の安定化手法)
  • Authors: 妹尾 卓磨 ; 今井 倫太
  • Journal: 人工知能学会論文誌
  • Print ISSN: 1346-0714
  • Online ISSN: 1346-8030
  • Year: 2021
  • Volume: 36
  • Issue: 1
  • Pages: 1-8
  • DOI: 10.1527/tjsai.36-1_B-K71
  • Publisher: The Japanese Society for Artificial Intelligence
  • Abstract: Deep reinforcement learning has been investigated in high-dimensional continuous control tasks. Deep Deterministic Policy Gradient (DDPG) is known as a highly sample-efficient policy gradient algorithm; however, it has been reported that DDPG is unstable during training due to bias and variance problems in learning its action-value function. In this paper, we propose Policy Gradients with Memory-Augmented Critic (PGMAC), which builds the action-value function with the memory module previously proposed as the Differentiable Neural Dictionary (DND). Although the DND has only been studied in discrete action-space problems, we propose the Action-Concatenated Key, a technique that combines DDPG-based policy gradient methods with the DND. Furthermore, we show that a remarkable advantage of PGMAC is that the long-term reward calculation and the weighted summation of value estimates in the DND provide an essential mechanism for addressing the bias and variance problems. In experiments, PGMAC significantly outperformed baselines in continuous control tasks. The effects of hyperparameters were also investigated, showing that the memory-augmented action-value function reduces bias and variance in policy optimization.
  • Keywords: deep reinforcement learning; policy gradients; memory module; continuous control
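
The abstract describes the method only at a high level. Below is a minimal, illustrative sketch (not the authors' implementation) of the two ideas it names: the Action-Concatenated Key, which concatenates a state embedding with the action to form the lookup key of the Differentiable Neural Dictionary (DND), and the critic's value read as a similarity-weighted sum over the nearest stored entries. The class name, parameter names, and the inverse-distance kernel are assumptions made for illustration.

```python
import numpy as np

class DNDCritic:
    """Sketch of a memory-augmented critic along the lines described in the
    abstract. Keys concatenate a state embedding with the action
    (Action-Concatenated Key); values store long-term return estimates;
    reads return a similarity-weighted sum over the nearest entries.
    Hyperparameters and kernel choice are illustrative assumptions."""

    def __init__(self, num_neighbors=8, delta=1e-3):
        self.keys = []       # stored action-concatenated keys
        self.values = []     # stored long-term return estimates
        self.k = num_neighbors
        self.delta = delta   # kernel smoothing constant

    def write(self, state_embedding, action, long_term_return):
        # Action-Concatenated Key: concatenate state embedding and action.
        key = np.concatenate([state_embedding, action])
        self.keys.append(key)
        self.values.append(long_term_return)

    def q_value(self, state_embedding, action):
        # Read: weighted summation of stored values over nearest neighbors.
        query = np.concatenate([state_embedding, action])
        keys = np.stack(self.keys)
        values = np.asarray(self.values)
        dists = np.sum((keys - query) ** 2, axis=1)
        nearest = np.argsort(dists)[: self.k]
        # Inverse-distance kernel, normalized to weights that sum to one.
        weights = 1.0 / (dists[nearest] + self.delta)
        weights /= weights.sum()
        return float(np.dot(weights, values[nearest]))
```

For example, after writing transitions with `write(phi_s, a, G)`, a DDPG-style actor update could differentiate `q_value(phi_s, pi(s))` with respect to the action; the weighted summation over several stored returns is what, per the abstract, reduces the bias and variance of the value estimate compared with a single bootstrapped target.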