Basic Article Information

Title: Model-Free Reinforcement Learning for Stochastic Parity Games
Authors: Ernst Moritz Hahn; Mateo Perez; Sven Schewe; et al.
Journal: LIPIcs: Leibniz International Proceedings in Informatics
Electronic ISSN: 1868-8969
Publication year: 2020
Volume: 171
Pages: 21:1-21:16
DOI: 10.4230/LIPIcs.CONCUR.2020.21
Publisher: Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
Abstract: This paper investigates the use of model-free reinforcement learning to compute the optimal value in two-player stochastic games with parity objectives. In this setting, two decision makers, player Min and player Max, compete on a finite game arena - a stochastic game graph with unknown but fixed probability distributions - to minimize and maximize, respectively, the probability of satisfying a parity objective. We give a reduction from stochastic parity games to a family of stochastic reachability games with a parameter ε, such that the value of a stochastic parity game equals the limit of the values of the corresponding simple stochastic games as the parameter ε tends to 0. Since this reduction does not require the knowledge of the probabilistic transition structure of the underlying game arena, model-free reinforcement learning algorithms, such as minimax Q-learning, can be used to approximate the value and mutual best-response strategies for both players in the underlying stochastic parity game. We also present a streamlined reduction from 1½-player parity games to reachability games that avoids recourse to nondeterminism. Finally, we report on the experimental evaluations of both reductions.
Keywords: Reinforcement learning; Stochastic games; Omega-regular objectives