Abstract: The multi-armed bandit (MAB) problem is a classical single-state problem in reinforcement learning, whereas the Grid problem is a multi-state one. In this work, we focus on how to combine the classical value function method with quantum computation. We propose three novel quantum reinforcement learning (QRL) algorithms for the MAB problem and one novel QRL algorithm for the Grid problem, the latter combining the quantum random walk with the Grover algorithm. In our experiments, combining the value function with quantum computation speeds up the learning process.