Abstract: A risk-averse preview-based Q-learning planner is presented for navigation of autonomous vehicles. The multi-lane road ahead of the vehicle is represented by a finite-state non-stationary Markov decision process (MDP). A sampling-based risk-averse preview-based Q-learning algorithm is then developed that generates samples from the preview information and the reward function, so that risk-averse optimal planning strategies are learned without actual interaction with the environment. A risk factor is imposed on the objective function to limit fluctuations of the Q values, which can jeopardize the vehicle's safety and/or performance. Theoretical results are provided that bound the number of samples required to guarantee ϵ-optimal planning with high probability. Finally, to verify the efficiency of the presented algorithm, it is applied to highway driving of an autonomous vehicle under varying traffic density.
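To make the sampling-based idea concrete, the following Python sketch learns Q values for a toy multi-lane road from a previewed, time-varying reward, using only a generative model and no interaction with a real environment. It is an illustration under stated assumptions and not the algorithm of this paper: the risk-averse update is assumed to be an asymmetric weighting of the temporal-difference error (a common risk-sensitive Q-learning variant), and the names `risk_transform`, `sample_next_lane`, `preview_q_learning`, `kappa`, and `slip_prob`, as well as the lane-change model, are hypothetical.

```python
import numpy as np


def risk_transform(td_error, kappa):
    """Asymmetric TD-error weighting (assumed risk model, not the paper's).

    For 0 < kappa < 1, negative TD errors are weighted by (1 + kappa) and
    positive ones by (1 - kappa), which makes the learned values pessimistic.
    """
    return (1.0 - kappa) * td_error if td_error > 0 else (1.0 + kappa) * td_error


def sample_next_lane(rng, lane, action, n_lanes, slip_prob):
    """Hypothetical generative model of a lane change.

    action: 0 = move left, 1 = keep lane, 2 = move right.
    With probability slip_prob the requested lane change does not happen.
    """
    move = 0 if rng.random() < slip_prob else action - 1
    return int(np.clip(lane + move, 0, n_lanes - 1))


def preview_q_learning(reward, kappa=0.5, gamma=0.95, slip_prob=0.1,
                       n_sweeps=200, seed=0):
    """Learn Q[t, lane, action] from sampled transitions only.

    reward[t, lane] is the previewed reward for arriving in `lane` at the end
    of preview step t; it varies with t, so the MDP is non-stationary.
    """
    rng = np.random.default_rng(seed)
    horizon, n_lanes = reward.shape
    n_actions = 3
    # One extra time slice that stays zero: the value beyond the preview horizon.
    q = np.zeros((horizon + 1, n_lanes, n_actions))
    visits = np.zeros((horizon, n_lanes, n_actions))
    for _ in range(n_sweeps):
        for t in range(horizon - 1, -1, -1):
            for lane in range(n_lanes):
                for action in range(n_actions):
                    nxt = sample_next_lane(rng, lane, action, n_lanes, slip_prob)
                    target = reward[t, nxt] + gamma * q[t + 1, nxt].max()
                    visits[t, lane, action] += 1
                    step = 1.0 / (visits[t, lane, action] + 1.0)
                    td_error = target - q[t, lane, action]
                    q[t, lane, action] += step * risk_transform(td_error, kappa)
    return q[:horizon]


# Toy preview: 5 steps, 3 lanes, a slow vehicle blocks lane 1 at step 2.
reward = np.zeros((5, 3))
reward[2, 1] = -10.0
q_values = preview_q_learning(reward)
policy = q_values.argmax(axis=2)  # greedy action per (step, lane)
```

In this toy setting, keeping lane 1 at step 2 always incurs the penalty, so the greedy policy changes lanes there. Setting `kappa = 0` recovers a risk-neutral sampled Q-learning update, which gives a simple baseline for comparing the effect of the risk factor.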