Authors: Xuanyu Liu, Shao Huang, Mengqiu Fan, Xin Yin, Yaodong Zhao and Liming Zhou
Pages: 39-57
Received Date: August 15, 2024
DOI: http://dx.doi.org/10.18642/jmsaa_7100122308
Reinforcement Learning (RL) has emerged as a widely applicable and effective paradigm for addressing decision-making problems across diverse domains. RL encapsulates decision-making problems within the framework of Markov decision processes (MDPs), in which solving the Bellman optimal problem (BOP) is the quintessential task. We present a rigorous proof of convergence of the value iteration method for the BOP, highlighting its exponential rate. Building upon this foundation, we introduce two novel acceleration techniques, transition set and multiple-step update, that enhance the efficiency of Q-Learning and Deep Q-Networks. Our extensive numerical experiments confirm the effectiveness of these techniques.
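The exponential convergence claimed above follows from the Bellman optimality operator being a gamma-contraction in the sup norm. A minimal sketch of value iteration on a toy two-state MDP (the MDP itself is an illustrative assumption, not from the paper) makes the geometric error decay visible:

```python
# Value iteration on a hypothetical two-state, two-action MDP.
# The Bellman optimality operator is a gamma-contraction in the sup
# norm, so successive iterates approach V* at an exponential rate.

gamma = 0.9
states, actions = [0, 1], [0, 1]
# P[a][s][t]: probability of moving s -> t under action a
P = {0: {0: {0: 0.8, 1: 0.2}, 1: {0: 0.1, 1: 0.9}},
     1: {0: {0: 0.5, 1: 0.5}, 1: {0: 0.3, 1: 0.7}}}
# R[s][a]: expected immediate reward
R = {0: {0: 1.0, 1: 0.0},
     1: {0: 0.0, 1: 2.0}}

V = {s: 0.0 for s in states}
errors = []
for _ in range(100):
    # Bellman optimality update:
    # V(s) <- max_a [ R(s,a) + gamma * sum_t P(t|s,a) V(t) ]
    V_new = {s: max(R[s][a] + gamma * sum(P[a][s][t] * V[t] for t in states)
                    for a in actions)
             for s in states}
    errors.append(max(abs(V_new[s] - V[s]) for s in states))
    V = V_new

# Contraction guarantees errors[k+1] <= gamma * errors[k], i.e. the
# sup-norm change shrinks geometrically across iterations.
```

Because each update contracts the sup-norm distance by at least the factor gamma, the number of iterations needed for a given accuracy grows only logarithmically in the inverse tolerance.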
Keywords: reinforcement learning, Bellman optimal problem, value iteration, convergence analysis, acceleration techniques.