References

CONVERGENCE OF THE VALUE ITERATION METHOD FOR BELLMAN OPTIMAL PROBLEM AND APPLICATIONS

[1] R. Chopra and S. S. Roy, End-to-end reinforcement learning for self-driving car, Advanced Computing and Intelligent Engineering (2020), 53-61.
DOI: https://doi.org/10.1007/978-981-15-1081-6_5

[2] T. Haarnoja, A. Zhou, P. Abbeel and S. Levine, Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor, ArXiv: 1801.01290 (2018).
DOI: https://doi.org/10.48550/arXiv.1801.01290

[3] A. Kendall, J. Hawke, D. Janz, P. Mazur, D. Reda, J. M. Allen, V.-D. Lam, A. Bewley and A. Shah, Learning to drive in a day, In 2019 International Conference on Robotics and Automation (ICRA) (2019), 8248-8254.
DOI: https://doi.org/10.48550/arXiv.1807.00412

[4] J. Li, W. Monroe, A. Ritter, D. Jurafsky, M. Galley and J. Gao, Deep reinforcement learning for dialogue generation, In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1192-1202, Austin, Texas. Association for Computational Linguistics.
DOI: https://doi.org/10.18653/v1/D16-1127

[5] Z. Liang, H. Chen, J. Zhu, K. Jiang and Y. Li, Adversarial deep reinforcement learning in portfolio management, ArXiv: 1808.09940 (2018).
DOI: https://doi.org/10.48550/arXiv.1808.09940

[6] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. M. O. Heess, T. Erez, Y. Tassa, D. Silver and D. Wierstra, Continuous control with deep reinforcement learning, ArXiv: 1509.02971 (2015).
DOI: https://doi.org/10.48550/arXiv.1509.02971

[7] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver and K. Kavukcuoglu, Asynchronous methods for deep reinforcement learning, In International Conference on Machine Learning (2016), 1928-1937.
DOI: https://doi.org/10.48550/arXiv.1602.01783

[8] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra and M. Riedmiller, Playing Atari with deep reinforcement learning, ArXiv: 1312.5602 (2013).
DOI: https://doi.org/10.48550/arXiv.1312.5602

[9] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg and D. Hassabis, Human-level control through deep reinforcement learning, Nature 518 (2015), 529-533.
DOI: https://doi.org/10.1038/nature14236

[10] J. Moody and M. Saffell, Learning to trade via direct reinforcement, IEEE Transactions on Neural Networks 12(4) (2001), 875-889.
DOI: https://doi.org/10.1109/72.935097

[11] K. Narasimhan, T. Kulkarni and R. Barzilay, Language understanding for text-based games using deep reinforcement learning, In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1-11, Lisbon, Portugal. Association for Computational Linguistics.
DOI: https://doi.org/10.18653/v1/D15-1001

[12] Y. Nevmyvaka, Y. Feng and M. Kearns, Reinforcement learning for optimized trade execution, Proceedings of the 23rd International Conference on Machine Learning (2006), 673-680.
DOI: https://doi.org/10.1145/1143844.1143929

[13] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel and D. Hassabis, Mastering the game of Go with deep neural networks and tree search, Nature 529 (2016), 484-489.
DOI: https://doi.org/10.1038/nature16961

[14] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra and M. Riedmiller, Deterministic policy gradient algorithms, Proceedings of the 31st International Conference on Machine Learning 32 (2014), 387-395.

[15] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction, IEEE Transactions on Neural Networks 9(5) (1998), 1054-1054.
DOI: https://doi.org/10.1109/TNN.1998.712192

[16] R. S. Sutton, D. McAllester, S. Singh and Y. Mansour, Policy gradient methods for reinforcement learning with function approximation, Proceedings of the 12th International Conference on Neural Information Processing Systems (2000), 1057-1063.

[17] C. J. C. H. Watkins and P. Dayan, Q-Learning, Machine Learning 8 (1992), 279-292.
DOI: https://doi.org/10.1007/BF00992698

[18] Z. Yang, Y. Xie and Z. Wang, A theoretical analysis of deep Q-learning, ArXiv: 1901.00137 (2019).
