[1] R. Chopra and S. S. Roy, End-to-end reinforcement learning for
self-driving car, Advanced Computing and Intelligent Engineering
(2020), 53-61.
DOI: https://doi.org/10.1007/978-981-15-1081-6_5
[2] T. Haarnoja, A. Zhou, P. Abbeel and S. Levine, Soft actor-critic:
Off-policy maximum entropy deep reinforcement learning with a
stochastic actor, ArXiv: 1801.01290 (2018).
DOI: https://doi.org/10.48550/arXiv.1801.01290
[3] A. Kendall, J. Hawke, D. Janz, P. Mazur, D. Reda, J. M. Allen,
V.-D. Lam, A. Bewley and A. Shah, Learning to drive in a day, In 2019
International Conference on Robotics and Automation (ICRA) (2019),
8248-8254.
DOI: https://doi.org/10.48550/arXiv.1807.00412
[4] J. Li, W. Monroe, A. Ritter, D. Jurafsky, M. Galley and J. Gao,
Deep reinforcement learning for dialogue generation, Proceedings of
the 2016 Conference on Empirical Methods in Natural Language
Processing, Austin, Texas, Association for Computational Linguistics
(2016), 1192-1202.
DOI: https://doi.org/10.18653/v1/D16-1127
[5] Z. Liang, H. Chen, J. Zhu, K. Jiang and Y. Li, Adversarial deep
reinforcement learning in portfolio management, ArXiv: 1808.09940
(2018).
DOI: https://doi.org/10.48550/arXiv.1808.09940
[6] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. M. O. Heess, T. Erez,
Y. Tassa, D. Silver and D. Wierstra, Continuous control with deep
reinforcement learning, ArXiv: 1509.02971 (2015).
DOI: https://doi.org/10.48550/arXiv.1509.02971
[7] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T.
Harley, D. Silver and K. Kavukcuoglu, Asynchronous methods for deep
reinforcement learning, in International Conference on Machine
Learning (2016), 1928-1937.
DOI: https://doi.org/10.48550/arXiv.1602.01783
[8] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D.
Wierstra and M. Riedmiller, Playing Atari with deep reinforcement
learning, ArXiv: 1312.5602 (2013).
DOI: https://doi.org/10.48550/arXiv.1312.5602
[9] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G.
Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S.
Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D.
Wierstra, S. Legg and D. Hassabis, Human-level control through deep
reinforcement learning, Nature 518 (2015), 529-533.
DOI: https://doi.org/10.1038/nature14236
[10] J. Moody and M. Saffell, Learning to trade via direct
reinforcement, IEEE Transactions on Neural Networks 12(4) (2001),
875-889.
DOI: https://doi.org/10.1109/72.935097
[11] K. Narasimhan, T. Kulkarni and R. Barzilay, Language
understanding for text-based games using deep reinforcement learning,
Proceedings of the 2015 Conference on Empirical Methods in Natural
Language Processing, Lisbon, Portugal, Association for Computational
Linguistics (2015), 1-11.
DOI: https://doi.org/10.18653/v1/D15-1001
[12] Y. Nevmyvaka, Y. Feng and M. Kearns, Reinforcement learning for
optimized trade execution, Proceedings of the 23rd International
Conference on Machine Learning (2006), 673-680.
DOI: https://doi.org/10.1145/1143844.1143929
[13] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van
den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M.
Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I.
Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel and D.
Hassabis, Mastering the game of Go with deep neural networks and tree
search, Nature 529 (2016), 484-489.
DOI: https://doi.org/10.1038/nature16961
[14] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra and M.
Riedmiller, Deterministic policy gradient algorithms, Proceedings of
the 31st International Conference on Machine Learning 32 (2014),
387-395.
[15] R. S. Sutton and A. G. Barto, Reinforcement learning: An
introduction, IEEE Transactions on Neural Networks 9(5) (1998),
1054.
DOI: https://doi.org/10.1109/TNN.1998.712192
[16] R. S. Sutton, D. McAllester, S. Singh and Y. Mansour, Policy
gradient methods for reinforcement learning with function
approximation, Proceedings of the 12th International Conference on
Neural Information Processing Systems (2000), 1057-1063.
[17] C. J. C. H. Watkins and P. Dayan, Q-learning, Machine Learning 8
(1992), 279-292.
DOI: https://doi.org/10.1007/BF00992698
[18] Z. Yang, Y. Xie and Z. Wang, A theoretical analysis of deep
Q-learning, ArXiv: 1901.00137 (2019).