Applying the machine learning methods in the agent behavior control

Authors: Fedotov M.A., Chapaev A.Yu.
Published in issue: #11(88)/2023
DOI: 10.18698/2541-8009-2023-11-952
Category: Informatics, Computer Engineering and Control | Chapter: Information Technology. Computer technologies. Theory of computers and systems
Keywords: artificial neural network, machine learning methods, reinforcement learning, Double Deep Q-learning, optimization algorithm, agent control, hyper-parameters, learning rate
Published: 19.12.2023

This paper examines the use of machine learning methods for agent control, focusing on reinforcement learning. Four reinforcement learning algorithms are compared: Q-learning, SARSA, EV-SARSA, and DDQN. Of these, DDQN proves the most suitable for controlling agent behavior in a non-deterministic environment. The DDQN algorithm is implemented in C++, and the resulting implementation is used to control the agent in the Zmeyka (Snake) gaming application. Computational experiments assess how effectively the developed method controls the agent's behavior. They demonstrate the benefits of DDQN under changing environmental conditions, which confirms the algorithm's effectiveness for agent behavior control problems.
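For readers unfamiliar with DDQN, the update that distinguishes it from plain deep Q-learning can be sketched in a few lines of C++. This is a minimal illustration of the standard Double Q-learning target, not the authors' implementation: the function name `ddqn_target`, its parameters, and the use of plain vectors in place of neural network outputs are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Double DQN learning target for a single transition (s, a, r, s').
// Hypothetical helper for illustration only; it is not taken from the paper.
//   q_online_next: Q-values for s' from the online network
//   q_target_next: Q-values for s' from the target network
double ddqn_target(double reward, double gamma, bool terminal,
                   const std::vector<double>& q_online_next,
                   const std::vector<double>& q_target_next) {
    assert(q_online_next.size() == q_target_next.size());
    if (terminal || q_online_next.empty()) {
        return reward;  // no bootstrap term after a terminal state
    }
    // The online network selects the greedy action in s'...
    const auto best =
        std::max_element(q_online_next.begin(), q_online_next.end());
    const auto a_star = static_cast<std::size_t>(
        std::distance(q_online_next.begin(), best));
    // ...and the target network evaluates that action.
    return reward + gamma * q_target_next[a_star];
}
```

Separating action selection from action evaluation is what reduces the overestimation bias of ordinary Q-learning, where a single set of estimates is used for both steps. In the Zmeyka setting the two vectors would presumably hold one Q-value per available move, though the abstract does not specify the action encoding.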
