Multi-agent reinforcement learning methods using game-theoretic algorithms

Authors: Bolshakov V.E.
Published in issue: #11(52)/2020
DOI: 10.18698/2541-8009-2020-11-652
Category: Informatics, Computer Engineering and Control | Chapter: System Analysis, Control, and Information Processing, Statistics
Keywords: deep learning, game theory, multi-agent reinforcement learning, Nash equilibrium, neural networks, stochastic games, StarCraft II, equilibrium search, matrix games
Published: 26.11.2020

The paper considers multi-agent reinforcement learning methods for general-sum stochastic games. Q-learning and its modifications, including deep Q-learning, are proposed as the reinforcement learning algorithm. The game-theoretic component consists of algorithms built on the concepts of joint agent actions, Nash equilibrium, and matrix games. The author describes a successful attempt to combine reinforcement learning and game theory in a multi-agent strategic interaction environment in StarCraft II. A deep reinforcement learning algorithm with Nash equilibrium search, the Deep Nash Q-Network (Nash-DQN), is proposed and implemented.
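To make the combination of matrix games, Nash equilibrium, and Q-learning concrete, the following is a minimal tabular sketch and not the author's Nash-DQN implementation. It assumes a two-player bimatrix stage game, searches only for pure-strategy equilibria by enumeration (a simplification; general-sum games may need mixed strategies, for example via the Lemke-Howson algorithm of [5]), and applies the Nash Q-learning update form from [4]. The function names, payoff matrices, and the values of `alpha` and `gamma` are illustrative assumptions.

```python
from itertools import product


def pure_nash_equilibria(payoff_a, payoff_b):
    """Return all pure-strategy Nash equilibria (i, j) of a bimatrix game.

    payoff_a[i][j] is the row player's payoff and payoff_b[i][j] the column
    player's payoff when the row player picks i and the column player picks j.
    """
    rows, cols = len(payoff_a), len(payoff_a[0])
    equilibria = []
    for i, j in product(range(rows), range(cols)):
        # Row player cannot gain by deviating from i while column plays j.
        row_best = all(payoff_a[i][j] >= payoff_a[k][j] for k in range(rows))
        # Column player cannot gain by deviating from j while row plays i.
        col_best = all(payoff_b[i][j] >= payoff_b[i][k] for k in range(cols))
        if row_best and col_best:
            equilibria.append((i, j))
    return equilibria


def nash_q_update(q_value, reward, next_nash_value, alpha=0.1, gamma=0.9):
    """One tabular Nash-Q style update for one agent and one joint action.

    next_nash_value is the agent's payoff at a Nash equilibrium of the stage
    game defined by the Q-values of the next state.
    """
    return (1 - alpha) * q_value + alpha * (reward + gamma * next_nash_value)


# Hypothetical stage game (prisoner's dilemma): 0 = cooperate, 1 = defect.
A = [[-1, -3], [0, -2]]  # row player's payoffs
B = [[-1, 0], [-3, -2]]  # column player's payoffs

equilibria = pure_nash_equilibria(A, B)  # only mutual defection, (1, 1)
i, j = equilibria[0]

# Treat the same matrix as the next-state stage game for the row player.
updated_q = nash_q_update(q_value=0.0, reward=A[i][j], next_nash_value=A[i][j])
```

In a deep variant such as the one the abstract describes, the table lookup would be replaced by a neural network that outputs Q-values over joint actions; that part is not shown here.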

References

[1] Hausknecht M., Stone P. Deep recurrent Q-learning for partially observable MDPs. AAAI Fall Symp. Sequential Decision Making for Intelligent Agents, 2015. URL: https://arxiv.org/pdf/1507.06527.pdf (accessed: 15.06.2020).

[2] Nash J. Non-cooperative games. Ann. Math., 1951, vol. 54, no. 2, pp. 286–295. DOI: https://doi.org/10.2307/1969529

[3] Abernethy J., Lai K.A., Wibisono A. Fictitious play: convergence, smoothness, and optimism. arxiv.org: website. URL: https://arxiv.org/abs/1911.08418v1 (accessed: 15.06.2020).

[4] Hu J., Wellman M.P. Nash Q-learning for general-sum stochastic games. J. Mach. Learn. Res., 2003, vol. 4, no. 4, pp. 1039–1069.

[5] Lemke C.E., Howson J.T.Jr. Equilibrium points of bimatrix games. J. Soc. Ind. Appl. Math., 1964, vol. 12, no. 2, pp. 413–423. DOI: https://doi.org/10.1137/0112033

[6] Foerster J., Nardelli N., Farquhar G., et al. Stabilising experience replay for deep multi-agent reinforcement learning. Proc. 34th Int. Conf. Machine Learning, 2017, pp. 1146–1155.

[7] Krizhevsky A., Sutskever I., Hinton G.E. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[8] Alfimtsev A.N. Deklarativno-protsessnaya tekhnologiya razrabotki intellektual’nykh mul’timodal’nykh interfeysov. Avtoref. diss. dok. tekh. nauk [Declarative-processive development technology for intelligent multimode interfaces. Abs. doc. tech. sci. diss.]. Moscow, ICS RAS Publ., 2016 (in Russ.).

[9] Dai D., Tan W., Zhan H. Understanding the feedforward artificial neural network model from the perspective of network flow. arxiv.org: website. URL: https://arxiv.org/abs/1704.08068 (accessed: 15.06.2020).

[10] Samvelyan M., Rashid T., de Witt C.S., et al. The StarCraft multi-agent challenge. arxiv.org: website. URL: https://arxiv.org/abs/1902.04043 (accessed: 15.06.2020).