Limiting Dynamics for Q-Learning with Memory One in Symmetric Two-Player, Two-Action Games

J.M. Meylahn*, L. Janssen

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review


Abstract

We develop a method based on computer algebra systems to represent the mutual pure strategy best-response dynamics of symmetric two-player, two-action repeated games played by players with a one-period memory. We apply this method to the iterated prisoner’s dilemma, stag hunt, and hawk-dove games and identify all possible equilibrium strategy pairs and the conditions for their existence. The only equilibrium strategy pair that is possible in all three games is the win-stay, lose-shift strategy. Lastly, we show that the mutual best-response dynamics are realized by a sample batch Q-learning algorithm in the infinite batch size limit.
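The abstract's final claim concerns a sample batch Q-learning algorithm whose infinite-batch-size limit realizes the mutual best-response dynamics. The following is a minimal sketch of that setup, not the authors' implementation: two memory-one players on the iterated prisoner's dilemma, where the state is the previous joint action, each player freezes an epsilon-greedy policy while collecting a batch of transitions, and then replaces each visited Q-value with the batch average of the one-step targets. The payoff values, batch size, and update rule here are illustrative assumptions.

```python
import random
from collections import defaultdict

# Assumed standard prisoner's dilemma payoffs for the row player:
# T=5 > R=3 > P=1 > S=0.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ACTIONS = ("C", "D")


def greedy(q, state):
    """Greedy action w.r.t. Q at a memory-one state (previous joint action)."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def batch_q_learning(rounds=200, batch_size=50, gamma=0.9, epsilon=0.1, seed=0):
    """Sketch of sample batch Q-learning for two memory-one players.

    Each round, both players act epsilon-greedily from their current Q-tables
    for `batch_size` steps, accumulating one-step targets per visited
    state-action pair; the Q-values are then updated synchronously to the
    batch averages. As batch_size -> infinity the averaged target approaches
    its expectation, which is the limit studied in the article.
    """
    rng = random.Random(seed)
    q1 = defaultdict(float)
    q2 = defaultdict(float)
    state = ("C", "C")  # arbitrary initial memory
    for _ in range(rounds):
        batch1, batch2 = defaultdict(list), defaultdict(list)
        for _ in range(batch_size):
            # Player 2 sees the joint action from its own perspective.
            s2 = (state[1], state[0])
            a1 = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(q1, state)
            a2 = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(q2, s2)
            next_state = (a1, a2)
            r1, r2 = PAYOFF[(a1, a2)], PAYOFF[(a2, a1)]
            batch1[(state, a1)].append(
                r1 + gamma * max(q1[(next_state, a)] for a in ACTIONS))
            batch2[(s2, a2)].append(
                r2 + gamma * max(q2[((a2, a1), a)] for a in ACTIONS))
            state = next_state
        # Synchronous batch update: average the targets per visited pair.
        for key, targets in batch1.items():
            q1[key] = sum(targets) / len(targets)
        for key, targets in batch2.items():
            q2[key] = sum(targets) / len(targets)
    return q1, q2


q1, q2 = batch_q_learning()
# Player 1's learned memory-one pure strategy: one action per joint outcome.
policy1 = {(a, b): greedy(q1, (a, b)) for a in ACTIONS for b in ACTIONS}
print(policy1)
```

Under this sketch the learned policy is a pure memory-one strategy, i.e. one of the 16 deterministic maps from the previous joint action to an action, the strategy space in which the article locates equilibria such as win-stay, lose-shift.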
Original language: English
Article number: 4830491
Journal: Complexity
Volume: 2022
DOIs
Publication status: Published - 8 Nov 2022

