Abstract
In this work, we ask and answer what makes classical temporal-difference reinforcement learning with ϵ-greedy strategies cooperative. Cooperating in social dilemma situations is vital for animals, humans, and machines. While evolutionary theory has revealed a range of mechanisms promoting cooperation, the conditions under which agents learn to cooperate are contested. Here, we demonstrate which individual elements of the multi-agent learning setting lead to cooperation, and how. We use the iterated Prisoner’s dilemma with one-period memory as a testbed. Each of the two learning agents learns a strategy that conditions its next action choice on both agents’ action choices in the previous round. We find that, next to a high regard for future rewards, a low exploration rate, and a small learning rate, it is primarily the intrinsic stochastic fluctuations of the reinforcement learning process that double the final rate of cooperation to up to 80%. Thus, inherent noise is not a necessary evil of the iterative learning process; it is a critical asset for the learning of cooperation. However, we also point out the trade-off between a high likelihood of cooperative behavior and achieving this in a reasonable amount of time. Our findings are relevant for purposefully designing cooperative algorithms and regulating undesired collusive effects.
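The setting described in the abstract can be illustrated with a minimal sketch: two independent ϵ-greedy learners play the iterated Prisoner’s dilemma, with the previous round's joint action as the state. The sketch makes several assumptions not taken from the abstract: it uses tabular Q-learning as one common temporal-difference variant, the payoff values `T, R, P, S = 5, 3, 1, 0` are illustrative, and the names `run` and `epsilon_greedy` are hypothetical. It is not the authors' implementation and is not meant to reproduce the reported cooperation rates.

```python
import random

# Assumed payoff values with T > R > P > S; not taken from the paper.
T, R, P, S = 5.0, 3.0, 1.0, 0.0
C, D = 0, 1  # action indices: cooperate, defect
PAYOFF = {(C, C): (R, R), (C, D): (S, T), (D, C): (T, S), (D, D): (P, P)}
# One-period memory: the state is the joint action of the previous round.
STATES = [(a, b) for a in (C, D) for b in (C, D)]


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon or q_row[C] == q_row[D]:
        return rng.choice((C, D))
    return C if q_row[C] > q_row[D] else D


def run(steps=10000, alpha=0.1, gamma=0.9, epsilon=0.05, seed=0):
    """Simulate two independent Q-learners (assumed TD variant).

    alpha is the learning rate, gamma the discount factor (how much
    future rewards count), epsilon the exploration rate.
    Returns both Q-tables and the fraction of cooperative actions.
    """
    rng = random.Random(seed)
    q = [{s: [0.0, 0.0] for s in STATES} for _ in range(2)]
    state = (rng.choice((C, D)), rng.choice((C, D)))
    coop = 0
    for _ in range(steps):
        actions = tuple(epsilon_greedy(q[i][state], epsilon, rng) for i in range(2))
        rewards = PAYOFF[actions]
        next_state = actions
        for i in range(2):
            # Temporal-difference update towards reward plus discounted value.
            target = rewards[i] + gamma * max(q[i][next_state])
            q[i][state][actions[i]] += alpha * (target - q[i][state][actions[i]])
        coop += actions.count(C)
        state = next_state
    return q, coop / (2 * steps)


if __name__ == "__main__":
    _, rate = run()
    print(f"fraction of cooperative actions: {rate:.3f}")
```

Varying `alpha`, `gamma`, and `epsilon` in this sketch corresponds to the three parameters the abstract names; any outcome from it depends on the assumed payoffs and update rule.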
| Original language | English |
|---|---|
| Article number | 1309 |
| Journal | Scientific reports |
| Volume | 13 |
| Issue number | 1 |
| Publication status | Published - Dec 2023 |
Research output
Intrinsic fluctuations of reinforcement learning promote cooperation
Meylahn, J. & Barfuss, W., 2022, ArXiv.org, p. 1-9, 9 p. Research output: Working paper