by P. Alatur, G. Ramponi, N. He, A. Krause

Abstract:

Multi-agent reinforcement learning addresses sequential decision-making problems with multiple agents, where each agent optimizes its own objective. In many real-world scenarios, agents not only aim to maximize their goals but also need to ensure safe behavior. For example, in traffic routing, each vehicle (acting as an agent) seeks to reach its destination swiftly (an objective) while avoiding collisions (a safety constraint). Constrained Markov Games (CMGs) offer a natural framework for addressing safe MARL problems, but they are typically computationally challenging. In this work, we introduce and study Constrained Markov Potential Games (CMPGs), a significant subclass of CMGs. Initially, we demonstrate that Nash policies for CMPGs can be computed through constrained optimization. Then, we showed that Lagrangian-primal dual methods (one tempting approach to solve this optimization problem) cannot be used in this setting. In fact, unlike in single-agent scenarios, CMPGs do not satisfy strong duality, rendering such approaches inapplicable and potentially unsafe. To tackle the CMPG problem, we propose a novel algorithm Coordinate-Ascent for CMPGs with Exploration (CA-CMPG-E), which provably converges to a Nash policy in tabular, finite-horizon CMPGs. The idea behind the algorithm is to solve for each agent a Constrained Markov Decision Process and update the joint policy in the direction of the steepest improvement. Furthermore, we provide the first sample complexity bounds for learning Nash policies in unknown CMPGs guaranteeing safe exploration.

Reference:

Provably Learning Nash Policies in Constrained Markov Potential Games P. Alatur, G. Ramponi, N. He, A. KrauseIn Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, International Foundation for Autonomous Agents and Multiagent Systems, 2024

Bibtex Entry:

@inproceedings{
	alatur2024cmpgs,
	author = {Alatur, Pragnya and Ramponi, Giorgia and He, Niao and Krause, Andreas},
	title = {Provably Learning Nash Policies in Constrained Markov Potential Games},
	year = {2024},
	isbn = {9798400704864},
	publisher = {International Foundation for Autonomous Agents and Multiagent Systems},
	address = {Richland, SC},
	booktitle = {Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems},
	pages = {31–39},
	numpages = {9},
	location = {, Auckland, New Zealand, },
	series = {AAMAS '24},
	pdf = {https://dl.acm.org/doi/pdf/10.5555/3635637.3662849}
}