Aluno: Gabriel de Oliveira Ramos
Orientadora: Profª. Drª. Ana Lucia Cetertich Bazzan
Título: Regret Minimisation and System-Efficiency in Route Choice
Linha de Pesquisa: Inteligência Artificial
Local: Prédio 43412 – Sala 215 (sala de videoconferência), Instituto de Informática
Prof. Dr. Felipe Rech Meneguzzi (PUCRS)
Profª. Drª. Luciana Salete Buriol (UFRGS)
Profª. Drª. Ann Nowé (VUB, Bélgica – por videoconferência)
Presidente da Banca: Profª. Drª. Ana Lucia Cetertich Bazzan
Abstract: Multiagent reinforcement learning (MARL) is a challenging task, where self-interested agents concurrently learn a policy that maximise their utilities. Learning here is difficult because agents must adapt to each other, which makes their objective a moving target. As a side effect, no convergence guarantees exist for the general MARL setting. This thesis exploits a particular MARL problem, namely route choice (where selfish drivers aim at choosing routes that minimise their travel costs), to deliver convergence guarantees. We are particularly interested in guaranteeing convergence to two fundamental solution concepts: the user equilibrium (UE, when no agent benefits from unilaterally changing its route) and the system optimum (SO, when average travel time is minimum). The main goal of this thesis is to show that, in the context of route choice, MARL can be guaranteed to converge to the UE as well as to the SO upon certain conditions. Firstly, we introduce a regret-minimising Q-learning algorithm, which we prove that converges to the UE. Our algorithm works by estimating the regret associated with agents’ actions and using such information as reinforcement signal for updating the corresponding Q-values. We also establish a bound on the agents’ regret. We then extend this algorithm to deal with non-local information provided by a navigation service. Using such information, agents can improve their regrets estimates, thus performing empirically better. Finally, in order to mitigate the effects of selfishness, we also present a generalised marginal-cost tolling scheme in which drivers are charged proportional to the cost imposed on others. We then devise a toll-based Q-learning algorithm, which we prove that converges to the SO and that is fairer than existing tolling schemes.
Keywords: Multiagent reinforcement learning; Route choice; User equilibrium; System optimal; Regret minimisation; Action regret; Travel information; Marginal-cost tolling.