
Thesis Defense of Ricardo Grunitzki

Event Details

Student: Ricardo Grunitzki
Advisor: Prof. Dr. Ana Lucia Cetertich Bazzan

Título: A Flexible Approach for Optimal Rewards in Multi-Agent Reinforcement Learning Problems
Research Area: Machine Learning, Knowledge Representation and Reasoning

Date: 14 September 2018
Time: 9:30 a.m.
Location: Building 43412, Room 218 (videoconference room), Instituto de Informática

Examination Committee:
Prof. Dr. Felipe Rech Meneguzzi (PUCRS)
Prof. Dr. André Grahl Pereira (UFRGS)
Prof. Dr. Reinaldo Augusto da Costa Bianchi (FEI, by videoconference)

Committee Chair: Prof. Dr. Ana Lucia Cetertich Bazzan

Abstract: Defining a reward function that, when optimized, results in rapid acquisition of an optimal policy is one of the most challenging tasks involved in applying reinforcement learning algorithms. The behavior learned by agents is directly related to the reward function they use. Existing work on the Optimal Reward Problem (ORP) proposes mechanisms for designing reward functions. However, their application is limited to specific sub-classes of single- or multi-agent reinforcement learning problems. Moreover, these methods identify which rewards should be given in which situations, but not which aspects of the state or environment should be used when defining the reward function.

This thesis proposes an extended version of the optimal reward problem (EORP) that:

- can identify both the features and the reward signals that should compose the reward function;

- is general enough to deal with single- and multi-agent reinforcement learning problems;

- is scalable to problems with a large number of agents learning simultaneously;

- incorporates a "learning effort" metric in the evaluation of reward functions, allowing the discovery of reward functions that result in faster learning.

The method is evaluated on gridworld and traffic assignment problems to demonstrate its efficacy in designing effective reward functions. The results obtained by the proposed approach are compared with reward functions designed by a domain specialist and with difference rewards, a well-known design technique for multi-agent rewards. Results show that EORP can identify reward functions that outperform both of these baselines in the evaluated problems.
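For readers unfamiliar with the difference rewards baseline mentioned above, the following is a minimal illustrative sketch, not the thesis's implementation. It uses the standard definition D_i = G(z) - G(z_-i), where G is a system-level utility and z_-i is the joint action with agent i removed. The toy route-choice setting, the linear congestion model, and all function and variable names are assumptions made only for this example.

```python
from collections import Counter

def route_cost(flow, free_flow_time):
    # Hypothetical linear congestion model: travel time grows with the
    # number of agents on the route.
    return free_flow_time + flow

def global_utility(choices, free_flow_times):
    # System utility G: negative total travel time over all agents.
    flows = Counter(choices)
    return -sum(n * route_cost(n, free_flow_times[r]) for r, n in flows.items())

def difference_reward(i, choices, free_flow_times):
    # D_i = G(z) - G(z_-i): utility with agent i minus utility without it.
    without_i = choices[:i] + choices[i + 1:]
    return (global_utility(choices, free_flow_times)
            - global_utility(without_i, free_flow_times))

# Toy instance (assumed values): three agents on route A, one on route B.
free_flow_times = {"A": 1, "B": 3}
choices = ["A", "A", "A", "B"]
rewards = [difference_reward(i, choices, free_flow_times)
           for i in range(len(choices))]
print(rewards)
```

In this toy instance, each agent on the congested route A receives a lower reward than the agent on route B, because removing an agent from A improves the system utility more. This per-agent signal is what difference rewards are meant to provide.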

Keywords: Optimal reward problem. Reward function design. Multi-agent reinforcement learning.