Reinforcement Learning Research Articles Review

Decentralized Reinforcement Learning of Robot Behaviors. Artificial Intelligence, Volume 256, March 2018, Pages 130-159. David L. Leottau, Javier Ruiz-del-Solar, Robert Babuška

A multi-agent methodology is proposed for Decentralized Reinforcement Learning (DRL) of individual behaviors in problems with multi-dimensional action spaces. Under this methodology, sub-tasks are learned in parallel by individual agents working toward a common goal. In addition to the methodology itself, three specific multi-agent DRL approaches are considered: DRL-Independent, DRL Cooperative-Adaptive (DRL-CA), and DRL-Lenient. These approaches are validated and analyzed through an extensive empirical study on four problems: 3D Mountain Car, SCARA Real-Time Trajectory Generation, Ball-Dribbling with humanoid soccer robots, and Ball-Pushing with differential-drive robots. The experimental validation provides evidence that the DRL implementations achieve better performance and faster learning than their centralized counterparts while using fewer computational resources. DRL-Lenient and DRL-CA attain the best final performance on all four problems, outperforming their DRL-Independent counterparts. Furthermore, the benefits of DRL-Lenient and DRL-CA become more noticeable as problem complexity increases and the centralized scheme turns intractable given the available computational resources and training time.
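
The decomposition the abstract describes lends itself to a small sketch. The Python below shows one way an individual learner responsible for a single action dimension might look, including a lenient update rule that forgives value decreases early in training. This is a minimal illustration under assumed tabular states and a shared global reward; the class name, parameters, and update schedule are illustrative, not the authors' implementation.

```python
import random

import numpy as np


class LenientQLearner:
    """One decentralized learner responsible for a single action dimension.

    Tabular Q-learning with a lenient update: early in training, updates
    that would lower a Q-value are probabilistically ignored, since low
    returns are often caused by teammates that are still exploring.
    """

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.99,
                 epsilon=0.1, leniency_decay=0.995):
        self.q = np.zeros((n_states, n_actions))
        # Per state-action probability of forgiving a value decrease.
        self.leniency = np.ones((n_states, n_actions))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.decay = leniency_decay

    def act(self, state):
        # Epsilon-greedy selection over this action dimension only.
        if random.random() < self.epsilon:
            return random.randrange(self.q.shape[1])
        return int(np.argmax(self.q[state]))

    def update(self, state, action, reward, next_state):
        target = reward + self.gamma * np.max(self.q[next_state])
        delta = target - self.q[state, action]
        # Lenient rule: increases always apply; decreases apply only with
        # probability 1 - leniency, which rises as leniency decays.
        if delta >= 0 or random.random() > self.leniency[state, action]:
            self.q[state, action] += self.alpha * delta
        self.leniency[state, action] *= self.decay


# Example wiring: each learner owns one action dimension; the joint action
# is the tuple of individual choices, and every learner receives the same
# global reward signal.
learners = [LenientQLearner(n_states=64, n_actions=5) for _ in range(3)]
state, next_state, reward = 0, 1, 0.5  # placeholder transition
joint_action = tuple(learner.act(state) for learner in learners)
for learner, action in zip(learners, joint_action):
    learner.update(state, action, reward, next_state)
```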


Multi-robot inverse reinforcement learning under occlusion with estimation of state transitions. Artificial Intelligence, Volume 263, October 2018, Pages 46-73. Kenneth Bogert, Prashant Doshi

Inverse reinforcement learning (IRL), by analogy with RL, refers to both the problem and the associated methods by which an agent, passively observing another agent's actions over time, seeks to learn the latter's reward function. The learning agent is typically called the learner, while the observed agent is often an expert, as in popular applications such as learning from demonstrations. Some of the assumptions underlying current IRL methods are impractical for many robotic applications. Specifically, they assume that the learner has full observability of the expert as it performs its task; that the learner has full knowledge of the expert's dynamics; and that there is only one expert agent in the environment. These assumptions are particularly restrictive in our application scenario, in which a subject robot is tasked with penetrating a perimeter patrol carried out by two other robots after observing them from a vantage point; in our instance of this problem, the learner can observe at most 10% of the patrol. We relax these assumptions and systematically generalize a known IRL method, Maximum Entropy IRL, to enable the subject to learn the patrolling robots' preferences and, from these, their behaviors, and to predict their future positions well enough to plan a route to its goal state without being spotted. Despite the challenges of occlusion, multiple interacting robots, and partially known dynamics, we demonstrate empirically that the generalization improves significantly on several baselines in its ability to inversely learn in this setting. Of note, it leads to a significant improvement in the learner's overall success rate at penetrating the patrols. Our methods represent significant steps toward making IRL pragmatic and applicable to real-world contexts.
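
The central idea of restricting Maximum Entropy IRL's feature-expectation matching to the states the learner can actually see can be illustrated with a short sketch. The Python below is an assumption-laden illustration, not the paper's method: it treats the expert's dynamics as known and assumes a single expert (the paper relaxes both), and all function and variable names are hypothetical.

```python
import numpy as np


def soft_value_iteration(reward, P, gamma=0.95, iters=100):
    """Maximum-entropy (soft) value iteration; returns a stochastic policy.

    reward: length-|S| vector; P: |S| x |A| x |S| transition tensor.
    """
    n_states, n_actions, _ = P.shape
    v = np.zeros(n_states)
    for _ in range(iters):
        q = reward[:, None] + gamma * (P @ v)            # |S| x |A|
        q_max = q.max(axis=1, keepdims=True)             # for stability
        v = (q_max + np.log(np.exp(q - q_max).sum(axis=1,
                                                  keepdims=True))).ravel()
    q = reward[:, None] + gamma * (P @ v)
    return np.exp(q - v[:, None])                        # pi(a | s)


def state_visitation(policy, P, start, horizon=50):
    """Expected state-visitation frequencies under the given policy."""
    d = start.copy()
    total = d.copy()
    for _ in range(horizon - 1):
        # One-step mass propagation: sum_{s,a} d(s) pi(a|s) P(s'|s,a).
        d = np.einsum('s,sa,sat->t', d, policy, P)
        total += d
    return total / horizon


def maxent_irl_occluded(features, P, trajectories, observable,
                        lr=0.05, iters=200):
    """Gradient ascent on reward weights, matching feature expectations
    only over observable states to cope with occlusion.

    features:     |S| x k state-feature matrix
    trajectories: lists of visible expert states (occluded states absent)
    observable:   boolean mask over states the learner can see
    """
    n_states, k = features.shape
    theta = np.zeros(k)

    # Empirical feature expectations over the *visible* demo states only.
    visits = np.zeros(n_states)
    for traj in trajectories:
        for s in traj:
            visits[s] += 1
    mu_expert = features.T @ (visits / visits.sum())
    start = visits / visits.sum()  # crude start distribution from demos

    mask = observable.astype(float)
    for _ in range(iters):
        policy = soft_value_iteration(features @ theta, P)
        d = state_visitation(policy, P, start)
        # Renormalize model visitation over the observable region so the
        # two expectations are comparable despite the occlusion.
        d_visible = d * mask / ((d * mask).sum() + 1e-12)
        mu_model = features.T @ d_visible
        theta += lr * (mu_expert - mu_model)  # log-likelihood gradient
    return theta
```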

