Abstract
Defining a reward function in Reinforcement Learning (RL) is not always possible, or can be very costly. For this reason, there is great interest in training agents in a task-agnostic manner, making use of intrinsic motivation and unsupervised techniques. Due to the complexity of learning useful behaviours in pixel-based domains, the results obtained in RL are still far from the remarkable results achieved in domains such as computer vision and natural language processing. We hypothesize that RL agents will also benefit from unsupervised pre-training with no extrinsic rewards, analogously to how humans mostly learn, especially in the early stages of life. Our main contribution is the deployment of the Explore, Discover and Learn (EDL) paradigm for unsupervised learning to the pixel space. In particular, our work focuses on the MineRL environment, where the observation of the agent consists of: (a) its spatial coordinates in the Minecraft virtual world, and (b) an image from an egocentric viewpoint.
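For illustration only, the sketch below shows one plausible way to encode such a joint pixel-and-coordinate observation in PyTorch. This is a minimal assumption of ours, not the architecture used in the paper: the module name, layer sizes and the 64x64 frame resolution are all hypothetical.

```python
# Minimal sketch (our own assumption): fuse the egocentric frame with the
# agent's (x, y, z) coordinates into a single observation embedding.
import torch
import torch.nn as nn

class PixelCoordEncoder(nn.Module):
    def __init__(self, coord_dim: int = 3, embed_dim: int = 128):
        super().__init__()
        # Small CNN over a 64x64 egocentric RGB frame (MineRL-style observation).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened CNN output size with a dummy forward pass.
        with torch.no_grad():
            cnn_out = self.cnn(torch.zeros(1, 3, 64, 64)).shape[1]
        # Small MLP over the spatial coordinates.
        self.coord_mlp = nn.Sequential(nn.Linear(coord_dim, 32), nn.ReLU())
        # Joint embedding of both modalities.
        self.head = nn.Linear(cnn_out + 32, embed_dim)

    def forward(self, pixels: torch.Tensor, coords: torch.Tensor) -> torch.Tensor:
        # pixels: (B, 3, 64, 64) in [0, 1]; coords: (B, coord_dim)
        z_pix = self.cnn(pixels)
        z_coord = self.coord_mlp(coords)
        return self.head(torch.cat([z_pix, z_coord], dim=1))
```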
If you find this work useful, please consider citing:
Juan Jose Nieto, Roger Creus, and Xavier Giro-i-Nieto. “Discovery and Learning of Minecraft Navigation Goals from Pixels and Coordinates”, 2021.
Find our extended abstract in this PDF.
Talk
Here you can watch the master’s thesis defense by JuanJo Nieto. In this presentation, we give more context on the problem and also show results in additional scenarios, not only in realistic maps.
We also invite you to watch the talk by Victor Campos that inspired our work.
Results
This WandB report provides more detailed results than those included in the extended abstract.
Discover: Skills
Map #1
Map #2
Map #3
Map #4
Learn: Reward distribution per skill (Map #4)
Pixels - Experts
Pixels - Random
Coord - Random
(PiCoEDL) Pixels & Coord - Random
Learn: Rainbow agent
The best-performing skills are 1, 2, 3, 4, 6, and 7.
Sampled trajectories per skill
Average reward of evaluation episodes
Demo
Qualitative results of embodied AI agents on Habitat and MineRL.
(coming soon)
Code
This project has been developed using PyTorch.
Source code available here.
Acknowledgements
We would like to thank Victor Campos for his enriching discussions and guidance in this work.
We also want to thank our wonderful technical support staff.
We gratefully acknowledge the support of NVIDIA Corporation with the donation of the GeForce GTX Titan Z and Titan X used in this work.
This work has been developed in the framework of project TEC2016-75976-R, financed by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund (ERDF).