PiCoEDL: Discovery and Learning of Minecraft Navigation Goals from Pixels and Coordinates

CVPR 2021 Embodied AI Workshop



Abstract

Defining a reward function in Reinforcement Learning (RL) is not always possible, or can be very costly. For this reason, there is great interest in training agents in a task-agnostic manner, making use of intrinsic motivation and unsupervised techniques. Due to the complexity of learning useful behaviours in pixel-based domains, the results obtained in RL are still far from the remarkable results achieved in domains such as computer vision and natural language processing. We hypothesize that RL agents will also benefit from unsupervised pre-training with no extrinsic rewards, analogously to how humans mostly learn, especially in the early stages of life. Our main contribution is the deployment of the Explore, Discover and Learn (EDL) paradigm for unsupervised learning to the pixel space. In particular, our work focuses on the MineRL environment, where the observation of the agent is represented by: (a) its spatial coordinates in the Minecraft virtual world, and (b) an image from an egocentric viewpoint.
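As a rough illustration of how such a two-part observation can be encoded, the sketch below fuses an egocentric frame and the agent's coordinates into a single state embedding with PyTorch. The layer sizes, the 64x64 frame resolution and the late-fusion scheme are assumptions for illustration only, not the exact PiCoEDL architecture.

```python
import torch
import torch.nn as nn


class PixelCoordEncoder(nn.Module):
    """Illustrative encoder fusing an egocentric image with (x, y, z) coordinates."""

    def __init__(self, coord_dim=3, embed_dim=64):
        super().__init__()
        # Small CNN for a 64x64 egocentric view (MineRL-style "pov" frame).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened CNN output size with a dummy forward pass.
        cnn_out = self.cnn(torch.zeros(1, 3, 64, 64)).shape[1]
        # Small MLP for the agent's spatial coordinates.
        self.coord_mlp = nn.Sequential(nn.Linear(coord_dim, 32), nn.ReLU())
        # Joint projection to a single state embedding.
        self.head = nn.Linear(cnn_out + 32, embed_dim)

    def forward(self, pov, coords):
        z = torch.cat([self.cnn(pov), self.coord_mlp(coords)], dim=1)
        return self.head(z)


encoder = PixelCoordEncoder()
pov = torch.zeros(2, 3, 64, 64)  # batch of egocentric frames
coords = torch.zeros(2, 3)       # batch of (x, y, z) positions
emb = encoder(pov, coords)
print(emb.shape)  # torch.Size([2, 64])
```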

If you find this work useful, please consider citing:

Juan Jose Nieto, Roger Creus, and Xavier Giro-i-Nieto. “PiCoEDL: Discovery and Learning of Minecraft Navigation Goals from Pixels and Coordinates”. CVPR 2021 Embodied AI Workshop, 2021.

Find our extended abstract in this PDF.

Talk

Here you can watch the master’s dissertation by Juan Jose Nieto. In this presentation we give more context on the problem and also show results in additional scenarios, beyond realistic maps.

We invite you to watch the talk by Victor Campos that inspired our work.

Presentation
Results

This WandB report provides more detailed results than those included in the extended abstract.

Discover: Skills

Map #1


Map #2


Map #3


Map #4


Learn: Reward distribution per skill (Map #4)

Pixels - Experts


Pixels - Random


Coord - Random


(PiCoEDL) Pixels & Coord - Random
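The reward distributions above come from EDL's Learn stage, where the skill-conditioned agent is rewarded for reaching states that identify its sampled skill. A minimal sketch of that idea, assuming a simple softmax discriminator q(z | s) over state embeddings (the discriminator, sizes and names here are illustrative, not the exact PiCoEDL setup):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_skills, state_dim = 8, 64

# Hypothetical discriminator q(z | s): predicts which skill produced a state.
discriminator = nn.Linear(state_dim, n_skills)


def intrinsic_reward(state_emb, skill):
    """Reward = log q(z | s): high when the state is informative about the skill."""
    log_q = F.log_softmax(discriminator(state_emb), dim=-1)
    return log_q.gather(-1, skill.unsqueeze(-1)).squeeze(-1)


states = torch.randn(4, state_dim)          # batch of state embeddings
skills = torch.randint(0, n_skills, (4,))   # sampled skill indices
r = intrinsic_reward(states, skills)
print(r.shape)  # torch.Size([4])
```

Under this formulation the reward is a log-probability, so it is non-positive and approaches zero only when a state is unambiguously assigned to its skill.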


Learn: Rainbow agent

The best-performing skills are 1, 2, 3, 4, 6 and 7.

Sampled trajectories per skill


Average reward of evaluation episodes


Demo

Qualitative results of embodied AI agents on Habitat and MineRL.

(coming soon)

Code


This project has been developed using PyTorch.

Source code available here.

Acknowledgements

We would like to thank Victor Campos for the enriching discussions and his guidance throughout this work.




We want to thank our wonderful technical support staff.
We gratefully acknowledge the support of NVIDIA Corporation with the donation of the GeForce GTX Titan Z and Titan X GPUs used in this work.
This work has been developed in the framework of project TEC2016-75976-R, funded by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund (ERDF).