Abstract
Defining a reward function in Reinforcement Learning (RL) is not always possible, or can be very costly. For this reason, there is great interest in training agents in a task-agnostic manner, making use of intrinsic motivation and unsupervised techniques. Due to the complexity of learning useful behaviours in pixel-based domains, the results obtained in RL are still far from the remarkable results achieved in domains such as computer vision and natural language processing. We hypothesize that RL agents will also benefit from unsupervised pre-training with no extrinsic rewards, analogously to how humans mostly learn, especially in the early stages of life. Our main contribution is the deployment of the Explore, Discover and Learn (EDL) paradigm for unsupervised learning to the pixel space. In particular, our work focuses on the MineRL environment, where the observation of the agent consists of: (a) its spatial coordinates in the Minecraft virtual world, and (b) an image from an egocentric viewpoint.
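For illustration only, the sketch below shows one plausible way to encode such a joint pixel-and-coordinate observation in PyTorch. This is a minimal assumption of ours, not the architecture used in the paper: the module name, layer sizes and the 64x64 frame resolution are all hypothetical.

```python
# Minimal sketch (our own assumption): fuse the egocentric frame with the
# agent's (x, y, z) coordinates into a single observation embedding.
import torch
import torch.nn as nn

class PixelCoordEncoder(nn.Module):
    def __init__(self, coord_dim: int = 3, embed_dim: int = 128):
        super().__init__()
        # Small CNN over a 64x64 egocentric RGB frame (MineRL-style observation).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened CNN output size with a dummy forward pass.
        with torch.no_grad():
            cnn_out = self.cnn(torch.zeros(1, 3, 64, 64)).shape[1]
        # Small MLP over the spatial coordinates.
        self.coord_mlp = nn.Sequential(nn.Linear(coord_dim, 32), nn.ReLU())
        # Joint embedding of both modalities.
        self.head = nn.Linear(cnn_out + 32, embed_dim)

    def forward(self, pixels: torch.Tensor, coords: torch.Tensor) -> torch.Tensor:
        # pixels: (B, 3, 64, 64) in [0, 1]; coords: (B, coord_dim)
        z_pix = self.cnn(pixels)
        z_coord = self.coord_mlp(coords)
        return self.head(torch.cat([z_pix, z_coord], dim=1))
```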
If you find this work useful, please consider citing:
Juan Jose Nieto, Roger Creus, and Xavier Giro-i-Nieto. “Discovery and Learning of Minecraft Navigation Goals from Pixels and Coordinates”, 2021.
Find our extended abstract in this PDF.
Talk
Here you can watch the master’s thesis defense by JuanJo Nieto. In this presentation, we give more context on the problem and also show results in additional scenarios, not only in realistic maps.
We also invite you to watch the talk by Victor Campos that inspired our work.
Results
This WandB report provides more detailed results than those included in the extended abstract.
Discover: Skills
Map #1
Map #2
Map #3
Map #4
Learn: Reward distribution per skill (Map #4)
Pixels - Experts
Pixels - Random
Coord - Random
(PiCoEDL) Pixels & Coord - Random
Learn: Rainbow agent
The best-performing skills are 1, 2, 3, 4, 6, and 7.
Sampled trajectories per skill
Average reward of evaluation episodes
Demo
Qualitative results of embodied AI agents on Habitat and MineRL.
(coming soon)
Code
This project has been developed using PyTorch.
Source code available here.
Acknowledgements
We would like to thank Victor Campos for his enriching discussions and guidance in this work.
We also want to thank our wonderful technical support staff.
We gratefully acknowledge the support of NVIDIA Corporation with the donation of the GeForce GTX Titan Z and Titan X used in this work.
This work has been developed in the framework of project TEC2016-75976-R, financed by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund (ERDF).