Introduction

Multiple-object video object segmentation is a challenging task, especially in the zero-shot case, where no object mask is given for the initial frame and the model has to find the objects to segment throughout the sequence. In our work, we propose a Recurrent network for multiple object Video Object Segmentation (RVOS) that is fully end-to-end trainable. Our model incorporates recurrence in two different domains: (i) the spatial domain, which allows the model to discover the different object instances within a frame, and (ii) the temporal domain, which keeps the segmented objects coherent over time. We train RVOS for zero-shot video object segmentation and are the first to report quantitative results on the DAVIS-2017 and YouTube-VOS benchmarks. Further, we adapt RVOS for one-shot video object segmentation by using the masks obtained in previous time steps as inputs to the recurrent module. Our model achieves results comparable to state-of-the-art techniques on the YouTube-VOS benchmark and outperforms all previous video object segmentation methods that do not use online learning on the DAVIS-2017 benchmark. Moreover, our model achieves faster inference than previous methods, reaching 44 ms/frame on a P100 GPU.

If you find this work useful, please consider citing:

Carles Ventura, Miriam Bellver, Andreu Girbau, Amaia Salvador, Ferran Marques and Xavier Giro-i-Nieto. “RVOS: End-to-End Recurrent Network for Video Object Segmentation”, CVPR 2019.

@InProceedings{Ventura_2019_CVPR,
author = {Ventura, Carles and Bellver, Miriam and Girbau, Andreu and Salvador, Amaia 
          and Marques, Ferran and Giro-i-Nieto, Xavier},
title = {RVOS: End-to-End Recurrent Network for Video Object Segmentation},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
}

Download our paper in pdf here or find it on arXiv.

Model

Our proposed architecture, in which RNNs operate in both the spatial and temporal domains. The example shows each predicted instance mask in a different color.

model
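
To make the two recurrence domains concrete, here is a minimal sketch in plain Python of how a hidden state could depend on both the previous instance in the same frame (spatial) and the same instance slot in the previous frame (temporal). It is not the RVOS implementation: `toy_cell`, `run_recurrence`, the scalar "features" and the fixed weights are all hypothetical stand-ins for a learned recurrent module operating on image features.

```python
from typing import Callable, List

# A cell maps (frame feature, spatial hidden state, temporal hidden state)
# to a new hidden state. Scalars stand in for feature maps in this toy.
Cell = Callable[[float, float, float], float]


def toy_cell(feature: float, h_spatial: float, h_temporal: float) -> float:
    # Hypothetical update rule: a fixed weighted sum replaces a learned cell.
    return 0.5 * feature + 0.25 * h_spatial + 0.25 * h_temporal


def run_recurrence(
    frame_features: List[float],
    num_instances: int,
    cell: Cell = toy_cell,
) -> List[List[float]]:
    """Return hidden[t][i] for every frame t and instance slot i."""
    hidden: List[List[float]] = []
    for t, feature in enumerate(frame_features):
        row: List[float] = []
        for i in range(num_instances):
            # Spatial recurrence: state of the previous instance in this frame.
            h_spatial = row[i - 1] if i > 0 else 0.0
            # Temporal recurrence: state of the same slot in the previous frame.
            h_temporal = hidden[t - 1][i] if t > 0 else 0.0
            row.append(cell(feature, h_spatial, h_temporal))
        hidden.append(row)
    return hidden


states = run_recurrence([1.0, 1.0], num_instances=2)
print(states)  # [[0.5, 0.625], [0.625, 0.8125]]
```

In this toy run, the second instance of the second frame receives contributions from both its spatial neighbour and its own state in the previous frame, which is the dependency pattern the description above refers to.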

Results

Results on YouTube-VOS validation set for the semi-supervised task (one-shot):

youtube-vos one shot

Results on DAVIS-2017 test-dev set for the semi-supervised task (one-shot):

davis one shot

Results on YouTube-VOS validation set for the unsupervised task (zero-shot):

youtube-vos zero shot

Examples

Results on YouTube-VOS validation set for the semi-supervised task (one-shot):

youtube-vos one shot

Results on DAVIS-2017 test-dev set for the semi-supervised task (one-shot):

davis one shot

Results on YouTube-VOS validation set for the unsupervised task (zero-shot):

youtube-vos zero shot

Results on DAVIS-2017 test-dev set for the unsupervised task (zero-shot):

davis zero shot

Code

We implement our models using PyTorch.

Source code is available here.

Acknowledgements

We want to thank our technical support team:

   
We gratefully acknowledge the support of NVIDIA Corporation with the donation of GPUs used in this work.
The Scene Understanding and Artificial Intelligence (SUnAI) group at Universitat Oberta de Catalunya (UOC) is an SGR17 Preconsolidated Research Group recognized by the Catalan Government (Generalitat de Catalunya) through its AGAUR office.
The Emerging Technologies for Artificial Intelligence group at Barcelona Supercomputing Center is part of the SGR-2017 1414 sponsored by the Catalan Government (Generalitat de Catalunya) through its AGAUR office.
The Image Processing Group at the UPC is an SGR17 Consolidated Research Group recognized and sponsored by the Catalan Government (Generalitat de Catalunya) through its AGAUR office.
This research was supported by Industrial Doctorates 2017-DI-064 and 2017-DI-028 from the Government of Catalonia.
This work has been developed in the framework of projects TIN2015-66951-C2-2-R, TIN2015-65316-P and TEC2016-75976-R, financed by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund (ERDF).