Temporal Regularization of Saliency Maps in Egocentric Videos

Panagiotis Linardos

Monica Cherto

Universitat Politècnica de Catalunya

Dublin City University

Publication

This work explores how temporal regularization in egocentric videos may have a positive or negative impact on saliency prediction, depending on the viewer's behavior. Our study is based on the new EgoMon dataset, which consists of seven videos recorded by three subjects in both free-viewing and task-driven setups. We compute a frame-based saliency prediction for each video clip, as well as a temporally regularized version based on deep neural networks. Our results indicate that the NSS saliency metric improves during task-driven activities but clearly drops during free-viewing. Encouraged by the good results in task-driven activities, we also computed and published the saliency maps for the EPIC Kitchens dataset.

Find the full paper on arXiv or download the PDF directly from here.

If you find this work useful, please consider citing:

Panagiotis Linardos, Eva Mohedano, Monica Cherto, Cathal Gurrin, Xavier Giro-i-Nieto. “Temporal Saliency Adaptation in Egocentric Videos”, Extended abstract at the ECCV Workshop on Egocentric Perception, Interaction and Computing (EPIC), 2018.

@article{Linardos2018videosalgan,
title={Temporal Saliency Adaptation in Egocentric Videos},
author={Panagiotis Linardos and Eva Mohedano and Monica Cherto and Cathal Gurrin and Xavier Giro-i-Nieto},
journal={arXiv preprint arXiv:1808.09559},
year={2018}
}

Model

Our work is based on SalGAN, a computational saliency model that predicts human fixations on still images. Architecturally, we add a convolutional LSTM layer on top of the frame-based saliency predictions.
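To make the idea of a convolutional LSTM stacked on per-frame saliency maps more concrete, here is a minimal PyTorch sketch. It is not the released implementation: the class names `ConvLSTMCell` and `TemporalHead`, the hidden size, the kernel size, and the 1x1 readout are all assumptions chosen for illustration, and the per-frame maps are assumed to come from a separate frame-based predictor such as SalGAN.

```python
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """Hypothetical ConvLSTM cell: all four gates come from one convolution."""

    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        self.hidden_channels = hidden_channels
        # Padding keeps the spatial size unchanged for odd kernel sizes.
        self.gates = nn.Conv2d(in_channels + hidden_channels,
                               4 * hidden_channels,
                               kernel_size, padding=kernel_size // 2)

    def forward(self, x, state):
        h, c = state
        # Split the convolution output into input, forget, output, candidate.
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


class TemporalHead(nn.Module):
    """Hypothetical head that refines a sequence of per-frame saliency maps."""

    def __init__(self, hidden_channels=8):
        super().__init__()
        self.cell = ConvLSTMCell(1, hidden_channels)
        # 1x1 convolution maps the hidden state back to a single-channel map.
        self.readout = nn.Conv2d(hidden_channels, 1, kernel_size=1)

    def forward(self, frame_maps):
        # frame_maps: (batch, time, 1, height, width), values in [0, 1].
        b, t, _, height, width = frame_maps.shape
        h = frame_maps.new_zeros(b, self.cell.hidden_channels, height, width)
        c = frame_maps.new_zeros(b, self.cell.hidden_channels, height, width)
        outputs = []
        for step in range(t):
            h, c = self.cell(frame_maps[:, step], (h, c))
            outputs.append(torch.sigmoid(self.readout(h)))
        return torch.stack(outputs, dim=1)  # same shape as the input
```

Calling `TemporalHead()(maps)` on a tensor of shape `(batch, time, 1, H, W)` returns a tensor of the same shape, where each output frame depends on the current and all previous input frames through the recurrent state.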

Dataset

The EgoMon Gaze & Video dataset can be downloaded as a single file, or by components:

Full dataset (22G)
Gaze Data (xlsx) (2.2G)
Gaze Data (csv) (1.9G)
Narrative (Only for the botanic gardens) (57M)
Clean Videos (5.7G)
Videos with overlaid gaze fixations (5G)

Tobii glasses used for gaze data recording

Narrative clip equipment

Example of a clean video

Example of a video with overlaid gaze fixations

More qualitative examples can be found on this site.

Results

We evaluated SalGAN with and without our temporal regularization on different datasets:

Performance on DHF1K

Performance on DHF1K and EgoMon

Analytical Results on the 2 types of EgoMon recordings.
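The comparisons above are reported with the NSS (Normalized Scanpath Saliency) metric. As a reference for how that score is usually defined, the sketch below standardizes a predicted saliency map to zero mean and unit standard deviation and averages it at the fixated pixels. The function name `nss`, the binary fixation-map input, and the choice to raise an error on constant maps are assumptions for illustration, not the evaluation code used for these results.

```python
import numpy as np


def nss(saliency_map, fixation_map):
    """Normalized Scanpath Saliency for one frame (illustrative sketch).

    saliency_map: 2-D array of predicted saliency values.
    fixation_map: 2-D array of the same shape, nonzero where a fixation landed.
    """
    saliency = np.asarray(saliency_map, dtype=np.float64)
    fixations = np.asarray(fixation_map).astype(bool)
    if saliency.shape != fixations.shape:
        raise ValueError("saliency and fixation maps must have the same shape")
    if not fixations.any():
        raise ValueError("fixation map contains no fixations")
    std = saliency.std()
    if std == 0:
        raise ValueError("saliency map is constant; NSS is undefined")
    # Standardize the prediction, then average it over the fixated pixels.
    standardized = (saliency - saliency.mean()) / std
    return float(standardized[fixations].mean())
```

A positive score means the prediction is above its own average at the fixated locations, and a score near zero corresponds to chance. A per-video score can then be obtained by averaging the per-frame values.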

When it comes to visual attention, there is not always a direct relationship between actions and fixations. For example, a person can easily carry an object in her hand and put it on the table without looking at it. The daily art of cooking, on the other hand, is a series of object-manipulation tasks that require hand-eye coordination. Actions such as cutting onions or pouring a liquid into a bottle are hard to accomplish without using both hands and eyes in coordination. For that reason, we expect that using the saliency maps of the video will bring the model closer to the features that are most intimately linked with the tasks carried out by the subjects during the EPIC Kitchens dataset acquisition.

Examples of EPIC Kitchens frames with their saliency maps. The second row corresponds to the Vanilla SalGAN predictions and the third row to the Augmented SalGAN predictions.

You may download saliency maps from here:

Epic-Kitchens (SalGAN) (25G)
Epic-Kitchens (+convLSTM) (737M)
EgoMon (SalGAN) (216M)
EgoMon (+convLSTM) (93M)

Presentation

EgoMon Gaze and Video Dataset for Visual Saliency Prediction from Universitat Politècnica de Catalunya

Poster

Download the PDF here

Code

This project was developed with Python 3.6.5 and PyTorch 0.4.0. To download and install PyTorch, please follow the official guide.

Acknowledgements

We especially want to thank our technical support team:

Albert Gil

Josep Pujal


We gratefully acknowledge the support of NVIDIA Corporation with the donation of the GeForce GTX Titan Z and Titan X used in this work.
The Image Processing Group at the UPC is an SGR17 Consolidated Research Group recognized by the Government of Catalonia (Generalitat de Catalunya) through its AGAUR office.
This work has been developed in the framework of projects TEC2013-43935-R and TEC2016-75976-R, financed by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund (ERDF).
