PathGAN: Visual Scanpath Prediction with

Generative Adversarial Networks



Insight Centre for Data Analytics (DCU)

Universitat Politècnica de Catalunya

Publication

We introduce PathGAN, a deep neural network for visual scanpath prediction trained with an adversarial strategy. A visual scanpath is the sequence of fixation points that a human observer traces over an image with their gaze. PathGAN is composed of two parts, the generator and the discriminator. Both parts extract features from images using off-the-shelf networks, and train recurrent layers to generate or discriminate scanpaths accordingly. In scanpath prediction, the stochastic nature of the data makes it very difficult to generate realistic predictions using supervised learning strategies, so we adopt adversarial training as a suitable alternative. Our experiments show that PathGAN improves the state of the art in visual scanpath prediction on the iSUN and Salient360! datasets.

Find the full paper on arXiv or download the PDF directly from here.

If you find this work useful, please consider citing:

Marc Assens, Xavier Giro-i-Nieto, Kevin McGuinness, Noel E. O’Connor. “PathGAN: Visual Scanpath Prediction with Generative Adversarial Networks”, ECCV Workshop on Egocentric Perception, Interaction and Computing (EPIC), 2018.

@inproceedings{Assens2018pathgan,
  title={PathGAN: Visual Scanpath Prediction with Generative Adversarial Networks},
  author={Assens, Marc and Giro-i-Nieto, Xavier and McGuinness, Kevin and O'Connor, Noel E.},
  booktitle={ECCV Workshop on Egocentric Perception, Interaction and Computing (EPIC)},
  year={2018}
}
Model
The model is composed of two deep neural networks, the generator and the discriminator, whose combined efforts aim at predicting a realistic scanpath from a given image.

Architecture

The model is trained following the cGAN framework so that the predictions are conditioned on an input image, which is encoded by a pre-trained convolutional neural network. The generator reads an image as input and outputs a variable-length sequence of predicted fixation points. In addition to the coordinates of each fixation point, our model has an end-of-sequence (EOS) neuron that encodes the variable-length nature of scanpaths. The discriminator predicts whether a given scanpath is synthesized or not, and this decision is conditioned on the associated image.
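To make this concrete, below is a minimal Keras sketch of the two networks. It is an illustrative sketch under assumptions, not the released implementation: the VGG16 backbone, the LSTM size, the maximum scanpath length (MAX_LEN) and the per-step noise dimension (NOISE_DIM) are placeholder choices.

from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

MAX_LEN = 32      # assumed maximum scanpath length (illustrative)
NOISE_DIM = 16    # assumed per-step noise dimension (illustrative)

def image_features(image):
    """Frozen off-the-shelf CNN encoding of the input image."""
    cnn = VGG16(include_top=False, weights="imagenet", input_shape=(224, 224, 3))
    cnn.trainable = False
    return layers.GlobalAveragePooling2D()(cnn(image))           # (512,)

def build_generator():
    image = layers.Input(shape=(224, 224, 3))
    noise = layers.Input(shape=(MAX_LEN, NOISE_DIM))              # stochastic input
    feat = layers.RepeatVector(MAX_LEN)(image_features(image))    # condition every step
    x = layers.LSTM(256, return_sequences=True)(layers.Concatenate()([feat, noise]))
    coords = layers.TimeDistributed(layers.Dense(2, activation="sigmoid"))(x)  # (x, y)
    eos = layers.TimeDistributed(layers.Dense(1, activation="sigmoid"))(x)     # end-of-sequence
    return Model([image, noise], layers.Concatenate()([coords, eos]), name="generator")

def build_discriminator():
    image = layers.Input(shape=(224, 224, 3))
    scanpath = layers.Input(shape=(MAX_LEN, 3))                   # (x, y, EOS) per step
    feat = layers.RepeatVector(MAX_LEN)(image_features(image))    # condition on the image
    x = layers.LSTM(256)(layers.Concatenate()([feat, scanpath]))
    return Model([image, scanpath], layers.Dense(1, activation="sigmoid")(x), name="discriminator")

Each generated time step carries normalized (x, y) coordinates plus an EOS activation; at inference time the scanpath can be cut at the first step whose EOS value exceeds a chosen threshold.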
Slides
Poster

PathGAN poster

PathGAN poster presentation by Xavier Giro-i-Nieto

Code


We implement our models using Keras.

Find the source code and pre-trained weights on GitHub.
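For orientation, here is a hedged sketch of the conditional adversarial training loop written against tf.keras. It reuses MAX_LEN, NOISE_DIM, build_generator and build_discriminator from the architecture sketch above, feeds random placeholder batches instead of real data, and shows only the adversarial (real/fake) loss; it is not the training script from the repository.

import tensorflow as tf

G = build_generator()
D = build_discriminator()
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)
bce = tf.keras.losses.BinaryCrossentropy()

BATCH = 8
for step in range(1000):
    # Placeholder batch; real training would draw images and ground-truth
    # scanpaths from the iSUN / Salient360! training sets.
    images = tf.random.uniform((BATCH, 224, 224, 3))
    real_paths = tf.random.uniform((BATCH, MAX_LEN, 3))
    noise = tf.random.normal((BATCH, MAX_LEN, NOISE_DIM))

    # 1) Discriminator step: real scanpaths -> 1, generated scanpaths -> 0.
    fake_paths = G([images, noise], training=False)
    with tf.GradientTape() as tape:
        d_real = D([images, real_paths], training=True)
        d_fake = D([images, fake_paths], training=True)
        d_loss = bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)
    d_opt.apply_gradients(zip(tape.gradient(d_loss, D.trainable_variables),
                              D.trainable_variables))

    # 2) Generator step: try to make the discriminator output 1 on generated paths.
    with tf.GradientTape() as tape:
        d_on_fake = D([images, G([images, noise], training=True)], training=False)
        g_loss = bce(tf.ones_like(d_on_fake), d_on_fake)
    g_opt.apply_gradients(zip(tape.gradient(g_loss, G.trainable_variables),
                              G.trainable_variables))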

Examples

We provide examples of predicted scanpaths for two datasets.

iSUN

iSUN results

Salient360!

Salient360! results

The big dot indicates the first fixation of the scanpath.
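As an illustration only (not code from the repository), a figure like the ones above can be drawn with a few lines of matplotlib; the hypothetical helper plot_scanpath below overlays the fixation sequence on the image and enlarges the first fixation.

import matplotlib.pyplot as plt
import numpy as np

def plot_scanpath(image, scanpath):
    """image: HxWx3 array; scanpath: list of (x, y) fixations in pixel coordinates."""
    xs, ys = zip(*scanpath)
    plt.imshow(image)
    plt.plot(xs, ys, "-o", color="red", markersize=6, linewidth=2)   # fixation sequence
    plt.scatter([xs[0]], [ys[0]], s=200, color="red", zorder=3)      # big first fixation
    plt.axis("off")
    plt.show()

# Example with a dummy white image and a made-up scanpath.
plot_scanpath(np.ones((240, 320, 3)), [(50, 60), (150, 120), (280, 200), (100, 180)])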

Acknowledgements

We especially want to thank our technical support team:

   
We gratefully acknowledge the support of NVIDIA Corporation with the donation of the GeForce GTX Titan Z and Titan X used in this work.
The Image Processing Group at the UPC is a SGR14 Consolidated Research Group recognized and sponsored by the Catalan Government (Generalitat de Catalunya) through its AGAUR office.
This work has been developed in the framework of projects TEC2013-43935-R and TEC2016-75976-R, financed by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund (ERDF).