Simple vs complex temporal recurrences for video saliency prediction

SalEMA is a video saliency prediction network. It uses a moving average of convolutional states to produce state-of-the-art results according to this benchmark on DHF1K, Hollywood-2 and UCF Sports (July 2019). The model has been trained on the DHF1K dataset.


This paper investigates modifying an existing neural network architecture for static saliency prediction using two types of recurrences that integrate information from the temporal domain. The first modification is the addition of a ConvLSTM within the architecture, while the second is a conceptually simple exponential moving average of an internal convolutional state. We use weights pre-trained on the SALICON dataset and fine-tune our model on DHF1K. Our results show that both modifications achieve state-of-the-art results and produce similar saliency maps.
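As a rough illustration of the second recurrence, the sketch below blends each new state into a running average. It is not the repository's code: the names `ema_update` and `run_ema` are hypothetical, the weighting convention (`alpha` on the current state, `1 - alpha` on the history) is an assumption, and plain floats stand in for the convolutional feature maps, to which the same arithmetic would apply elementwise.

```python
def ema_update(previous, current, alpha):
    """Blend the current state into the running average.

    Assumed convention: alpha weights the current state and
    (1 - alpha) weights the accumulated history.
    """
    if previous is None:
        # First frame: there is no history yet, so start from the current state.
        return current
    return alpha * current + (1.0 - alpha) * previous


def run_ema(states, alpha):
    """Apply the update frame by frame and return every intermediate average."""
    averaged = []
    running = None
    for state in states:
        running = ema_update(running, state, alpha)
        averaged.append(running)
    return averaged


# Floats stand in for per-frame feature maps in this toy example.
smoothed = run_ema([1.0, 0.0, 0.0], alpha=0.5)
print(smoothed)  # [1.0, 0.5, 0.25]
```

Unlike a ConvLSTM, this recurrence adds no learned gates; the only extra quantity is `alpha`, which controls how quickly older frames fade.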


Find the extended pre-print version of our work on arXiv.

Please cite with the following BibTeX code:

author = {Linardos, Panagiotis and Mohedano, Eva and Nieto, Juan Jose and McGuinness, Kevin and Giro-i-Nieto, Xavier and O'Connor, Noel E.},
title = {Simple vs complex temporal recurrences for video saliency prediction},
booktitle = {British Machine Vision Conference (BMVC)},
month = {September},
year = {2019}

You may also want to refer to our publication with the more human-friendly Chicago style:

Panagiotis Linardos, Eva Mohedano, Juan Jose Nieto, Kevin McGuinness, Xavier Giro-i-Nieto and Noel E. O’Connor. “Simple vs complex temporal recurrences for video saliency prediction.” BMVC 2019.


Qualitative results:

Click to be redirected to YouTube:

Sample video I

Sample video II



Download our best configuration of the SalEMA model here (364MB)


git clone
pip3 install torch torchvision


You may use our pretrained model for inference on any of the three datasets: DHF1K [link], Hollywood-2 [link], UCF-sports [link].

To perform inference on the DHF1K validation set:

python -dataset=DHF1K -start=600 -end=700 -dst=/path/to/output -src=/path/to/DHF1K/frames

To perform inference on the Hollywood-2 or UCF-sports test set (because of the way these datasets are structured, it’s convenient to use the same path for dst and src):

python -dataset=Hollywood-2 -dst=/path/to/Hollywood-2/testing -src=/path/to/Hollywood-2/testing
python -dataset=UCF-sports -dst=/path/to/UCF-sports/testing -src=/path/to/UCF-sports/testing

To perform inference on your own dataset, make sure it follows the same structure as DHF1K (numbered folders containing numbered frames):

python -dataset=other -dst=/path/to/output -src=/path/to/your_dataset/frames
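Before running inference on a custom dataset, it can help to check that the folder layout matches the expected structure. The sketch below is a hypothetical helper, not part of this repository: `list_frames` and the assumption that folder names and frame file stems are plain integers (for example `1/0001.png`) are illustrative only.

```python
from pathlib import Path


def list_frames(root):
    """Map each numbered video folder to its frames, both in numeric order.

    Assumes folder names and frame file stems are plain integers;
    anything else is skipped.
    """
    root = Path(root)
    videos = sorted(
        (d for d in root.iterdir() if d.is_dir() and d.name.isdigit()),
        key=lambda d: int(d.name),
    )
    layout = {}
    for video in videos:
        frames = [f for f in video.iterdir() if f.is_file() and f.stem.isdigit()]
        # Sort by integer value so that frame 10 comes after frame 2.
        layout[video.name] = sorted(frames, key=lambda f: int(f.stem))
    return layout


if __name__ == "__main__":
    # Placeholder path, mirroring the -src argument above.
    for name, frames in list_frames("/path/to/your_dataset/frames").items():
        print(name, len(frames))
```

Sorting by integer value rather than by string avoids the common pitfall where `10` is ordered before `2`.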