Simple vs complex temporal recurrences for video saliency prediction
SalEMA is a video saliency prediction network. It uses a moving average of convolutional states to produce state-of-the-art results on DHF1K, Hollywood-2 and UCF Sports according to this benchmark (July 2019). The model has been trained on the DHF1K dataset.
Abstract
This paper investigates modifying an existing neural network architecture for static saliency prediction using two types of recurrences that integrate information from the temporal domain. The first modification is the addition of a ConvLSTM within the architecture, while the second is a conceptually simple exponential moving average of an internal convolutional state. We use weights pre-trained on the SALICON dataset and fine-tune our model on DHF1K. Our results show that both modifications achieve state-of-the-art results and produce similar saliency maps.
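To make the second modification concrete, here is a minimal sketch of an exponential moving average applied to a per-frame feature state. It is not the code from this repo: the function name `ema_states`, the default `alpha`, the choice to initialise the state with the first frame, and the use of flat Python lists in place of convolutional feature maps are all assumptions made for illustration.

```python
from typing import Iterable, List, Optional


def ema_states(frames: Iterable[List[float]], alpha: float = 0.1) -> List[List[float]]:
    """Return the running exponential moving average after each frame.

    Each element of `frames` stands in for the (flattened) activations of one
    convolutional layer on one video frame. `alpha` weights the current frame;
    (1 - alpha) weights the accumulated state. Both are illustrative choices.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    state: Optional[List[float]] = None
    history: List[List[float]] = []
    for features in frames:
        if state is None:
            # Assumption: the first frame initialises the state as-is.
            state = list(features)
        else:
            # state_t = alpha * x_t + (1 - alpha) * state_{t-1}, element-wise.
            state = [alpha * x + (1.0 - alpha) * s for x, s in zip(features, state)]
        history.append(state)
    return history


if __name__ == "__main__":
    # Three toy "frames" with a single activation each.
    print(ema_states([[1.0], [0.0], [1.0]], alpha=0.5))
```

Unlike a ConvLSTM, this recurrence adds no learned gates: the only temporal parameter is `alpha`, which is what makes it the conceptually simple option.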
Publication
Find the extended pre-print version of our work on arXiv.
Please cite with the following Bibtex code:
@InProceedings{Linardos2019,
author = {Linardos, Panagiotis and Mohedano, Eva and Nieto, Juan Jose and McGuinness, Kevin and Giro-i-Nieto, Xavier and O'Connor, Noel E.},
title = {Simple vs complex temporal recurrences for video saliency prediction},
booktitle = {British Machine Vision Conference (BMVC)},
month = {September},
year = {2019}
}
You may also want to refer to our publication with the more human-friendly Chicago style:
Panagiotis Linardos, Eva Mohedano, Juan Jose Nieto, Kevin McGuinness, Xavier Giro-i-Nieto and Noel E. O’Connor. “Simple vs complex temporal recurrences for video saliency prediction.” BMVC 2019.
Results
Qualitative results:
Click to be redirected to YouTube:
Model
Download our best configuration of the SalEMA model here (364MB)
Installation
- Clone the repo:
git clone https://github.com/Linardos/SalEMA
- Install requirements
pip install -r requirements.txt
- Install PyTorch 1.0:
pip3 install torch torchvision
Inference
You may use our pretrained model for inference on any of the three datasets: DHF1K [link], Hollywood-2 [link], UCF-sports [link].
To perform inference on the DHF1K validation set:
python inference.py -dataset=DHF1K -pt_model=SalEMA30.pt -start=600 -end=700 -dst=/path/to/output -src=/path/to/DHF1K/frames
To perform inference on the Hollywood-2 or UCF-sports test set (because of the way these datasets are structured, it’s convenient to use the same path for dst and src):
python inference.py -dataset=Hollywood-2 -pt_model=SalEMA30.pt -dst=/path/to/Hollywood-2/testing -src=/path/to/Hollywood-2/testing
python inference.py -dataset=UCF-sports -pt_model=SalEMA30.pt -dst=/path/to/UCF-sports/testing -src=/path/to/UCF-sports/testing
To perform inference on your own dataset, make sure to follow the same structure as DHF1K (numbered folders, each containing numbered frames):
python inference.py -dataset=other -pt_model=SalEMA30.pt -dst=/path/to/output -src=/path/to/your_dataset/frames
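Before running inference on your own data, it can help to check that the folder really follows the "numbered folders, numbered frames" layout. The sketch below is a hypothetical helper, not part of this repo: the name `list_numbered_frames` is made up, and it assumes that "numbered" means the folder name and the frame file name (without extension) consist only of digits. Check the exact naming against a DHF1K download before relying on it.

```python
from pathlib import Path
from typing import Dict, List


def list_numbered_frames(src: str) -> Dict[str, List[Path]]:
    """Map each numbered video folder under `src` to its numbered frame files.

    Folders and files whose names are not purely numeric are ignored
    (assumption: that is what "numbered" means for this layout).
    Both folders and frames are sorted numerically, so 2 comes before 10.
    """
    root = Path(src)
    video_dirs = sorted(
        (d for d in root.iterdir() if d.is_dir() and d.name.isdigit()),
        key=lambda d: int(d.name),
    )

    layout: Dict[str, List[Path]] = {}
    for video_dir in video_dirs:
        frames = sorted(
            (f for f in video_dir.iterdir() if f.is_file() and f.stem.isdigit()),
            key=lambda f: int(f.stem),
        )
        layout[video_dir.name] = frames
    return layout


if __name__ == "__main__":
    import sys

    # Usage: python check_layout.py /path/to/your_dataset/frames
    for video, frames in list_numbered_frames(sys.argv[1]).items():
        print(f"{video}: {len(frames)} frames")
```

A video folder that reports zero frames, or that is missing from the output altogether, points to names that do not match the numeric pattern assumed here.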