Introduction
Recurrent Neural Networks (RNNs) continue to show outstanding performance in sequence modeling tasks. However, training RNNs on long sequences often faces challenges such as slow inference, vanishing gradients, and difficulty capturing long-term dependencies. In backpropagation through time settings, these issues are tightly coupled with the large, sequential computational graph that results from unfolding the RNN in time. We introduce the Skip RNN model, which extends existing RNN models by learning to skip state updates, thereby shortening the effective size of the computational graph. The model can also be encouraged to perform fewer state updates through a budget constraint. We evaluate the proposed model on a variety of tasks and show how it can reduce the number of required RNN updates while preserving, and sometimes even improving, the performance of the baseline RNN models.
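The core idea can be sketched in a few lines: a cumulative update probability ũ_t is binarized into a gate u_t; when the gate fires the wrapped cell updates the state and ũ resets, otherwise the state is copied forward and ũ accumulates until an update is forced. The sketch below is a minimal NumPy illustration of this mechanism, not the paper's TensorFlow implementation; `cell` and `delta_fn` are hypothetical stand-ins for the wrapped RNN cell and for the learned function producing Δũ_t.

```python
import numpy as np

def skip_rnn_step(s_prev, u_tilde, x, cell, delta_fn):
    """One Skip RNN step: update the state only when the gate u_t fires.

    s_prev   : previous hidden state s_{t-1}
    u_tilde  : accumulated update probability u~_t in [0, 1]
    cell     : any RNN cell, s_t = cell(s_{t-1}, x_t)  (hypothetical callable)
    delta_fn : computes the increment du~_t from the state (hypothetical callable)
    """
    u = round(u_tilde)                       # u_t = round(u~_t), in {0, 1}
    s = cell(s_prev, x) if u else s_prev     # skipping copies the old state
    delta = delta_fn(s)                      # du~_t, e.g. sigmoid(W s_t + b)
    # u~_{t+1}: reset after an update, otherwise accumulate (capped at 1)
    u_tilde_next = delta if u else u_tilde + min(delta, 1.0 - u_tilde)
    return s, u_tilde_next, u

# Toy run: with a constant du~_t of 0.4, the gate fires every other step.
cell = lambda s, x: np.tanh(s + x)
delta_fn = lambda s: 0.4
s, u_tilde, updates = np.zeros(2), 1.0, 0
for x in np.ones((6, 2)) * 0.1:
    s, u_tilde, u = skip_rnn_step(s, u_tilde, x, cell, delta_fn)
    updates += int(u)
print(updates)  # fewer than 6 state updates are executed
```

In the full model the gate is a non-differentiable binarization, so training relies on a gradient estimator (the straight-through estimator in the paper), and the budget constraint is a loss term penalizing the number of updates.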
If you find this work useful, please consider citing:
Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, and Shih-Fu Chang. “Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks”, In International Conference on Learning Representations, 2018.
@inproceedings{campos2018skip,
  title={Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks},
  author={Campos, V{\'\i}ctor and Jou, Brendan and Giro-i-Nieto, Xavier and Torres, Jordi and Chang, Shih-Fu},
  booktitle={International Conference on Learning Representations},
  year={2018}
}
Find our paper on arXiv or download the PDF directly from here.
Model
Results
We evaluate the Skip RNN model in a series of tasks: (1) adding task, (2) frequency discrimination task, (3) digit classification, (4) sentiment analysis, and (5) action recognition. Please see the paper for results and discussion.
Examples
When classifying MNIST digits, the model learns which pixels to attend (red) and which pixels to ignore (blue):
Talks
This talk was recorded in the Deep Learning Barcelona Symposium 2018.
The video of this talk was kindly recorded by the Computer Vision Center (CVC) of the Government of Catalonia and the Universitat Autònoma de Barcelona (UAB).
Code
This project was developed with Python 3.6.0 and TensorFlow 1.0.0. To download and install TensorFlow, please follow the official guide.
Acknowledgements
We would like to especially thank the technical support team at the Barcelona Supercomputing Center.
This work has been supported by grant SEV2015-0493 of the Severo Ochoa Program awarded by the Spanish Government, by project TIN2015-65316 of the Spanish Ministry of Science and Innovation, and by contract 2014-SGR-1051 of the Generalitat de Catalunya.
We gratefully acknowledge the support of NVIDIA Corporation through the BSC/UPC NVIDIA GPU Center of Excellence.
The Image Processing Group at the UPC is a SGR14 Consolidated Research Group recognized and sponsored by the Catalan Government (Generalitat de Catalunya) through its AGAUR office.
This work has been developed in the framework of the project BigGraph TEC2013-43935-R, funded by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund (ERDF).