Language & Vision

Instructor


Xavier Giro-i-Nieto (XG)

Slides

Slides

Video Lecture

(to be added)

Related Work & Resources

Karpathy, Andrej, and Li Fei-Fei. “Deep visual-semantic alignments for generating image descriptions.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3128-3137. 2015.
Donahue, Jeffrey, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. “Long-term recurrent convolutional networks for visual recognition and description.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2625-2634. 2015.
Xu, Kelvin, Jimmy Ba, Ryan Kiros, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, and Yoshua Bengio. “Show, attend and tell: Neural image caption generation with visual attention.” International Conference on Machine Learning (ICML), 2015.
Yao, Li, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, and Aaron Courville. “Describing videos by exploiting temporal structure.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 4507-4515. 2015. [code]
Kiros, Ryan, Yukun Zhu, Ruslan R. Salakhutdinov, Richard Zemel, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. “Skip-thought vectors.” In Advances in Neural Information Processing Systems, pp. 3276-3284. 2015. [code]
Mansimov, Elman, Emilio Parisotto, Jimmy Lei Ba, and Ruslan Salakhutdinov. “Generating Images from Captions with Attention.” ICLR 2016. [code]
Johnson, Justin, Andrej Karpathy, and Li Fei-Fei. “DenseCap: Fully Convolutional Localization Networks for Dense Captioning.” CVPR 2016. [software]
Antol, Stanislaw, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh. “VQA: Visual question answering.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 2425-2433. 2015.
Sadeghi, Fereshteh, Santosh K. Divvala, and Ali Farhadi. “VisKE: Visual knowledge extraction and question answering by visual verification of relation phrases.” CVPR 2015.
Malinowski, Mateusz, Marcus Rohrbach, and Mario Fritz. “Ask your neurons: A neural-based approach to answering questions about images.” ICCV 2015.
Xiong, Caiming, Stephen Merity, and Richard Socher. “Dynamic Memory Networks for Visual and Textual Question Answering.” arXiv preprint arXiv:1603.01417 (2016).
Ma, Lin, Zhengdong Lu, and Hang Li. “Learning to answer questions from image using convolutional neural network.” AAAI 2016.
Zhu, Yuke, Oliver Groth, Michael Bernstein, and Li Fei-Fei. “Visual7W: Grounded Question Answering in Images.” CVPR 2016.
Tapaswi, Makarand, Yukun Zhu, Rainer Stiefelhagen, Antonio Torralba, Raquel Urtasun, and Sanja Fidler. “MovieQA: Understanding Stories in Movies through Question-Answering.” CVPR 2016.
Kim, Jin-Hwa, Sang-Woo Lee, Dong-Hyun Kwak, Min-Oh Heo, Jeonghee Kim, Jung-Woo Ha, and Byoung-Tak Zhang. “Multimodal Residual Learning for Visual QA.” arXiv preprint arXiv:1606.01455 (2016).
Mostafazadeh, Nasrin, Ishan Misra, Jacob Devlin, Margaret Mitchell, Xiaodong He, and Lucy Vanderwende. “Generating Natural Questions About an Image.” arXiv preprint arXiv:1603.06059 (2016).
Bolte, Ben. “Deep Language Modeling for Question Answering using Keras.” April 2016.
Tai, Kai Sheng, Richard Socher, and Christopher D. Manning. “Improved semantic representations from tree-structured long short-term memory networks.” ACL 2015. [code]
