Eva Mohedano | Kevin McGuinness | Xavier Giro-i-Nieto | Noel O'Connor
A joint collaboration between:
Insight Centre for Data Analytics | Dublin City University (DCU) | Universitat Politecnica de Catalunya (UPC) | UPC ETSETB TelecomBCN | UPC Image Processing Group
Abstract
This work explores attention models to weight the contribution of local convolutional representations for the instance search task. We present a retrieval framework based on bags of local convolutional features (BLCF) that benefits from saliency weighting to build an efficient image representation. The use of human visual attention models (saliency) allows significant improvements in retrieval performance without the need to conduct region analysis or spatial verification, and without requiring any feature fine tuning. We investigate the impact of different saliency models, finding that higher performance on saliency benchmarks does not necessarily equate to improved performance when used in instance search tasks. The proposed approach outperforms the state-of-the-art on the challenging INSTRE benchmark by a large margin, and provides similar performance on the Oxford and Paris benchmarks compared to more complex methods that use off-the-shelf representations.
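The core idea above can be illustrated with a minimal sketch: each spatial location of a convolutional feature map is assigned to a visual word, and its vote into the bag-of-words histogram is weighted by the saliency value at that location. The function below is a hypothetical illustration (not the authors' released code), assuming the visual-word assignments and a saliency map resized to the feature-map resolution are already computed.

```python
import numpy as np

def saliency_weighted_blcf(assignments, saliency, vocab_size):
    """Sketch of a saliency-weighted bag of local convolutional features.

    assignments: (H, W) int array, visual-word id of each conv location.
    saliency:    (H, W) float array of saliency values (e.g. in [0, 1]),
                 resized to the feature-map resolution.
    vocab_size:  number of visual words in the vocabulary.
    Returns an L2-normalised histogram of length vocab_size.
    """
    hist = np.zeros(vocab_size, dtype=np.float64)
    # Each location votes for its visual word, weighted by its saliency;
    # np.add.at accumulates correctly even with repeated word ids.
    np.add.at(hist, assignments.ravel(), saliency.ravel())
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist
```

With a uniform (all-ones) saliency map this reduces to the unweighted BLCF histogram; a non-uniform map down-weights words that fall on low-saliency background regions, which is the effect the abstract describes.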
Visuals
5 query examples comparing the average precision of unweighted BLCF and saliency-weighted BLCF (bold). For each query, 9 relevant images are shown with their precision at the retrieved position.
Publication
Find our paper at arXiv.
Slides
Downloads
Datasets
INSTRE, using the evaluation protocol from Iscen et al.
Oxford Buildings and Paris Buildings datasets, using the standard evaluation protocol.