Embedding Label Structures for Fine-Grained Feature Representation
The growing explosion in the use of surveillance cameras in public security highlights the importance of vehicle search from large-scale image databases. Precise vehicle search, aiming at finding out all instances for a given query vehicle image, is a challenging task as different vehicles will look very similar to each other if they share same visual attributes. To address this problem, we propose the Repression Network (RepNet), a novel multi-task learning framework, to learn discriminative features for each vehicle image from both coarse-grained and fine-grained level simultaneously. Besides, benefited from the satisfactory accuracy of attribute classification, a bucket search method is proposed to reduce the retrieval time while still maintaining competitive performance. We conduct extensive experiments on the revised VehcileID  dataset. Experimental results show that our RepNet achieves the state-of-the-art performance and the bucket search method can reduce the retrieval time by about 24 times.