Xinhang Song

This paper describes the participation of our team, MIAR ICT, in the ImageCLEF 2013 Robot Vision Challenge. The challenge asked participants to classify images of indoor scenes and to recognize the predefined objects appearing in each scene. Our approach is based on the recently proposed Kernel Descriptors framework, which is an effective...
In this paper, we describe our methods for the Natural Language Caption Generation subtask of the ImageCLEF 2016 Scalable Image Annotation task. Our model combines an encoding procedure and a decoding procedure, built from a convolutional neural network (CNN) and a Long Short-Term...
With the explosive growth of image data on the Internet, using these images efficiently in cross-media scenarios has become an urgent problem. Images are usually accompanied by contextual text. These two heterogeneous modalities reinforce each other and make Internet content more informative. In most cases, visual...
Food-related photos have become increasingly popular due to social networks, food recommendation, and dietary assessment systems. Reliable annotation is essential in these systems, but unconstrained automatic food recognition is still not accurate enough. Most works exploit only the visual content and ignore the context. To address this...
Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In contrast to previous image description methods, which describe the image as a whole, this paper presents a method for generating rich image descriptions from image regions...
Extracting good representations from images is essential for many computer vision tasks. Progress in deep learning has shown the importance of learning hierarchical features, but it is also important to learn features through multiple paths. This paper presents Multipath Convolutional-Recursive Neural Networks (M-CRNNs), a novel scheme that aims to learn...
Distance metric learning is widely used in visual computing, especially in image classification. Among metric learning approaches, Fisher Discriminant Analysis (FDA) is a classical one that exploits pairwise semantic similarity and dissimilarity for image classification. Moreover, Local Fisher Discriminant Analysis...