Learn More
In the semantic multinomial framework patches and images are modeled as points in a semantic probability simplex. Patch theme models are learned resorting to weak supervision via image labels, which leads the problem of scene categories co-occurring in this semantic space. Fortunately, each category has its own co-occurrence patterns that are consistent(More)
Food-related photos have become increasingly popular, due to social networks, food recommendation and dietary assessment systems. Reliable annotation is essential in those systems, but unconstrained automatic food recognition is still not accurate enough. Most works focus on exploiting only the visual content while ignoring the context. To address this(More)
With the fast explosive rate of the amount of image data on the Internet, how to efficiently utilize them in the cross-media scenario becomes an urgent problem. Images are usually accompanied with contextual textual information. These two heterogeneous modalities are mutually reinforcing to make the Internet content more informative. In most cases, visual(More)
In this paper, we describe the details of our methods for the participation in the subtask of the ImageCLEF 2016 Scalable Image Annotation task: Natural Language Caption Generation. The model we used is the combination of a procedure of encoding and a procedure of decoding, which includes a Convolutional neural network(CNN) and a Long Short-Term(More)
Food-related photos have become increasingly very popular, due to social networks, food recommendation and dietary assessment systems. Reliable annotation is essential in those systems, but user-contributed tags are often non-informative and inconsistent, and unconstrained automatic food recognition still has relatively low accuracy. Most works focus on(More)
Multiple image features and multiple semantic concepts from the images have intrinsic and complex relations. These relations influence the effectiveness of image semantic analysis methods, especially on the large scale problems. In this paper, a framework of generating polysemious image representation through three levels of feature aggregation is proposed.(More)
Extracting good representations from images is essential for many computer vision tasks. While progress in deep learning shows the importance of learning hierarchical features, it is also important to learn features through multiple paths. This paper presents Multipath Convolutional-Recursive Neural Networks(M-CRNNs), a novel scheme which aims to learn(More)