Learn More
Automatically describing video content with natural language is a fundamental challenge of computer vision. Re-current Neural Networks (RNNs), which models sequence dynamics, has attracted increasing attention on visual interpretation. However, most existing approaches generate a word locally with the given previous words and the visual content, while the(More)
While there has been increasing interest in the task of describing video with natural language, current computer vision algorithms are still severely limited in terms of the variability and complexity of the videos and their associated language that they can recognize. This is in part due to the simplicity of current benchmarks, which mostly focus on(More)
Hashing techniques have been intensively investigated for large scale vision applications. Recent research has shown that leveraging supervised information can lead to high quality hashing. However, most existing supervised hashing methods only construct similarity-preserving hash codes. Observing that semantic structures carry complementary information, we(More)
One of the fundamental problems in image search is to learn the ranking functions, i.e., similarity between the query and image. The research on this topic has evolved through two paradigms: feature-based vector model and image ranker learning. The former relies on the image surrounding texts, while the latter learns a ranker based on human labeled(More)
Automatically describing an image with a natural language has been an emerging challenge in both fields of computer vision and natural language processing. In this paper, we present Long Short-Term Memory with Attributes (LSTM-A) a novel architecture that integrates attributes into the successful Convolutional Neural Networks (CNNs) plus Recurrent Neural(More)
One of the fundamental problems in image search is to rank image documents according to a given textual query. We address two limitations of the existing image search engines in this paper. First, there is no straightforward way of comparing textual keywords with visual image content. Image search engines therefore highly depend on the surrounding texts,(More)
Deep convolutional neural networks (CNNs) have proven highly effective for visual recognition, where learning a universal representation from activations of convolutional layer plays a fundamental problem. In this paper, we present Fisher Vector encoding with Variational AutoEncoder (FV-VAE), a novel deep architecture that quantizes the local activations of(More)
Peptides derived from pre-proglucagon (GCG peptides) act in both the periphery and the CNS to change food intake, glucose homeostasis, and metabolic rate while playing a role in anxiety behaviors and physiological responses to stress. Although the actions of GCG peptides produced in the gut and pancreas are well described, the role of glutamatergic GGC(More)
Recognizing actions in videos is a challenging task as video is an information-intensive media with complex variations. Most existing methods have treated video as a flat data sequence while ignoring the intrinsic hierarchical structure of the video content. In particular, an action may span different granularities in this hierarchy including, from small to(More)