Segment-Phrase Table for Semantic Segmentation, Visual Entailment and Paraphrasing

@article{Izadinia2015SegmentPhraseTF,
  title={Segment-Phrase Table for Semantic Segmentation, Visual Entailment and Paraphrasing},
  author={Hamid Izadinia and Fereshteh Sadeghi and S. Divvala and Hannaneh Hajishirzi and Yejin Choi and Ali Farhadi},
  journal={2015 IEEE International Conference on Computer Vision (ICCV)},
  year={2015},
  pages={10-18}
}
We introduce Segment-Phrase Table (SPT), a large collection of bijective associations between textual phrases and their corresponding segmentations. Leveraging recent progress in object recognition and natural language semantics, we show how we can successfully build a high-quality segment-phrase table using minimal human supervision. More importantly, we demonstrate the unique value unleashed by this rich bimodal resource, for both vision as well as natural language understanding. First, we…
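As a rough illustration of what such a bimodal resource looks like as a data structure, here is a minimal Python sketch of a phrase-to-segmentation lookup table. The class name, fields, and example phrase are invented for illustration and do not reflect the paper's actual implementation.

```python
# Minimal sketch of a segment-phrase table: a bidirectional index between
# textual phrases and segmentation masks. Illustrative only.
from collections import defaultdict

import numpy as np


class SegmentPhraseTable:
    def __init__(self):
        self._phrase_to_segments = defaultdict(list)  # phrase -> [(id, mask), ...]
        self._segment_to_phrase = {}                  # segment id -> phrase

    def add(self, phrase: str, mask: np.ndarray) -> int:
        """Register a (phrase, binary segmentation mask) association."""
        seg_id = len(self._segment_to_phrase)
        self._phrase_to_segments[phrase].append((seg_id, mask))
        self._segment_to_phrase[seg_id] = phrase
        return seg_id

    def segments_for(self, phrase: str):
        """All masks associated with a phrase, e.g. 'horse grazing'."""
        return [mask for _, mask in self._phrase_to_segments[phrase]]

    def phrase_for(self, seg_id: int) -> str:
        return self._segment_to_phrase[seg_id]


table = SegmentPhraseTable()
table.add("horse grazing", np.zeros((32, 32), dtype=bool))
print(table.segments_for("horse grazing")[0].shape)  # (32, 32)
```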
Resolving Language and Vision Ambiguities Together: Joint Segmentation & Prepositional Attachment Resolution in Captioned Scenes
TLDR: This work presents an approach to simultaneously perform semantic segmentation and prepositional phrase attachment resolution for captioned images, and shows that joint reasoning produces more accurate results than any module operating in isolation.
Image Captions as Paraphrases
Paraphrase resources have been exploited for various problems in the domain of natural language processing. Although there are large paraphrase datasets available, many of them lack textual diversity…
Visual Denotations for Recognizing Textual Entailment
TLDR: This work proposes to map phrases to their visual denotations and compare their meaning in terms of their images, and shows that this approach is effective in the task of Recognizing Textual Entailment when combined with specific linguistic and logic features.
Visual Denotations for Recognizing Textual Entailment
In the logic approach to Recognizing Textual Entailment, identifying phrase-to-phrase semantic relations is still an unsolved problem. Resources such as the Paraphrase Database offer limited coverage…
A classification model for semantic entailment recognition with feature combination
TLDR: A classification model based on support vector machines is constructed to detect high-level semantic entailment relations in Chinese text pairs: entailment vs. non-entailment in the binary-class setting, and forward entailment, reverse entailment, bidirectional entailment, contradiction, and independence in the multi-class setting.
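A hedged sketch of such a feature-combination SVM over entailment classes, using scikit-learn's standard multi-class SVC; the feature names, toy values, and label set here are invented for illustration.

```python
# Toy multi-class SVM for entailment relations; illustrative only.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Invented features per text pair: [lexical overlap, length ratio, negation mismatch].
X = np.array([[0.90, 1.0, 0.0],   # forward entailment
              [0.90, 0.5, 0.0],   # reverse entailment
              [0.95, 1.0, 0.0],   # bidirectional entailment
              [0.70, 1.0, 1.0],   # contradiction
              [0.10, 0.8, 0.0]])  # independence
y = ["forward", "reverse", "bidirectional", "contradiction", "independent"]

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)
print(clf.predict([[0.85, 0.9, 0.0]]))
```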
Learning to detect visual relations
TLDR: A weakly-supervised approach is proposed which, given pre-trained object detectors, enables relation detectors to be learned from image-level labels only, maintaining performance close to that of fully-supervised models.
Detecting Unseen Visual Relations Using Analogies
TLDR: This work learns a representation of visual relations that combines individual embeddings for subject, object, and predicate with a visual phrase embedding that represents the relation triplet, and demonstrates the benefits of this approach on three challenging datasets.
A Continuously Growing Dataset of Sentential Paraphrases
TLDR: A new method to collect large-scale sentential paraphrases from Twitter by linking tweets through shared URLs is presented, yielding the largest human-labeled paraphrase corpus to date (51,524 sentence pairs) and the first cross-domain benchmark for automatic paraphrase identification.
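The URL-linking idea reduces to grouping tweets by the link they share and pairing sentences within each group as paraphrase candidates. A minimal sketch, with made-up tweets:

```python
# Group tweets by shared URL; co-linked tweets are paraphrase candidates
# (to be filtered by human labeling). Data is invented for illustration.
from collections import defaultdict
from itertools import combinations

tweets = [
    ("Quake hits the coast", "http://ex.com/a"),
    ("Earthquake strikes coastal region", "http://ex.com/a"),
    ("New phone released today", "http://ex.com/b"),
]

by_url = defaultdict(list)
for text, url in tweets:
    by_url[url].append(text)

candidates = [pair for group in by_url.values()
              for pair in combinations(group, 2)]
print(candidates)  # [('Quake hits the coast', 'Earthquake strikes coastal region')]
```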
Reasoning About Fine-Grained Attribute Phrases Using Reference Games
TLDR: A framework for learning to describe fine-grained visual differences between instances using attribute phrases is presented; embedding an image into the semantic space of attribute phrases derived from listeners offers a 20% improvement in accuracy over existing attribute-based representations on the FGVC-Aircraft dataset.
Estimating the Information Gap between Textual and Visual Representations
TLDR: A novel deep learning approach is suggested to estimate the cross-modal mutual information (CMI) and semantic correlation (SC) of textual and visual information, and three diverse datasets are leveraged to learn an appropriate deep neural network model for this demanding task.

References

Showing 1-10 of 43 references.
From captions to visual concepts and back
TLDR: This paper uses multiple instance learning to train visual detectors for words that commonly occur in captions, including many different parts of speech such as nouns, verbs, and adjectives, and develops a maximum-entropy language model.
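The word-detector idea is commonly formulated as noisy-OR multiple instance learning: an image is positive for a word if at least one of its regions fires. A minimal sketch with made-up region scores:

```python
# Noisy-OR MIL pooling over region scores; values are invented for illustration.
import numpy as np

def image_word_probability(region_probs: np.ndarray) -> float:
    """P(word appears in image) = 1 - prod_j (1 - p_j) over regions j."""
    return float(1.0 - np.prod(1.0 - region_probs))

regions = np.array([0.05, 0.10, 0.80])   # per-region scores for, e.g., "horse"
print(image_word_probability(regions))    # ~0.829: one confident region suffices
```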
A Survey of Paraphrasing and Textual Entailment Methods
TLDR: Key ideas from the two areas of paraphrasing and textual entailment are summarized by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources.
VisKE: Visual knowledge extraction and question answering by visual verification of relation phrases
TLDR: This work introduces the problem of visual verification of relation phrases and develops a visual knowledge extraction system called VisKE, which has been used not only to enrich existing textual knowledge bases by improving their recall, but also to augment open-domain question-answer reasoning.
Deep Fragment Embeddings for Bidirectional Image Sentence Mapping
TLDR: This work introduces a model for bidirectional retrieval of images and sentences through a deep, multi-modal embedding of visual and natural language data, together with a structured max-margin objective that allows the model to explicitly associate fragments across modalities.
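A generic bidirectional max-margin ranking loss over an image-sentence score matrix illustrates the shape of such an objective; this is the standard hinge formulation, not the paper's exact fragment-level objective.

```python
# Bidirectional hinge ranking loss: S[i, j] scores image i against sentence j;
# matched pairs sit on the diagonal. Illustrative, not the paper's objective.
import numpy as np

def bidirectional_ranking_loss(S: np.ndarray, margin: float = 1.0) -> float:
    n = S.shape[0]
    pos = np.diag(S)                                     # scores of correct pairs
    cost_s = np.maximum(0.0, margin - pos[:, None] + S)  # rank sentences per image
    cost_i = np.maximum(0.0, margin - pos[None, :] + S)  # rank images per sentence
    mask = 1.0 - np.eye(n)                               # exclude the correct pair
    return float(((cost_s + cost_i) * mask).sum() / n)

S = np.array([[2.0, 0.5, 0.1],
              [0.3, 1.8, 0.6],
              [0.2, 0.4, 2.2]])
print(bidirectional_ranking_loss(S))  # 0.0
```

For this toy matrix every correct pair beats all mismatches by at least the margin, so the loss is zero; a misranked pair would contribute a positive hinge term in both directions.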
Deep Visual-Semantic Alignments for Generating Image Descriptions
A. Karpathy and L. Fei-Fei, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017
TLDR: A model that generates natural language descriptions of images and their regions is presented, based on a novel combination of convolutional neural networks over image regions, bidirectional recurrent neural networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding.
Combining Language and Vision with a Multimodal Skip-gram Model
TLDR: Because they propagate visual information to all words, the MMSkip-gram models discover intriguing visual properties of abstract words, paving the way to realistic implementations of embodied theories of meaning.
Grounded Compositional Semantics for Finding and Describing Images with Sentences
TLDR: The DT-RNN model uses dependency trees to embed sentences into a vector space in order to retrieve images described by those sentences; it outperforms other recursive and recurrent neural networks, kernelized CCA, and a bag-of-words baseline on the tasks of finding an image that fits a sentence description and vice versa.
Show and tell: A neural image caption generator
TLDR: This paper presents a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation, and that can be used to generate natural sentences describing an image.
Distributed Representations of Words and Phrases and their Compositionality
TLDR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
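A minimal sketch of one skip-gram training step with negative sampling, the alternative to hierarchical softmax mentioned above. The dimensions and learning rate are toy values, and uniform negative sampling is a simplification (word2vec samples negatives from a smoothed unigram distribution).

```python
# One skip-gram step with negative sampling: push the context vector toward
# the center vector, push k random "negative" vectors away. Toy scale only.
import numpy as np

rng = np.random.default_rng(0)
V, d = 1000, 50                               # vocab size, embedding dim
W_in = rng.normal(scale=0.1, size=(V, d))     # center-word vectors v
W_out = rng.normal(scale=0.1, size=(V, d))    # context-word vectors u

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(center, context, k=5, lr=0.025):
    negatives = rng.integers(0, V, size=k)    # uniform here; unigram^0.75 in practice
    v = W_in[center].copy()
    v_grad = np.zeros_like(v)
    for word, label in [(context, 1.0)] + [(int(n), 0.0) for n in negatives]:
        u = W_out[word]
        g = sigmoid(u @ v) - label            # gradient of the logistic loss
        W_out[word] -= lr * g * v
        v_grad += g * u
    W_in[center] -= lr * v_grad

sgns_step(center=3, context=17)
```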
Enriching Visual Knowledge Bases via Object Discovery and Segmentation
TLDR: The approach combines the power of generative modeling for segmentation with the effectiveness of discriminative models for detection to learn and exploit top-down segmentation priors based on visual subcategories.