Corpus ID: 6923915

Idioms in Context: The IDIX Corpus

@inproceedings{Sporleder2010IdiomsIC,
  title={Idioms in Context: The IDIX Corpus},
  author={Caroline Sporleder and Linlin Li and Philip John Gorinski and Xaver Koch},
  booktitle={LREC},
  year={2010}
}
Idioms and other figuratively used expressions pose considerable problems to natural language processing applications because they are very frequent and often behave idiosyncratically. [...] We believe that this resource will be useful both for linguistic and computational linguistic studies.


Detecting and Processing Figurative Language in Discourse
TLDR
This talk will present an unsupervised method for token-based idiom detection, which exploits the fact that well-formed texts exhibit lexical cohesion, i.e. words are semantically related to other words in the context.
Potential Idiomatic Expression (PIE)-English: Corpus for Classes of Idioms
TLDR
A fairly large Potential Idiomatic Expression (PIE) dataset for Natural Language Processing (NLP) in English, containing over 20,100 samples with almost 1,200 cases of idioms (with their meanings) from 10 classes (or senses).
Representations of Idioms for Natural Language Processing: Idiom type and token identification, Language Modelling and Neural Machine Translation
TLDR
It is demonstrated that high-dimensional representations allow idiom classifiers to better model the interactions between global and local features and thereby improve the performance of these systems with regard to processing idioms.
The Other Side of the Coin: Unsupervised Disambiguation of Potentially Idiomatic Expressions by Contrasting Senses
TLDR
This work presents an unsupervised approach for English that makes use of literalisations of idiom senses to improve disambiguation, which is based on the lexical cohesion graph-based method by Sporleder and Li (2009).
A Survey of Idiomatic Preposition-Noun-Verb Triples on Token Level
TLDR
This work investigates whether a text instance of a potentially idiomatic MWE is actually used idiomatically in a given context or not, and shows that EUROPARL is particularly well suited for MWE extraction, as most MWEs in this corpus are indeed used only idiomatically.
ID10M: Idiom Identification in 10 Languages
TLDR
A novel multilingual Transformer-based system for the identification and understanding of idioms and a high-quality automatically-created training dataset in 10 languages, along with a novel manually-curated evaluation benchmark are proposed.
Idiom Token Classification using Sentential Distributed Semantics
TLDR
This work explores the use of Skip-Thought Vectors to create distributed representations that encode features predictive of idiom token classification, and shows that classifiers using these representations perform competitively with the state of the art in idiom token classification.
A New Approach for Idiom Identification Using Meanings and the Web
TLDR
This paper presents a new, domain independent, general-purpose idiom identification approach that combines the knowledge of the Web with the knowledge extracted from dictionaries and can overcome the limitations of current techniques that rely on linguistic knowledge or statistics.
Idiomaticity Prediction of Chinese Noun Compounds and Its Applications
TLDR
A Relational and Compositional Representation Learning model (RCRL) is proposed, which considers the relational textual patterns and the compositionality levels of Chinese NCs; the results demonstrate the effectiveness of RCRL, which outperforms state-of-the-art approaches.
Extraction of German Multiword Expressions from Parsed Corpora Using Context Features
TLDR
By using both morpho-syntactic and syntactic features, this work achieves higher precision in the identification of idiomatic MWEs than by using properties of only one type.
...
...

References

Showing 1-10 of 35 references
Unsupervised Type and Token Identification of Idiomatic Expressions
TLDR
This article develops statistical measures that each model a specific property of idiomatic expressions by looking at their actual usage patterns in text, and uses some of the measures in a token identification task where they distinguish idiomatic and literal usages of potentially idiomatic expression in context.
Detecting Japanese idioms with a linguistically rich dictionary
TLDR
A set of linguistic knowledge for idiom detection, implemented in an idiom dictionary, is proposed; more than 90% of the idioms are detected with 90% accuracy.
The VNC-Tokens Dataset
Idiomatic expressions formed from a verb and a noun in its direct object position are a productive cross-lingual class of multiword expressions, which can be used both idiomatically and as a literal ...
Construction of an Idiom Corpus and its Application to Idiom Identification based on WSD Incorporating Idiom-Specific Features
TLDR
An idiom corpus for Japanese is constructed, and it is found that a standard supervised WSD method works well for idiom identification, achieving an accuracy of 89.25% with and 88.86% without idiom-specific features, and that the most effective idiom-specific feature is the one involving the adjacency of idiom constituents.
A Survey of Idiomatic Preposition-Noun-Verb Triples on Token Level
TLDR
This work investigates whether a text instance of a potentially idiomatic MWE is actually used idiomatically in a given context or not, and shows that EUROPARL is particularly well suited for MWE extraction, as most MWEs in this corpus are indeed used only idiomatically.
Pulling their Weight: Exploiting Syntactic Forms for the Automatic Identification of Idiomatic Expressions in Context
TLDR
The use of informative prior knowledge about the overall syntactic behaviour of a potentially-idiomatic expression is explored to determine whether an instance of the expression is used idiomatically or literally (token-based knowledge).
Japanese Idiom Recognition: Drawing a Line between Literal and Idiomatic Meanings
TLDR
This paper proposes a set of lexical knowledge of idioms for idiom recognition and evaluates that knowledge by measuring the performance of an idiom recognizer that exploits it.
A constructional approach to idioms and word formation
TLDR
This dissertation explores a constructional approach to various aspects of grammar, in particular idioms and derivational morphology, within the Head-Driven Phrase Structure Grammar (HPSG) framework, and shows that idioms frequently occur in non-canonical forms.
A Clustering Approach for Nearly Unsupervised Recognition of Nonliteral Language
In this paper we present TroFi (Trope Finder), a system for automatically classifying literal and nonliteral usages of verbs through nearly unsupervised word-sense disambiguation and clustering
Automatically Constructing a Lexicon of Verb Phrase Idiomatic Combinations
TLDR
These statistical, corpus-based measures can be successfully used for distinguishing idiomatic combinations from non-idiomatic ones and a means for automatically determining which syntactic forms a particular idiom can appear in and hence should be included in its lexical representation is proposed.
...
...