Corpus ID: 10031131

Like Finding a Needle in a Haystack: Annotating the American National Corpus for Idiomatic Expressions

@inproceedings{Street2010LikeFA,
  title={Like Finding a Needle in a Haystack: Annotating the American National Corpus for Idiomatic Expressions},
  author={Laura-Gray Street and Nathan Michalov and Rachel Silverstein and Michael Reynolds and Lurdes Ruela and Felicia Flowers and Angela Talucci and Priscilla Pereira and Gabriella Morgon and Samantha Siegel and Marci Barousse and Antequa Anderson and Tashom Carroll and Anna Feldman},
  booktitle={LREC},
  year={2010}
}
Our paper presents the details of a pilot study in which we tagged portions of the American National Corpus (ANC) for idioms composed of verb-noun constructions, prepositional phrases, and subordinate clauses. The three data sets we analyzed included 1,500-sentence samples from the spoken, the nonfiction, and the fiction portions of the ANC. Our paper provides the details of the tagset we developed, the motivation behind our choices, and the inter-annotator agreement measures we deemed…
