SenseSpotting: Never let your parallel data tie you to an old domain

Abstract

Words often gain new senses in new domains. Automatically identifying, from a corpus of monolingual text, which word tokens are used in a previously unseen sense has applications in machine translation and other tasks sensitive to lexical semantics. We define a task, SENSESPOTTING, in which we build systems to spot tokens that have new senses in new-domain text. Instead of relying on difficult and expensive manual annotation, we build a gold standard by leveraging cheaply available parallel corpora, targeting our approach to the problem of domain adaptation for machine translation. Our system achieves F-measures as high as 80% when applied to word types it has never seen before. Our approach is based on a large set of novel features that capture varied aspects of how words change when used in new domains.
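The abstract says the gold standard comes from parallel corpora rather than manual annotation, and that systems are scored by F-measure. The sketch below shows one plausible way to derive such labels. It makes the following assumptions:

- The input is word-aligned (source word, target word) pairs.
- A new-domain token counts as a "new sense" when its aligned translation never occurred with that source word in the old-domain data.
- Word types absent from the old domain are skipped, because those are out-of-vocabulary words rather than new senses.

These labeling rules, the function names, and the toy French-English pairs are all hypothetical illustrations, not the paper's exact procedure.

```python
from collections import defaultdict


def build_old_domain_translations(old_pairs):
    """Map each source word to the set of target words it was aligned to
    in old-domain parallel data (hypothetical helper)."""
    seen = defaultdict(set)
    for src, tgt in old_pairs:
        seen[src].add(tgt)
    return seen


def label_new_domain_tokens(new_pairs, seen):
    """Label each aligned new-domain token as a new sense (True) if its
    translation was never observed for that word in the old domain.
    Assumption: word types unseen in the old domain are skipped, since
    they are out-of-vocabulary rather than words with new senses."""
    labels = []
    for src, tgt in new_pairs:
        if src not in seen:
            continue
        labels.append((src, tgt not in seen[src]))
    return labels


def f_measure(gold, predicted):
    """Balanced F-measure for the positive (new-sense) class."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy, made-up data: "enceinte" means "enclosure" in the old domain
# but "pregnant" in a medical new domain.
old = [("enceinte", "enclosure"), ("vol", "flight")]
new = [("enceinte", "pregnant"), ("vol", "flight"), ("rein", "kidney")]
seen = build_old_domain_translations(old)
print(label_new_domain_tokens(new, seen))
# [('enceinte', True), ('vol', False)]
```

In the toy run, "rein" is dropped because it never appears in the old-domain data. Using translations as a proxy for senses trades some precision for annotation cost: a changed translation usually signals a changed sense, but not always.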

8 Figures and Tables

Cite this paper

@inproceedings{Carpuat2013SenseSpottingNL,
  title={SenseSpotting: Never let your parallel data tie you to an old domain},
  author={Marine Carpuat and Hal Daum{\'e} and Katharine Henry and Ann Irvine and Jagadeesh Jagarlamudi and Rachel Rudinger},
  booktitle={ACL},
  year={2013}
}