Unsupervised Word Sense Disambiguation Rivaling Supervised Methods
An unsupervised learning algorithm for sense disambiguation that, when trained on unannotated English text, rivals the performance of supervised techniques that require time-consuming hand annotations.
Classifying latent user attributes in twitter
A novel investigation of stacked-SVM-based classification algorithms over a rich set of original features, applied to classifying four latent user attributes on Twitter, a genre distinct from the primarily spoken genres previously studied in the user-property classification literature.
Word-Sense Disambiguation Using Statistical Models of Roget’s Categories Trained on Large Corpora
A program that disambiguates English word senses in unrestricted text using statistical models of the major Roget's Thesaurus categories, enabling training on unrestricted monolingual text without human intervention.
Inducing Multilingual Text Analysis Tools via Robust Projection across Aligned Corpora
Presents noise-robust tagger, bracketer and lemmatizer training procedures capable of accurate system bootstrapping from noisy and incomplete initial projections, with performance that significantly exceeds that obtained by direct annotation projection.
One Sense Per Discourse
An experiment confirmed the hypothesis that if a polysemous word such as "sentence" appears two or more times in a well-written discourse, all of its occurrences are extremely likely to share the same sense.
One Sense per Collocation
This paper shows that, for certain definitions of collocation, a polysemous word exhibits essentially only one sense per collocation, and it uses this property in a disambiguation algorithm that achieves a precision of 92% using combined models of very local context.
Unsupervised Personal Name Disambiguation
This paper presents a set of algorithms for distinguishing personal names with multiple real referents in text, based on little or no supervision. The approach utilizes an unsupervised clustering...
Decision Lists for Lexical Ambiguity Resolution: Application to Accent Restoration in Spanish and French
This paper presents a statistical decision procedure for lexical ambiguity resolution. The algorithm exploits both local syntactic patterns and more distant collocational evidence, generating an...
The SIGMORPHON 2016 Shared Task—Morphological Reinflection
The 2016 SIGMORPHON Shared Task was devoted to the problem of morphological reinflection and introduced morphological datasets for 10 languages with diverse typological characteristics; the submitted systems showed a strong state of the art.
A method for disambiguating word senses in a large corpus
The proposed method is designed to disambiguate senses that are usually associated with different topics, using a Bayesian argument that has been applied successfully in related tasks such as author identification and information retrieval.