• Publications
Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments
We address the problem of part-of-speech tagging for English data from the popular micro-blogging service Twitter.
  • 975
  • 109
A Latent Variable Model for Geographic Lexical Variation
The rapid growth of geotagged social media raises new computational possibilities for investigating geographic linguistic variation.
  • 635
  • 70
Sparse Additive Generative Models of Text
This paper proposes an alternative to the Dirichlet-multinomial for generative models of text: the Sparse Additive Generative model (SAGE).
  • 284
  • 61
Explainable Prediction of Medical Codes from Clinical Text
We present an attentional convolutional network that predicts medical codes from clinical text, and use an attention mechanism to select the most relevant segments for each of the thousands of possible codes.
  • 171
  • 46
Gender identity and lexical variation in social media
We present a study of the relationship between gender, linguistic style, and social networks, using a novel corpus of 14,000 Twitter users.
  • 260
  • 35
Representation Learning for Text-level Discourse Parsing
Text-level discourse parsing is notoriously difficult, as distinctions between discourse relations require subtle semantic judgments that are not easily captured using standard features.
  • 168
  • 33
Bayesian Unsupervised Topic Segmentation
This paper describes a novel Bayesian approach to unsupervised topic segmentation.
  • 228
  • 32
What to do about bad language on the internet
The rise of social media has brought computational linguistics in ever-closer contact with bad language: text that defies our expectations about vocabulary, spelling, and syntax.
  • 303
  • 19
Mimicking Word Embeddings using Subword RNNs
Word embeddings improve generalization over lexical features by placing each word in a lower-dimensional space, using distributional information obtained from unlabeled data.
  • 99
  • 19
XIML: a common representation for interaction data
We introduce XIML (eXtensible Interface Markup Language), a proposed common representation for interaction data, which enables knowledge-based systems to exploit the captured data.
  • 210
  • 18