Publications
Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments
TLDR
A tagset is developed, data is annotated, features are developed, and results nearing 90% accuracy are reported on the problem of part-of-speech tagging for English data from the popular micro-blogging service Twitter.
Explainable Prediction of Medical Codes from Clinical Text
TLDR
A convolutional neural network that predicts medical codes from clinical text is presented, using an attention mechanism to select the most relevant segments of the text for each of the thousands of possible codes.
A Latent Variable Model for Geographic Lexical Variation
TLDR
A multi-level generative model that reasons jointly about latent topics and geographical regions is presented, which recovers coherent topics and their regional variants, while identifying geographic areas of linguistic consistency.
Sparse Additive Generative Models of Text
TLDR
This approach has two key advantages: it can enforce sparsity to prevent overfitting, and it can combine generative facets through simple addition in log space, avoiding the need for latent switching variables.
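The "addition in log space" idea above can be illustrated with a minimal sketch (all names and numbers here are illustrative, not the paper's actual parameterization): each facet contributes a sparse deviation vector in log space, and the word distribution is a softmax of the sum.

```python
import numpy as np

def softmax(x):
    """Normalize a log-space vector into a probability distribution."""
    e = np.exp(x - x.max())
    return e / e.sum()

vocab_size = 5
# Uniform background distribution in log space.
background = np.log(np.full(vocab_size, 1.0 / vocab_size))
# Two hypothetical sparse facets (e.g., a topic and a sentiment deviation).
topic_deviation = np.array([0.5, 0.0, -0.5, 0.0, 0.0])
sentiment_deviation = np.array([0.0, 0.3, 0.0, 0.0, -0.3])

# Facets combine by simple addition in log space; no latent switching
# variable decides which facet "generated" each word.
word_dist = softmax(background + topic_deviation + sentiment_deviation)
```

Because the sum is renormalized by the softmax, zero entries in a deviation vector leave the background untouched, which is what makes the sparsity penalty effective against overfitting.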
Gender identity and lexical variation in social media
TLDR
Pairing computational methods and social theory offers a new perspective on how gender emerges as individuals position themselves relative to audiences, topics, and mainstream gender norms.
Representation Learning for Text-level Discourse Parsing
TLDR
A representation learning approach, in which surface features are transformed into a latent space that facilitates RST discourse parsing, which obtains substantial improvements over the previous state-of-the-art in predicting relations and nuclearity on the RST Treebank.
Bayesian Unsupervised Topic Segmentation
TLDR
A novel Bayesian approach to unsupervised topic segmentation is described, showing that lexical cohesion can be placed in a Bayesian context by modeling the words in each topic segment as draws from a multinomial language model associated with the segment; maximizing the observation likelihood in such a model yields a lexically-cohesive segmentation.
Sparse, Dense, and Attentional Representations for Text Retrieval
TLDR
A simple neural model is proposed that combines the efficiency of dual encoders with some of the expressiveness of more costly attentional architectures, and sparse-dense hybrids are explored to capitalize on the precision of sparse retrieval.
Mimicking Word Embeddings using Subword RNNs
TLDR
MIMICK is presented, an approach to generating OOV word embeddings compositionally by learning a function from spellings to distributional embeddings, with learning performed at the type level of the original word embedding corpus.
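The "function from spellings to embeddings" idea can be sketched in a toy form. MIMICK itself trains a bidirectional character LSTM to mimic pretrained embeddings; the sketch below only illustrates the compositional interface by averaging hypothetical character vectors (all names are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
char_dim = 8
# Hypothetical character embedding table; in MIMICK these parameters are
# learned by regressing onto pretrained word embeddings at the type level.
char_vecs = {c: rng.normal(size=char_dim) for c in "abcdefghijklmnopqrstuvwxyz"}

def oov_embedding(word):
    """Compose an embedding for an out-of-vocabulary word from its spelling."""
    vecs = [char_vecs[c] for c in word.lower() if c in char_vecs]
    return np.mean(vecs, axis=0)

vec = oov_embedding("unseenword")
```

The key property, preserved even in this toy version, is that any spelling maps to some embedding, so no word is ever out of vocabulary.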