Publications
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
TLDR: In this paper, we present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT.
A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks
TLDR: We present a simple baseline that utilizes probabilities from softmax distributions to detect if an example is misclassified or out-of-distribution.
Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments
TLDR: We address the problem of part-of-speech tagging for English data from the popular micro-blogging service Twitter.
Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters
TLDR: We have constructed a state-of-the-art part-of-speech tagger for the online conversational text genres of Twitter and IRC, and have publicly released our new evaluation data, annotation guidelines, open-source tagger, and word clusters.
Towards Universal Paraphrastic Sentence Embeddings
TLDR: We consider the problem of learning general-purpose, paraphrastic sentence embeddings based on supervision from the Paraphrase Database (Ganitkevitch et al., 2013).
Gaussian Error Linear Units (GELUs)
TLDR: We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function that weights inputs by their value, rather than their sign as in ReLUs ($x\mathbf{1}_{x>0}$).
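For reference, the GELU defined in this paper weights each input by the standard normal CDF $\Phi$, in contrast to the hard sign-based gating of the ReLU:

```latex
\mathrm{GELU}(x) = x\,\Phi(x) = x \cdot \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right],
\qquad
\mathrm{ReLU}(x) = x\,\mathbf{1}_{x>0}
```

Because $\Phi(x)$ is smooth and lies in $(0, 1)$, the GELU interpolates between passing and suppressing an input rather than switching abruptly at zero.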
From Paraphrase Database to Compositional Paraphrase Model and Back
TLDR: The Paraphrase Database (PPDB; Ganitkevitch et al., 2013) is an extensive semantic resource, consisting of a list of phrase pairs with (heuristic) confidence estimates.
Adversarial Example Generation with Syntactically Controlled Paraphrase Networks
TLDR: We propose syntactically controlled paraphrase networks (SCPNs) and use them to generate adversarial examples that improve the robustness of pretrained models to syntactic variation when used to augment their training data.
Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks
TLDR: Modeling sentence similarity is complicated by the ambiguity and variability of linguistic expression.
Bridging Nonlinearities and Stochastic Regularizers with Gaussian Error Linear Units
TLDR: We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function that combines the intuitions of dropout and zoneout while respecting neuron values.