Publications
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
TLDR
This work presents two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT, and uses a self-supervised loss that focuses on modeling inter-sentence coherence.
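One of the paper's two techniques is factorized embedding parameterization: the V × H embedding table is replaced by a V × E lookup plus an E × H projection, with E much smaller than H (the other technique is cross-layer parameter sharing). A minimal sketch, with illustrative sizes rather than a specific ALBERT configuration:

```python
import numpy as np

# Factorized embedding parameterization: instead of a V x H embedding
# table, use a V x E table followed by an E x H projection (E << H).
V, H, E = 30000, 4096, 128          # illustrative sizes, not a fixed ALBERT config

full_table = V * H                  # parameters in a standard embedding table
factorized = V * E + E * H          # parameters after factorization

print(f"standard:   {full_table:,}")   # 122,880,000
print(f"factorized: {factorized:,}")   #   4,364,288

# Lookup then project: token ids -> E-dim vectors -> H-dim hidden states.
embed = np.random.randn(V, E) * 0.02
proj = np.random.randn(E, H) * 0.02
token_ids = np.array([101, 2023, 2003, 102])
hidden = embed[token_ids] @ proj    # shape (4, H)
```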
A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks
TLDR
A simple baseline that uses probabilities from softmax distributions is presented and shown to be effective across computer vision, natural language processing, and automatic speech recognition tasks, though it can sometimes be surpassed.
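The baseline itself is simply thresholding the maximum softmax probability: correctly classified, in-distribution inputs tend to receive higher maximum probabilities than misclassified or out-of-distribution ones. A minimal sketch; the threshold below is illustrative and would be chosen on validation data in practice:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def max_softmax_score(logits):
    """Confidence score: the maximum softmax probability per example."""
    return softmax(logits).max(axis=-1)

# Toy logits: first row is a confident prediction, second is near-uniform.
logits = np.array([[8.0, 0.5, 0.1],
                   [0.6, 0.5, 0.4]])
scores = max_softmax_score(logits)

threshold = 0.8                 # illustrative value
flagged = scores < threshold    # flag as possibly misclassified / OOD
print(scores, flagged)
```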
Gaussian Error Linear Units (GELUs)
TLDR
An empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations finds performance improvements across all considered computer vision, natural language processing, and speech tasks.
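GELU weights its input by the standard Gaussian CDF, GELU(x) = x·Φ(x); the paper also gives a tanh-based approximation. A small sketch of both forms:

```python
import math

def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    """The paper's tanh approximation of GELU."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x ** 3)))

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"{x:5.1f}  {gelu(x):8.5f}  {gelu_tanh(x):8.5f}")
```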
Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments
TLDR
A tagset is designed, data is annotated, features are engineered, and accuracy approaching 90% is reported on part-of-speech tagging for English data from the micro-blogging service Twitter.
Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters
TLDR
This work systematically evaluates the use of large-scale unsupervised word clustering and new lexical features to improve tagging accuracy on Twitter and achieves state-of-the-art tagging results on both Twitter and IRC POS tagging tasks.
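The clusters are hierarchical (Brown-style), so each word's cluster ID is a bit string, and prefixes of that string give progressively coarser groupings usable as tagger features. A sketch of prefix-feature extraction; the `clusters` table here is made up for illustration, whereas real tables are induced from large unlabeled Twitter corpora:

```python
# Hypothetical word -> Brown-cluster bit-string table.
clusters = {
    "lol": "11010",
    "haha": "11011",
    "tomorrow": "00110",
}

def cluster_features(word, prefix_lengths=(2, 4)):
    """Emit cluster-prefix features at several granularities."""
    bits = clusters.get(word)
    if bits is None:
        return []
    return [f"cluster_prefix_{k}={bits[:k]}" for k in prefix_lengths]

print(cluster_features("lol"))    # ['cluster_prefix_2=11', 'cluster_prefix_4=1101']
print(cluster_features("haha"))   # shares coarse prefixes with "lol"
```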
Towards Universal Paraphrastic Sentence Embeddings
TLDR
This work considers the problem of learning general-purpose, paraphrastic sentence embeddings based on supervision from the Paraphrase Database, and compares six compositional architectures, finding that the most complex architectures, such as long short-term memory (LSTM) recurrent neural networks, perform best on the in-domain data.
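The simplest architecture in the comparison composes a sentence embedding by averaging its word embeddings, with paraphrase similarity scored by cosine. A toy sketch of that composition; the embedding table below is random, whereas the paper starts from word vectors trained with supervision from the Paraphrase Database:

```python
import numpy as np

# Hypothetical word-embedding table for illustration only.
rng = np.random.default_rng(0)
vocab = {w: rng.standard_normal(50) for w in
         "the cat sat on a mat feline rested".split()}

def embed(sentence):
    """Word-averaging composition: mean of the sentence's word vectors."""
    vecs = [vocab[w] for w in sentence.split() if w in vocab]
    return np.mean(vecs, axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(embed("the cat sat"), embed("a feline rested")))
```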
Adversarial Example Generation with Syntactically Controlled Paraphrase Networks
TLDR
A combination of automated and human evaluations show that SCPNs generate paraphrases that follow their target specifications without decreasing paraphrase quality when compared to baseline (uncontrolled) paraphrase systems.
Bridging Nonlinearities and Stochastic Regularizers with Gaussian Error Linear Units
TLDR
An empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations finds performance improvements across all tasks, suggesting a new probabilistic understanding of nonlinearities.
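The probabilistic reading is that GELU equals the expected output of a stochastic regularizer that multiplies the input x by a Bernoulli(Φ(x)) gate, so E[x·m] = x·Φ(x). A quick Monte Carlo check of that identity:

```python
import math
import random

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def stochastic_gate(x, n=200_000, seed=0):
    """Average of x * m with m ~ Bernoulli(Phi(x))."""
    rnd = random.Random(seed)
    p = phi(x)
    return sum(x * (rnd.random() < p) for _ in range(n)) / n

x = 0.7
print(stochastic_gate(x))   # Monte Carlo estimate
print(x * phi(x))           # exact GELU value, ~0.53
```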
From Paraphrase Database to Compositional Paraphrase Model and Back
TLDR
This work proposes models to leverage the phrase pairs from the Paraphrase Database to build parametric paraphrase models that score paraphrase pairs more accurately than the PPDB’s internal scores while simultaneously improving its coverage.
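In outline, such a model embeds each phrase by composing its word vectors and scores a pair by cosine similarity, trained with a margin objective so that PPDB pairs outscore sampled negatives. A toy sketch under those assumptions; the vocabulary, vectors, and margin value here are illustrative:

```python
import numpy as np

# Hypothetical embedding table; real models train these parameters.
rng = np.random.default_rng(1)
vocab = {w: rng.standard_normal(25) for w in
         "be able to can could manage huge tiny".split()}

def compose(phrase):
    """Toy composition: average the phrase's word vectors."""
    return np.mean([vocab[w] for w in phrase.split()], axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def margin_loss(pair, negative, delta=0.4):
    """Hinge loss pushing a paraphrase pair's score above a negative's by delta."""
    x1, x2 = pair
    pos = cosine(compose(x1), compose(x2))
    neg = cosine(compose(x1), compose(negative))
    return max(0.0, delta - pos + neg)

print(margin_loss(("be able to", "can"), "tiny"))
```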
Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise
TLDR
It is demonstrated that robustness to label noise, even at severe strengths, can be achieved using a set of trusted data with clean labels, and a loss correction that uses trusted examples in a data-efficient manner is proposed to mitigate the effects of label noise on deep neural network classifiers.
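In outline, the proposed correction (the paper's Gold Loss Correction) estimates a corruption matrix C, with C[i][j] ≈ p(noisy label j | true label i), from the trusted examples, then trains against noisy labels with predictions pushed through C. A minimal numpy sketch under those assumptions, on synthetic data:

```python
import numpy as np

def estimate_corruption_matrix(probs_on_trusted, true_labels, k):
    """C[i, j] ~= p(noisy label j | true label i), estimated by averaging a
    noisy-data model's predicted distributions over trusted examples of class i."""
    C = np.zeros((k, k))
    for i in range(k):
        C[i] = probs_on_trusted[true_labels == i].mean(axis=0)
    return C

def corrected_nll(probs, noisy_labels, C):
    """Cross-entropy against noisy labels after pushing predictions through C:
    p(noisy = j | x) = sum_i p(true = i | x) * C[i, j]."""
    noisy_probs = probs @ C
    picked = noisy_probs[np.arange(len(noisy_labels)), noisy_labels]
    return -np.log(picked + 1e-12).mean()

# Synthetic example with 3 classes.
k = 3
probs_trusted = np.array([[.8, .1, .1], [.7, .2, .1], [.1, .8, .1],
                          [.2, .7, .1], [.1, .2, .7], [.1, .1, .8]])
true_labels = np.array([0, 0, 1, 1, 2, 2])
C = estimate_corruption_matrix(probs_trusted, true_labels, k)

probs = np.array([[.9, .05, .05], [.1, .8, .1]])
noisy_labels = np.array([0, 1])
print(corrected_nll(probs, noisy_labels, C))
```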
...