BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
- Computer Science · North American Chapter of the Association for Computational Linguistics
- 2019
BERT is a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Natural Questions: A Benchmark for Question Answering Research
- T. Kwiatkowski, Jennimaria Palomaki, Slav Petrov
- Computer Science · Transactions of the Association for Computational Linguistics
- 1 August 2019
The Natural Questions corpus, a question answering dataset, is presented, along with robust metrics for evaluating question answering systems, demonstrations of high human upper bounds on these metrics, and baseline results using competitive methods drawn from the related literature.
Fast and Robust Neural Network Joint Models for Statistical Machine Translation
- Jacob Devlin, Rabih Zbib, Zhongqiang Huang, Thomas Lamar, R. Schwartz, J. Makhoul
- Computer Science · Annual Meeting of the Association for Computational Linguistics
- 1 June 2014
A novel formulation of a neural network joint model (NNJM) that augments the NNLM with a source context window; the model is purely lexicalized and can be integrated into any MT decoder.
PaLM: Scaling Language Modeling with Pathways
- Aakanksha Chowdhery, Sharan Narang, Noah Fiedel
- Computer Science · ArXiv
- 5 April 2022
PaLM, a 540-billion-parameter, densely activated Transformer language model, achieves breakthrough performance, outperforming the state of the art on a suite of multi-step reasoning tasks and exceeding average human performance on the recently released BIG-bench benchmark.
Visual Storytelling
- Ting-Hao 'Kenneth' Huang, Francis Ferraro, Margaret Mitchell
- Computer Science · North American Chapter of the Association for Computational Linguistics
- 13 April 2016
Modelling concrete description as well as figurative and social language, as provided in this dataset and the storytelling task, has the potential to move artificial intelligence from basic understanding of typical visual scenes towards a more human-like understanding of grounded event structure and subjective expression.
RobustFill: Neural Program Learning under Noisy I/O
- Jacob Devlin, J. Uesato, Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, Pushmeet Kohli
- Computer Science · International Conference on Machine Learning
- 21 March 2017
This work directly compares the program synthesis and program induction approaches to automatic program learning on a large-scale, real-world learning task and demonstrates that the strength of each approach depends heavily on the evaluation metric and end-user application.
Zero-Shot Entity Linking by Reading Entity Descriptions
- L. Logeswaran, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, Jacob Devlin, Honglak Lee
- Computer Science · Annual Meeting of the Association for Computational Linguistics
- 18 June 2019
It is shown that strong reading comprehension models pre-trained on large unlabeled data can generalize to unseen entities, and domain-adaptive pre-training (DAP) is proposed to address the domain shift problem associated with linking unseen entities in a new domain.
Synthetic QA Corpora Generation with Roundtrip Consistency
- Chris Alberti, D. Andor, Emily Pitler, Jacob Devlin, Michael Collins
- Computer Science · Annual Meeting of the Association for Computational Linguistics
- 1 June 2019
A novel method of generating synthetic question answering corpora is introduced: models of question generation and answer extraction are combined, and the results are filtered to ensure roundtrip consistency, establishing a new state of the art on SQuAD2 and NQ.
Universal Neural Machine Translation for Extremely Low Resource Languages
- Jiatao Gu, Hany Hassan, Jacob Devlin, V. Li
- Computer Science, Linguistics · North American Chapter of the Association for Computational Linguistics
- 14 February 2018
The proposed transfer-learning approach shares lexical and sentence-level representations across multiple source languages into one target language and achieves 23 BLEU on Romanian–English WMT2016 using a tiny parallel corpus of 6k sentences.
Leveraging Grammar and Reinforcement Learning for Neural Program Synthesis
- Rudy Bunel, Matthew J. Hausknecht, Jacob Devlin, Rishabh Singh, Pushmeet Kohli
- Computer Science · International Conference on Learning Representations
- 15 February 2018
Reinforcement learning is performed on top of a supervised model with an objective that explicitly maximizes the likelihood of generating semantically correct programs, leading to improved model accuracy, especially when training data is limited.
...