BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
A new language representation model, BERT, is introduced, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
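The fine-tuning recipe in this summary amounts to attaching a single task-specific output layer to the pre-trained encoder and training end to end. Below is a minimal sketch of that idea using the Hugging Face `transformers` library, which is not the paper's original codebase; the model name, task, and hyperparameters are illustrative placeholders.

```python
# Minimal sketch: pre-trained BERT encoder plus one added classification head,
# fine-tuned end to end. Library, checkpoint, and hyperparameters are assumptions.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # the single additional output layer
)

batch = tokenizer(
    ["a great movie", "a dull movie"],
    padding=True, truncation=True, return_tensors="pt",
)
labels = torch.tensor([1, 0])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)  # cross-entropy loss over the new head
outputs.loss.backward()
optimizer.step()
```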
Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network
A new part-of-speech tagger is presented that demonstrates the following ideas: explicit use of both preceding and following tag contexts via a dependency network representation, broad use of lexical features, and effective use of priors in conditional log-linear models.
Natural Questions: A Benchmark for Question Answering Research
The Natural Questions corpus, a question answering dataset, is presented, along with robust metrics for evaluating question answering systems, demonstrations of high human upper bounds on these metrics, and baseline results using competitive methods drawn from the related literature.
Observed versus latent features for knowledge base and text inference
- Kristina Toutanova, Danqi Chen
- Computer Science · Proceedings of the 3rd Workshop on Continuous…
- 30 July 2015
It is shown that the observed features model is most effective at capturing the information present for entity pairs with textual relations, and that a combination of the two model types combines the strengths of both.
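In this setting, a latent feature model scores a knowledge-base triple with learned embeddings, while an observed feature model scores it with a linear function of directly observable features such as co-occurring textual relations. The sketch below illustrates one way to combine the two; the DistMult-style latent score, the toy observed features, and the additive combination are simplifications, not the paper's exact models.

```python
# Hedged sketch: combining a latent-feature (embedding) score with an
# observed-feature (linear) score for a knowledge-base triple.
# Score forms and features are illustrative assumptions.
import torch
import torch.nn as nn

class CombinedScorer(nn.Module):
    def __init__(self, n_entities, n_relations, n_observed_features, dim=100):
        super().__init__()
        self.ent = nn.Embedding(n_entities, dim)      # latent entity vectors
        self.rel = nn.Embedding(n_relations, dim)     # latent relation vectors
        self.obs = nn.Linear(n_observed_features, 1)  # weights over observed features

    def forward(self, head, relation, tail, observed_features):
        # Latent score: bilinear (DistMult-style) interaction of embeddings.
        latent = (self.ent(head) * self.rel(relation) * self.ent(tail)).sum(-1)
        # Observed score: linear model over features such as textual relations.
        observed = self.obs(observed_features).squeeze(-1)
        return latent + observed  # simple additive combination

scorer = CombinedScorer(n_entities=1000, n_relations=50, n_observed_features=200)
score = scorer(torch.tensor([3]), torch.tensor([7]), torch.tensor([42]),
               torch.randn(1, 200))
```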
Latent Retrieval for Weakly Supervised Open Domain Question Answering
It is shown for the first time that it is possible to jointly learn the retriever and reader from question-answer string pairs and without any IR system, outperforming BM25 by up to 19 points in exact match.
Representing Text for Joint Embedding of Text and Knowledge Bases
- Kristina Toutanova, Danqi Chen, P. Pantel, Hoifung Poon, Pallavi Choudhury, Michael Gamon
- Computer Science · EMNLP
A model is proposed that captures the compositional structure of textual relations and jointly optimizes entity, knowledge base, and textual relation representations; it significantly improves performance over a model that does not share parameters among textual relations with common sub-structure.
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
- Christopher Clark, Kenton Lee, Ming-Wei Chang, T. Kwiatkowski, Michael Collins, Kristina Toutanova
- Computer Science · NAACL
- 1 May 2019
It is found that transferring from entailment data is more effective than transferring from paraphrase or extractive QA data, and that, surprisingly, this transfer continues to be very beneficial even when starting from massive pre-trained language models such as BERT.
Sparse, Dense, and Attentional Representations for Text Retrieval
- Y. Luan, Jacob Eisenstein, Kristina Toutanova, M. Collins
- Computer Science · Transactions of the Association for Computational…
- 1 May 2020
A simple neural model is proposed that combines the efficiency of dual encoders with some of the expressiveness of more costly attentional architectures, and sparse-dense hybrids are explored to capitalize on the precision of sparse retrieval.
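A dual encoder scores a query-document pair as the inner product of independently computed vectors, and a sparse-dense hybrid adds a lexical score such as BM25 to that dense score. The sketch below shows only this scoring and ranking step; the random vectors, stand-in BM25 scores, and fixed interpolation weight are placeholders, not the paper's BERT-based encoders or tuned combination.

```python
# Hedged sketch of sparse-dense hybrid retrieval scoring.
# Encoders, BM25 scores, and the interpolation weight are assumptions.
import numpy as np

def dense_scores(query_vec, doc_vecs):
    # Dual-encoder score: inner product between query and document vectors.
    return doc_vecs @ query_vec

def hybrid_scores(query_vec, doc_vecs, bm25_scores, weight=0.5):
    # Interpolate dense and sparse scores; `weight` is an illustrative constant.
    return dense_scores(query_vec, doc_vecs) + weight * bm25_scores

rng = np.random.default_rng(0)
query_vec = rng.normal(size=128)
doc_vecs = rng.normal(size=(1000, 128))  # pre-computed document embeddings
bm25 = rng.random(1000)                  # stand-in for sparse (BM25) scores

top10 = np.argsort(-hybrid_scores(query_vec, doc_vecs, bm25))[:10]
```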
Well-Read Students Learn Better: On the Importance of Pre-training Compact Models
It is shown that pre-training remains important in the context of smaller architectures, and that fine-tuning pre-trained compact models can be competitive with more elaborate methods proposed in concurrent work.
Cross-Sentence N-ary Relation Extraction with Graph LSTMs
- Nanyun Peng, Hoifung Poon, Chris Quirk, Kristina Toutanova, Wen-tau Yih
- Computer Science · TACL
- 5 April 2017
A general relation extraction framework based on graph long short-term memory networks (graph LSTMs) that can be easily extended to cross-sentence n-ary relation extraction is explored, demonstrating its effectiveness with both conventional supervised learning and distant supervision.