Evaluating Models’ Local Decision Boundaries via Contrast Sets
- Matt Gardner, Yoav Artzi, Ben Zhou
- Computer Science · Findings
- 6 April 2020
A more rigorous annotation paradigm for NLP is proposed that helps close systematic gaps in test data: after a dataset is constructed, its authors should manually perturb the test instances in small but meaningful ways that (typically) change the gold label, creating contrast sets.
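The contrast-set idea above can be illustrated with a toy sketch: pair each original test instance with minimally edited variants whose gold labels flip, and score a model by whether it gets *every* member of each set right. The model, examples, and the `contrast_consistency` helper below are hypothetical illustrations, not the paper's actual data or code.

```python
# Illustrative sketch of contrast-set evaluation: each original test
# instance is paired with small manual perturbations that flip the gold
# label. All names and examples here are hypothetical.

def keyword_sentiment_model(text):
    # Toy stand-in for a real model: predicts "positive" iff the
    # review contains an obviously positive word.
    positive_words = {"great", "wonderful", "loved"}
    return "positive" if positive_words & set(text.lower().split()) else "negative"

# A contrast set: one original example plus a minimally edited variant
# whose gold label differs from the original's.
contrast_sets = [
    [("a great and wonderful film", "positive"),
     ("a great film ruined by a terrible ending", "negative")],
    [("i loved every minute", "positive"),
     ("i wish i had loved even one minute", "negative")],
]

def contrast_consistency(model, sets):
    # Fraction of contrast sets on which the model labels *every*
    # member correctly -- stricter than plain instance-level accuracy.
    correct_sets = sum(
        all(model(text) == gold for text, gold in s) for s in sets
    )
    return correct_sets / len(sets)

print(contrast_consistency(keyword_sentiment_model, contrast_sets))  # -> 0.0
```

The keyword model labels every original example correctly yet scores zero contrast consistency, which is exactly the kind of gap between held-out accuracy and local decision-boundary behavior the paper targets.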
Polyglot Contextual Representations Improve Crosslingual Transfer
- Phoebe Mulcaire, Jungo Kasai, Noah A. Smith
- Computer Science, Linguistics · North American Chapter of the Association for…
- 26 February 2019
Rosita is introduced, a method to produce multilingual contextual word representations by training a single language model on text from multiple languages, which provides further evidence for the benefits of polyglot learning, in which representations are shared across multiple languages.
Polyglot Semantic Role Labeling
- Phoebe Mulcaire, Swabha Swayamdipta, Noah A. Smith
- Computer Science, Linguistics · Annual Meeting of the Association for…
- 1 May 2018
Analysis of the polyglot models’ performance provides new insight into the similarities and differences between the languages in the shared task, and yields improved parsing performance on several languages over a monolingual baseline.
Technology-Enabled Disinformation: Summary, Lessons, and Recommendations
- John Akers, Gagan Bansal, Franziska Roesner
- Computer Science · ArXiv
- 21 December 2018
This report summarizes the space of technology-enabled mis- and disinformation based on these investigations, and surfaces lessons and recommendations for technologists, researchers, platform designers, policymakers, and users.
Low-Resource Parsing with Crosslingual Contextualized Representations
- Phoebe Mulcaire, Jungo Kasai, Noah A. Smith
- Computer Science, Linguistics · Conference on Computational Natural Language…
- 1 September 2019
The non-contextual part of the learned language models is examined to demonstrate that polyglot language models better encode crosslingual lexical correspondence than aligned monolingual language models, providing further evidence that polyglot training is an effective approach to crosslingual transfer.
Grounded Compositional Outputs for Adaptive Language Modeling
- Nikolaos Pappas, Phoebe Mulcaire, Noah A. Smith
- Computer Science · Conference on Empirical Methods in Natural…
- 24 September 2020
This work proposes a fully compositional output embedding layer for language models, further grounded in information from a structured lexicon (WordNet), namely semantically related words and free-text definitions; it is the first word-level language model whose size does not depend on the training vocabulary.