An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction
TLDR
A new dataset is introduced that includes queries that are out-of-scope—i.e., queries that do not fall into any of the system’s supported intents, posing a new challenge because models cannot assume that every query at inference time belongs to a system-supported intent class.
Improving Text-to-SQL Evaluation Methodology
TLDR
It is shown that the current division of data into training and test sets measures robustness to variations in the way questions are asked, but only partially tests how well systems generalize to new queries; a complementary dataset split is proposed for evaluating future work.
A Large-Scale Corpus for Conversation Disentanglement
TLDR
A new dataset of 77,563 messages manually annotated with reply-structure graphs that both disentangle conversations and define internal conversation structure is created, which is 16 times larger than all previously released datasets combined, the first to include adjudication of annotation disagreements, and the first to include context.
Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output
TLDR
This work classifies errors within a set of linguistically meaningful types using tree transformations that repair groups of errors together, and uses this analysis to answer a range of questions about parser behaviour, including what linguistic constructions are difficult for state-of-the-art parsers, what types of errors are being resolved by rerankers, and what types are introduced when parsing out-of-domain text.
Factors Influencing the Surprising Instability of Word Embeddings
TLDR
It is shown that even relatively high frequency words (100-200 occurrences) are often unstable, and empirical evidence is provided for how various factors contribute to the stability of word embeddings, and the effects of stability on downstream tasks are analyzed.
Tools for Automated Analysis of Cybercriminal Markets
TLDR
This work proposes an automated, top-down approach that uses natural language processing and machine learning to automatically generate high-level information about underground forums, first identifying posts related to transactions, and then extracting products and prices.
Error-Driven Analysis of Challenges in Coreference Resolution
TLDR
This work considers an automated method of categorizing errors in the output of a coreference system into intuitive underlying error types, empirically characterizing the major unsolved challenges of the coreference resolution task.
DSTC7 Task 1: Noetic End-to-End Response Selection
TLDR
This task provided two new resources that presented different challenges: one was focused but small, while the other was large but diverse. Participants used a range of neural network models, including some that successfully incorporated external data to boost performance.
Spatiotemporal hierarchy of relaxation events, dynamical heterogeneities, and structural reorganization in a supercooled liquid.
TLDR
The results characterize the way in which dynamical heterogeneity evolves in moderately supercooled liquids and reveal that it is astonishingly similar to the one found for dense glassy granular media.
No Press Diplomacy: Modeling Multi-Agent Gameplay
TLDR
This work focuses on training an agent that learns to play the No Press version of Diplomacy where there is no dedicated communication channel between players, and presents DipNet, a neural-network-based policy model for No Press Diplomacy.
...