Publications
Improving Text-to-SQL Evaluation Methodology
TLDR
We identify limitations of and propose improvements to current evaluations of text-to-SQL systems for mapping natural language to structured database queries.
  • 89
  • 17
Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output
TLDR
We propose a novel tree-transformation methodology for evaluating parsers that categorises errors into linguistically meaningful types.
  • 83
  • 12
An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction
TLDR
We introduce a new dataset that includes out-of-scope queries, i.e., queries that do not fall into any of the system's supported intents.
  • 37
  • 9
Factors Influencing the Surprising Instability of Word Embeddings
TLDR
We provide empirical evidence for how various factors contribute to the stability of word embeddings, and we analyze the effects of stability on downstream tasks.
  • 56
  • 6
Error-Driven Analysis of Challenges in Coreference Resolution
TLDR
We present a new tool that automatically classifies errors in the standard output of any coreference resolution system into intuitive underlying error types.
  • 44
  • 5
Tools for Automated Analysis of Cybercriminal Markets
TLDR
We propose an automated, top-down approach for analyzing underground forums, first identifying posts related to transactions and then extracting products and prices.
  • 38
  • 5
A Large-Scale Corpus for Conversation Disentanglement
TLDR
We introduce a new dataset of 77,563 messages manually annotated with reply-structure graphs that both disentangle conversations and define internal conversation structure.
  • 26
  • 5
Spatiotemporal hierarchy of relaxation events, dynamical heterogeneities, and structural reorganization in a supercooled liquid.
We identify the pattern of microscopic dynamical relaxation for a two-dimensional glass-forming liquid. On short time scales, bursts of irreversible particle motion, called cage jumps, aggregate into…
  • 100
  • 4
Understanding Task Design Trade-offs in Crowdsourced Paraphrase Collection
TLDR
We present the first systematic study of the key factors in crowdsourcing paraphrase collection, including the first exploration of worker incentives.
  • 29
  • 4
DSTC7 Task 1: Noetic End-to-End Response Selection
TLDR
This paper provides an overview of (1) the task structure, (2) the datasets, (3) the evaluation metrics, and (4) system results.
  • 27
  • 4