Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task
TLDR
This work defines a new complex and cross-domain semantic parsing and text-to-SQL task in which different complex SQL queries and databases appear in the train and test sets. Experiments with various state-of-the-art models show that Spider presents a strong challenge for future research.
SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task
TLDR
Experimental results show that SyntaxSQLNet can handle a significantly greater number of complex SQL examples than prior work, outperforming the previous state-of-the-art model by 9.5% in exact matching accuracy.
WILDS: A Benchmark of in-the-Wild Distribution Shifts
TLDR
This work presents WILDS, a benchmark of in-the-wild distribution shifts spanning diverse data modalities and applications, in the hope of encouraging the development of general-purpose methods that are anchored to real-world distribution shifts and that work well across different applications and problem settings.
ScisummNet: A Large Annotated Corpus and Content-Impact Models for Scientific Paper Summarization with Citation Networks
TLDR
The first large-scale manually annotated corpus for scientific papers is developed and released, made possible by faster annotation. Summarization methods that integrate the authors’ original highlights and the article’s actual impacts on the community are also proposed, to create comprehensive, hybrid summaries.
On the Opportunities and Risks of Foundation Models
TLDR
This report provides a thorough account of the opportunities and risks of foundation models, covering their capabilities, their applications, and what they are capable of due to their emergent properties.
Graph-based Neural Multi-Document Summarization
TLDR
This model improves upon other traditional graph-based extractive approaches and the vanilla GRU sequence model with no graph, and it achieves competitive results against other state-of-the-art multi-document summarization systems.
SParC: Cross-Domain Semantic Parsing in Context
TLDR
An in-depth analysis of SParC shows that it introduces new challenges compared to existing datasets and requires generalization to unseen domains, owing to its cross-domain nature and the unseen databases at test time.
CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases
TLDR
This work presents CoSQL, a corpus for building cross-domain, general-purpose database (DB) querying dialogue systems. It includes SQL-grounded dialogue state tracking, response generation from query results, and user dialogue act prediction, and a set of strong baselines is evaluated.
Robust Multilingual Part-of-Speech Tagging via Adversarial Training
TLDR
It is found that adversarial training (AT) not only improves overall tagging accuracy but also prevents overfitting in low-resource languages and boosts tagging accuracy for rare and unseen words.
QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering
TLDR
This work proposes a new model, QA-GNN, which addresses the problem of answering questions using knowledge from pre-trained language models (LMs) and knowledge graphs (KGs) through two key innovations: relevance scoring and joint reasoning.