Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task

@inproceedings{Yu2018SpiderAL,
  title={Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task},
  author={Tao Yu and Rui Zhang and Kai-Chou Yang and Michihiro Yasunaga and Dongxu Wang and Zifan Li and James Ma and Irene Z Li and Qingning Yao and Shanelle Roman and Zilin Zhang and Dragomir R. Radev},
  booktitle={EMNLP},
  year={2018}
}
We present Spider, a large-scale, complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 college students. [...] Key Result: This shows that Spider presents a strong challenge for future research. Our dataset and task are publicly available.
SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task
TLDR
Experimental results show that SyntaxSQLNet can handle a significantly greater number of complex SQL examples than prior work, outperforming the previous state-of-the-art model by 9.5% in exact matching accuracy.
Graph Enhanced Cross-Domain Text-to-SQL Generation
TLDR
This paper improves upon a state-of-the-art Spider model, SyntaxSQLNet, by constructing a graph of column names for all databases and using graph neural networks to compute their embeddings, which offer better cross-domain representations and SQL queries.
ChiTeSQL: A Large-Scale and Pragmatic Chinese Text-to-SQL Dataset
TLDR
This paper presents DuSQL, a large-scale and pragmatic Chinese dataset for the cross-domain text-to-SQL task, containing 200 databases, 813 tables, and 23,797 question/SQL pairs, and adopts an effective data construction framework via human-computer collaboration.
DuSQL: A Large-Scale and Pragmatic Chinese Text-to-SQL Dataset
TLDR
This paper presents DuSQL, a large-scale and pragmatic Chinese dataset for the cross-domain text-to-SQL task, containing 200 databases, 813 tables, and 23,797 question/SQL pairs, and adopts an effective data construction framework via human-computer collaboration.
A Pilot Study for Chinese SQL Semantic Parsing
TLDR
A Spider dataset for Chinese is built, showing that word-based semantic parsers are subject to segmentation errors and that cross-lingual word embeddings are useful for text-to-SQL.
Recursive and Clause-Wise Decoding for Complex and Cross-Domain Text-to-SQL Generation
TLDR
This paper proposes a SQL clause-wise decoding neural architecture with a self-attention based database schema encoder to address the Spider task, and shows that the model is significantly more effective at predicting complex and nested queries than previous work.
Exploring Unexplored Generalization Challenges for Cross-Database Semantic Parsing
TLDR
This work re-purposes eight semantic parsing datasets that have been well studied in the setting where in-domain training data is available, using them instead as additional evaluation data for XSP systems, and uncovers several generalization challenges for cross-database semantic parsing.
KaggleDBQA: Realistic Evaluation of Text-to-SQL Parsers
TLDR
It is shown that KaggleDBQA presents a challenge to state-of-the-art zero-shot parsers, but a more realistic evaluation setting and creative use of associated database documentation boost their accuracy by over 13.2%, doubling their performance.
Chase: A Large-Scale and Pragmatic Chinese Dataset for Cross-Database Context-Dependent Text-to-SQL
  • Jiaqi Guo, Ziliang Si, +5 authors Ting Liu
  • Computer Science
  • ACL/IJCNLP
  • 2021
TLDR
This work presents CHASE, a large-scale and pragmatic Chinese dataset for XDTS that consists of 5,459 coherent question sequences over 280 databases, in which only 35% of questions are context-independent and 28% of SQL queries are easy.
SParC: Cross-Domain Semantic Parsing in Context
TLDR
An in-depth analysis of SParC is provided and it is shown that it introduces new challenges compared to existing datasets and requires generalization to unseen domains due to its cross-domain nature and the unseen databases at test time.

References

SHOWING 1-10 OF 46 REFERENCES
Translating Questions to SQL Queries with Generative Parsers Discriminatively Reranked
TLDR
A model for automatically translating a factoid question in natural language to an SQL query that retrieves the correct answer from a target relational database (DB) is defined, which is in line with the best models using external and expensive hand-crafted resources such as the question meaning interpretation.
TypeSQL: Knowledge-Based Type-Aware Neural Text-to-SQL Generation
TLDR
This paper presents a novel approach, TypeSQL, which formats the problem as a slot filling task in a more reasonable way and utilizes type information to better understand rare entities and numbers in the questions.
Large-scale Semantic Parsing without Question-Answer Pairs
TLDR
This paper introduces a novel semantic parsing approach to query Freebase in natural language without requiring manual annotations or question-answer pairs; it converts sentences to semantic graphs using CCG and subsequently grounds them to Freebase, guided by denotations as a form of weak supervision.
SQLizer: query synthesis from natural language
This paper presents a new technique for automatically synthesizing SQL queries from natural language (NL). At the core of our technique is a new NL-based program synthesis methodology that combines [...]
Towards a theory of natural language interfaces to databases
TLDR
This paper proves that, for a broad class of semantically tractable natural language questions, Precise is guaranteed to map each question to the corresponding SQL query, and shows that Precise compares favorably with Mooney's learning NLI and with Microsoft's English Query product.
Constructing an Interactive Natural Language Interface for Relational Databases
TLDR
The architecture of an interactive natural language query interface for relational databases is described; it is able to correctly interpret complex natural language queries in a generic manner across a range of domains, and is good enough to be usable in practice.
Modern Natural Language Interfaces to Databases: Composing Statistical Parsing with Semantic Tractability
TLDR
The paper shows how a strong semantic model coupled with "light re-training" enables PRECISE to overcome parser errors, and correctly map from parsed questions to the corresponding SQL queries.
Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning
TLDR
This work proposes Seq2SQL, a deep neural network for translating natural language questions to corresponding SQL queries, and releases WikiSQL, a dataset of 80,654 hand-annotated examples of questions and SQL queries distributed across 24,241 tables from Wikipedia that is an order of magnitude larger than comparable datasets.
Learning Dependency-Based Compositional Semantics
TLDR
A new semantic formalism, dependency-based compositional semantics (DCS), is developed, a log-linear distribution over DCS logical forms is defined, and it is shown that the system obtains comparable accuracies to even state-of-the-art systems that do require annotated logical forms.
Learning a Neural Semantic Parser from User Feedback
We present an approach to rapidly and easily build natural language interfaces to databases for new domains, whose performance improves over time based on user feedback, and requires minimal [...]