LC-QuAD 2.0: A Large Dataset for Complex Question Answering over Wikidata and DBpedia
@inproceedings{Dubey2019LCQuAD2A,
  title={LC-QuAD 2.0: A Large Dataset for Complex Question Answering over Wikidata and DBpedia},
  author={Mohnish Dubey and Debayan Banerjee and Abdelrahman Abdelkawi and Jens Lehmann},
  booktitle={International Semantic Web Conference},
  year={2019}
}
Providing machines with the capability of exploring knowledge graphs and answering natural language questions has been an active area of research over the past decade. LC-QuAD 2.0 is compatible with both the Wikidata and DBpedia 2018 knowledge graphs. In this article, we explain how the dataset was created and illustrate the variety of questions available with examples. We further provide a statistical analysis of the dataset.
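To make the shape of such a dataset concrete, here is an illustrative sketch (a hand-written example, not an actual entry from LC-QuAD 2.0) of a record pairing a complex natural language question with a Wikidata-style SPARQL query. The `wdt:P50` (author) and `wdt:P26` (spouse) identifiers are real Wikidata properties, but the record structure and field names are assumptions for illustration.

```python
# Hypothetical LC-QuAD-2.0-style record: a complex question plus the
# SPARQL query that answers it over Wikidata (hand-written example).
question = "Who is the spouse of the author of 'The Old Man and the Sea'?"

sparql = """
SELECT ?spouseLabel WHERE {
  ?book rdfs:label "The Old Man and the Sea"@en .
  ?book wdt:P50 ?author .      # P50 = author
  ?author wdt:P26 ?spouse .    # P26 = spouse
  ?spouse rdfs:label ?spouseLabel .
  FILTER(LANG(?spouseLabel) = "en")
}
"""

record = {"question": question, "sparql": sparql.strip(), "kg": "wikidata"}
print(record["kg"], "|", record["question"])
```

The query chains two facts (book → author → spouse), which is exactly what makes the question "complex" rather than a single-triple lookup.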
121 Citations
Bio-SODA: Enabling Natural Language Question Answering over Knowledge Graphs without Training Data
- Computer ScienceSSDBM
- 2021
This paper introduces Bio-SODA, a natural language processing engine that does not require training data in the form of question-answer pairs for generating SPARQL queries, and uses a generic graph-based approach for translating user questions into a ranked list of SPARQL candidate queries.
A Chinese Multi-type Complex Questions Answering Dataset over Wikidata
- Computer ScienceArXiv
- 2021
This work proposes CLC-QuAD, the first large-scale complex Chinese semantic parsing dataset over Wikidata, and presents a text-to-SPARQL baseline model that can effectively answer multi-type complex questions, such as factual questions, dual-intent questions, boolean questions, and counting questions, with Wikidata as the background knowledge.
VQuAnDa: Verbalization QUestion ANswering DAtaset
- Computer ScienceESWC
- 2020
This work aims to fill a gap in Question Answering datasets by providing VQuAnDa, the first QA dataset that includes the verbalization of each answer, based on the commonly used large-scale QA dataset LC-QuAD, in order to support compatibility and continuity with previous work.
PeCoQ: A Dataset for Persian Complex Question Answering over Knowledge Graph
- Computer Science2020 11th International Conference on Information and Knowledge Technology (IKT)
- 2020
This paper introduces PeCoQ, a dataset for Persian question answering that contains 10,000 complex questions and answers extracted from the Persian knowledge graph FarsBase, discusses the dataset's characteristics, and describes the methodology for building it.
MQALD: Evaluating the impact of modifiers in question answering over knowledge graphs
- Computer ScienceSemantic Web
- 2022
This work provides a publicly available dataset designed for evaluating the performance of a QA system in translating articulated questions into a specific data query language, and uses it to evaluate three state-of-the-art QA systems.
Bio-SODA UX: enabling natural language question answering over knowledge graphs with user disambiguation
- Computer ScienceDistributed and Parallel Databases
- 2022
Bio-SODA is introduced, a natural language processing engine that does not require training data in the form of question-answer pairs for generating SPARQL queries and which outperforms publicly available KGQA systems on more complex bioinformatics datasets.
Bio-SODA-A Question Answering System for Domain Knowledge Graphs
- Computer Science
- 2020
This paper introduces the prototype implementation Bio-SODA, a question answering system that does not require training data in the form of question-answer pairs for generating SPARQL queries over closed-domain KGs, and uses a novel ranking algorithm that includes node centrality as a measure of relevance for candidate matches in relation to a user question.
RuBQ 2.0: An Innovated Russian Question Answering Dataset
- Computer ScienceESWC
- 2021
The second version of RuBQ, a Russian dataset for knowledge base question answering (KBQA) over Wikidata, is described; it is suitable for the evaluation of KBQA, machine reading comprehension (MRC), hybrid question answering, and semantic parsing.
Complex Temporal Question Answering on Knowledge Graphs
- Computer ScienceCIKM
- 2021
Results show that EXAQT outperforms three state-of-the-art systems for answering complex questions over KGs, thereby justifying specialized treatment of temporal QA.
References
Showing 1-10 of 15 references
LC-QuAD: A Corpus for Complex Question Answering over Knowledge Graphs
- Computer ScienceSEMWEB
- 2017
This paper provides the Large-Scale Complex Question Answering Dataset (LC-QuAD), comprising 5000 questions and their corresponding SPARQL queries over the DBpedia dataset, to assess the robustness and accuracy of the next generation of QA systems for knowledge graphs.
Formal Query Generation for Question Answering over Knowledge Bases
- Computer ScienceESWC
- 2018
To enhance accuracy, this paper presents a ranking model based on Tree-LSTM that takes into account the syntactic structure of the question and the tree representation of the candidate queries to find the one representing the correct intention behind the question.
Neural Network-based Question Answering over Knowledge Graphs on Word and Character Level
- Computer ScienceWWW
- 2017
This work trains a neural network for answering simple questions in an end-to-end manner, leaving all decisions to the model; it contains a nested word/character-level question encoder that handles out-of-vocabulary and rare words while still being able to exploit word-level semantics.
The Web as a Knowledge-Base for Answering Complex Questions
- Computer ScienceNAACL
- 2018
This paper proposes to decompose complex questions into a sequence of simple questions and compute the final answer from the sequence of answers, and empirically demonstrates that question decomposition improves performance from 20.8 precision@1 to 27.5 precision@1 on this new dataset.
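The decomposition idea above can be sketched in a few lines: answer the first simple question, then substitute its answer into the second. The fact table and helper names below are hypothetical toy constructions, not the paper's actual method or data.

```python
# Toy sketch of compositional question decomposition: rel2(rel1(entity))
# is answered by chaining two simple (entity, relation) lookups.
FACTS = {
    ("The Old Man and the Sea", "author"): "Ernest Hemingway",
    ("Ernest Hemingway", "birthplace"): "Oak Park",
}

def answer_simple(entity, relation):
    """Answer a simple (entity, relation) question by table lookup."""
    return FACTS.get((entity, relation))

def answer_composition(entity, rel1, rel2):
    """Answer a complex question by chaining two simple ones."""
    bridge = answer_simple(entity, rel1)   # first simple question
    return answer_simple(bridge, rel2)     # second, using its answer

print(answer_composition("The Old Man and the Sea", "author", "birthplace"))
# → Oak Park
```

In the paper, the simple questions are answered against the web rather than a local table, but the compose-then-substitute control flow is the same.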
AskNow: A Framework for Natural Language Query Formalization in SPARQL
- Computer ScienceESWC
- 2016
This paper proposes a framework, called AskNow, where users can pose queries in English to a target RDF knowledge base, e.g. DBpedia, and empirically evaluates the framework with respect to the syntactic robustness of NQS and the semantic accuracy of the SPARQL translator on standard benchmark datasets.
Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus
- Computer ScienceACL
- 2016
The 30M Factoid Question-Answer Corpus is presented, an enormous question answer pair corpus produced by applying a novel neural network architecture on the knowledge base Freebase to transduce facts into natural language questions.
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge Graphs
- Computer ScienceSEMWEB
- 2018
This paper proposes a framework called EARL, which performs entity linking and relation linking as a joint task, and implements two different solution strategies, which significantly outperform the current state-of-the-art approaches for entity and relation linking.
DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia
- Computer ScienceSemantic Web
- 2015
An overview of the DBpedia community project is given, covering its architecture, technical implementation, maintenance, internationalisation, usage statistics, and applications; DBpedia has become one of the central interlinking hubs in the Linked Open Data (LOD) cloud.
Semantic Parsing via Paraphrasing
- Computer ScienceACL
- 2014
This paper presents two simple paraphrase models, an association model and a vector space model, and trains them jointly from question-answer pairs, improving state-of-the-art accuracies on two recently released question-answering datasets.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Computer ScienceNAACL
- 2019
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.