The TechQA Dataset

@article{Castelli2020TheTD,
  title={The TechQA Dataset},
  author={V. Castelli and Rishav Chakravarti and Saswati Dana and Anthony Ferritto and Radu Florian and M. Franz and Dinesh Garg and Dinesh Khandelwal and J. Scott McCarley and Mike McCawley and Mohamed Nasr and Lin Pan and Cezar Pendus and J. Pitrelli and Saurabh Pujar and S. Roukos and Andrzej Sakrajda and Avirup Sil and Rosario A. Uceda-Sosa and T. Ward and Rong Zhang},
  journal={ArXiv},
  year={2020},
  volume={abs/1911.02984}
}

We introduce TECHQA, a domain-adaptation question answering dataset for the technical support domain. The TECHQA corpus highlights two real-world issues from the automated customer support domain. First, it contains actual questions posed by users on a technical forum, rather than questions generated specifically for a competition or a task. Second, it has a real-world size – 600 training, 310 dev, and 490 evaluation question/answer pairs – thus reflecting the cost of creating large labeled datasets with actual data.
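
To make the dataset's scale concrete, here is a minimal sketch of loading and summarizing a TechQA-style split in Python. The file names and field names (e.g. "ANSWERABLE") are assumptions for illustration, not the confirmed schema; consult the official release.

import json

# Minimal sketch: load a TechQA-style JSON file of question/answer pairs.
# File and field names are hypothetical; the official release documents
# the actual schema.
def load_split(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def summarize(name, examples):
    # Count answerable questions via a hypothetical "ANSWERABLE" flag.
    answerable = sum(1 for ex in examples if ex.get("ANSWERABLE") == "Y")
    print(f"{name}: {len(examples)} questions, {answerable} answerable")

if __name__ == "__main__":
    # Real-world size: 600 training, 310 dev, 490 evaluation pairs.
    for name, path in [("train", "training_Q_A.json"), ("dev", "dev_Q_A.json")]:
        summarize(name, load_split(path))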
Citations

Technical Question Answering across Tasks and Domains
TLDR: This paper proposes a novel framework of deep transfer learning to effectively address technical QA across tasks and domains, and presents an adjustable joint learning approach for document retrieval and reading comprehension tasks.
A Neural Question Answering System for Basic Questions about Subroutines
TLDR: This paper designs a context-based QA system for basic questions about subroutines, based on rules the authors extract from recent empirical studies; it trains a custom neural QA model on this dataset and evaluates the model in a study with professional programmers.
Large Scale Question Answering using Tourism Data
TLDR: This work introduces the novel task of answering entity-seeking recommendation questions over a collection of reviews that describe candidate answer entities, and addresses it with a scalable cluster-select-rerank approach.
Automatic Evaluation vs. User Preference in Neural Textual Question Answering over COVID-19 Scientific Literature
TLDR: This paper presents a Question Answering (QA) system that won one of the tasks of the Kaggle CORD-19 Challenge according to the qualitative evaluation of experts, and calls into question the suitability of automatic metrics and their correlation with user preferences.
QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension
TLDR: The largest survey of the field to date, providing an overview of the various formats and domains of current question answering and reading comprehension resources and highlighting the current lacunae for future work.
Augmenting Question Answering with Natural Language Explanations
  • Qinyuan Ye
  • 2019
Towards building annotation-efficient question answering (QA) systems for real information-seeking needs, we propose a framework that efficiently augments training data by leveraging natural language explanations.
Towards building a Robust Industry-scale Question Answering System
TLDR: Introduces the development of a production model called GAAMA (Go Ahead Ask Me Anything), which combines Attention-over-Attention, diversity among attention heads, hierarchical transfer learning, and synthetic data augmentation while remaining computationally inexpensive.
A Technical Question Answering System with Transfer Learning
TLDR: TransTQA is a novel system that offers automatic responses by retrieving proper answers from similar questions correctly answered in the past; built upon a siamese ALBERT network, it responds quickly and accurately.
Multi-Stage Pretraining for Low-Resource Domain Adaptation
Transfer learning techniques are particularly useful in NLP tasks where a sizable amount of high-quality annotated data is difficult to obtain. Current approaches directly adapt a pre-trained language model on in-domain text before fine-tuning to downstream tasks.
VAULT: VAriable Unified Long Text Representation for Machine Reading Comprehension
TLDR: Proposes VAULT, a light-weight and parallel-efficient paragraph representation for MRC built from contextualized representations of long document input, trained with a new Gaussian distribution-based objective that pays close attention to partially correct instances close to the ground truth.

References

Showing 1-10 of 20 references
Natural Questions: A Benchmark for Question Answering Research
TLDR: Presents the Natural Questions corpus, a question answering dataset; introduces robust metrics for evaluating question answering systems; demonstrates high human upper bounds on these metrics; and establishes baseline results using competitive methods drawn from related literature.
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset
TLDR: This new dataset aims to overcome a number of well-known weaknesses of previous publicly available datasets for reading comprehension and question answering, and is the most comprehensive real-world dataset of its kind in both quantity and quality.
Frustratingly Easy Natural Question Answering
TLDR: Outlines algorithmic components such as Attention-over-Attention, coupled with data augmentation and ensembling strategies, that have been shown to yield state-of-the-art results on benchmark datasets like SQuAD, even achieving super-human performance.
SQuAD: 100,000+ Questions for Machine Comprehension of Text
TLDR: A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).
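
The F1 figure quoted here is SQuAD's token-overlap metric between predicted and gold answer strings. A simplified reimplementation (whitespace tokenization and lowercasing only; the official script also strips punctuation and articles):

from collections import Counter

def squad_f1(prediction, gold):
    # Token-overlap F1 as in SQuAD evaluation, simplified for illustration.
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(squad_f1("the Denver Broncos", "Denver Broncos"))  # 0.8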
Neural Domain Adaptation for Biomedical Question Answering
TLDR: This work adapts a neural QA system trained on a large open-domain dataset to a biomedical dataset by employing various transfer learning techniques, and achieves state-of-the-art results on factoid questions and competitive results on list questions.
An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition
TLDR: Overall, BioASQ helped obtain a unified view of how techniques from text classification, semantic indexing, document and passage retrieval, question answering, and text summarization can be combined to allow biomedical experts to obtain concise, user-understandable answers to questions reflecting their real information needs.
Know What You Don’t Know: Unanswerable Questions for SQuAD
TLDR: SQuadRUn is a new dataset that combines the existing Stanford Question Answering Dataset (SQuAD) with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones.
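
Systems commonly handle such unanswerable questions with a null-score threshold: predict the best span only when its score beats the no-answer score by a margin tuned on the development set. A minimal sketch with illustrative names and scores:

def predict_answer(span_text, span_score, null_score, threshold=0.0):
    # Abstain (return "") when the "no answer" option wins by the margin.
    return span_text if span_score > null_score + threshold else ""

print(predict_answer("in 1912", span_score=4.2, null_score=5.0))  # "" = abstain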
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
TLDR: It is shown that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions.
The NarrativeQA Reading Comprehension Challenge
TLDR: A new dataset and set of tasks in which the reader must answer questions about stories by reading entire books or movie scripts, designed so that successfully answering the questions requires understanding the underlying narrative rather than relying on shallow pattern matching or salience.
A BERT Baseline for the Natural Questions
TLDR: A new baseline for the Natural Questions is described, reducing the gap between the model F1 scores reported in the original dataset paper and the human upper bound by 30% and 50% relative for the long and short answer tasks respectively.