• Corpus ID: 6918033

Constructing a Textual KB from a Biology TextBook

@inproceedings{Clark2012ConstructingAT,
  title={Constructing a Textual KB from a Biology TextBook},
  author={Peter Clark and Philip Harrison and Niranjan Balasubramanian and Oren Etzioni},
  booktitle={AKBC-WEKEX@NAACL-HLT},
  year={2012}
}
As part of our work on building a "knowledgeable textbook" about biology, we are developing a textual question-answering (QA) system that can answer certain classes of biology questions posed by users. In support of that, we are building a "textual KB" - an assembled set of semi-structured assertions based on the book - that can be used to answer users' queries, can be improved using global consistency constraints, and can be potentially validated and corrected by domain experts. Our approach… 

Advances in Automated Knowledge Base Construction
TLDR
This survey summarizes the papers, the keynotes, and the discussions at the AKBC-WEKEX workshop on knowledge extraction at the NAACL-HLT 2012 conference, which had speakers from all major search engine providers, government institutions, and the leading universities in the field.
PDFMEF: A Multi-Entity Knowledge Extraction Framework for Scholarly Documents and Semantic Search
We introduce PDFMEF, a multi-entity knowledge extraction framework for scholarly documents in the PDF format. It is implemented with a framework that encapsulates open-source extraction tools.
Discourse in Multimedia: A Case Study in Extracting Geometry Knowledge from Textbooks
TLDR
It is concluded that the discourse and text layout features in multimedia text provide information that is complementary to lexical semantic information and can be used to improve an existing solver for geometry problems, making it more accurate as well as more explainable.
Discourse in Multimedia: A Case Study in Information Extraction
TLDR
This paper examines how multimedia discourse features in multimedia text can be used to improve an information extraction system and shows that the discourse and text layout features provide information that is complementary to lexical semantic information commonly used for information extraction.
From Textbooks to Knowledge: A Case Study in Harvesting Axiomatic Knowledge from Textbooks to Solve Geometry Problems
TLDR
This work uses rich contextual and typographical features extracted from raw textbooks to further refine the harvested axioms, which are then parsed into rules that are used to improve the state-of-the-art in solving geometry problems.
Towards Literate Artificial Intelligence
TLDR
A unified max-margin framework that learns to find hidden structures given a corpus of question-answer pairs, and uses what it learns to answer questions on novel texts to obtain state-of-the-art performance on two well-known natural language comprehension benchmarks.

References

Boeing’s NLP System and the Challenges of Semantic Representation
TLDR
Boeing's NLP system, BLUE, is described, comprising a pipeline of a parser, a logical form (LF) generator, an initial logic generator, and further processing modules, and the more general question of what exactly constitutes a "semantic representation".
Discovery of inference rules for question-answering
TLDR
This paper presents an unsupervised algorithm for discovering inference rules from text based on an extended version of Harris’ Distributional Hypothesis, which states that words that occurred in the same contexts tend to be similar.
Open Information Extraction from the Web
TLDR
Open IE (OIE), a new extraction paradigm where the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input, is introduced.
Project Halo Update - Progress Toward Digital Aristotle
TLDR
The design and evaluation results for a system called AURA are presented, which enables domain experts in physics, chemistry, and biology to author a knowledge base and that then allows a different set of users to ask novel questions against that knowledge base.
Global Learning of Typed Entailment Rules
TLDR
The results show that using global transitivity information substantially improves performance over this resource and several baselines, and that the scaling methods allow us to increase the scope of global learning of entailment-rule graphs.
Identifying Relations for Open Information Extraction
TLDR
Two simple syntactic and lexical constraints on binary relations expressed by verbs are introduced in the ReVerb Open IE system, which more than doubles the area under the precision-recall curve relative to previous extractors such as TextRunner and WOEpos.
Global Learning of Focused Entailment Graphs
TLDR
A graph structure over predicates is defined that represents entailment relations as directed edges, and a global transitivity constraint on the graph is used to learn the optimal set of edges, by formulating the optimization problem as an Integer Linear Program.
WordNet : an electronic lexical database
TLDR
The lexical database (nouns in WordNet, a semantic network of English verbs) and applications of WordNet (building semantic concordances) are presented.
Coupled semi-supervised learning for information extraction
TLDR
This paper characterizes several ways in which the training of category and relation extractors can be coupled, and presents experimental results demonstrating significantly improved accuracy as a result.
Reranking Bilingually Extracted Paraphrases Using Monolingual Distributional Similarity
TLDR
The experiments reveal that monolingual scoring of bilingually extracted paraphrases has a significantly stronger correlation with human judgment for grammaticality than the probabilities assigned by the bilingual pivot-based method does.