Corpus ID: 6918033

Constructing a Textual KB from a Biology TextBook

Peter E. Clark, Philip Harrison, Niranjan Balasubramanian, Oren Etzioni
As part of our work on building a "knowledgeable textbook" about biology, we are developing a textual question-answering (QA) system that can answer certain classes of biology questions posed by users. In support of that, we are building a "textual KB" (an assembled set of semi-structured assertions based on the book) that can be used to answer users' queries, can be improved using global consistency constraints, and can potentially be validated and corrected by domain experts. Our approach…
Advances in Automated Knowledge Base Construction
Recent years have seen significant advances in the creation of large-scale knowledge bases (KBs). Extracting knowledge from Web pages and integrating it into a coherent KB is a task that spans the…
PDFMEF: A Multi-Entity Knowledge Extraction Framework for Scholarly Documents and Semantic Search
We introduce PDFMEF, a multi-entity knowledge extraction framework for scholarly documents in the PDF format. It is implemented with a framework that encapsulates open-source extraction tools.
Discourse in Multimedia: A Case Study in Extracting Geometry Knowledge from Textbooks
It is concluded that the discourse and text layout features in multimedia text provide information that is complementary to lexical semantic information and can be used to improve an existing solver for geometry problems, making it more accurate as well as more explainable.
Discourse in Multimedia: A Case Study in Extracting Geometry Knowledge from Textbooks
To ensure readability, text is often written and presented with due formatting. These text formatting devices help the writer to effectively convey the narrative. At the same time, these help the…
Discourse in Multimedia: A Case Study in Information Extraction
This paper examines how discourse features in multimedia text can be used to improve an information extraction system and shows that the discourse and text layout features provide information that is complementary to the lexical semantic information commonly used for information extraction.
From Textbooks to Knowledge: A Case Study in Harvesting Axiomatic Knowledge from Textbooks to Solve Geometry Problems
This work uses rich contextual and typographical features extracted from raw textbooks to further refine the harvested axioms, which are then parsed into rules that are used to improve the state-of-the-art in solving geometry problems.
Towards Literate Artificial Intelligence
Standardized tests are used to test students as they progress in the formal education system. These tests are readily available and have clear evaluation procedures. Hence, it has been proposed that…


Boeing’s NLP System and the Challenges of Semantic Representation
Boeing's NLP system, BLUE, is described, comprising a pipeline of a parser, a logical form (LF) generator, an initial logic generator, and further processing modules; the more general question of what exactly constitutes a "semantic representation" is also discussed.
Discovery of inference rules for question-answering
This paper presents an unsupervised algorithm for discovering inference rules from text based on an extended version of Harris’ Distributional Hypothesis, which states that words that occur in the same contexts tend to be similar.
Open Information Extraction from the Web
Open IE (OIE), a new extraction paradigm where the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input, is introduced.
Project Halo Update - Progress Toward Digital Aristotle
The design and evaluation results for a system called AURA are presented, which enables domain experts in physics, chemistry, and biology to author a knowledge base and then allows a different set of users to ask novel questions against that knowledge base.
Global Learning of Typed Entailment Rules
The results show that using global transitivity information substantially improves performance over this resource and several baselines, and that the scaling methods allow us to increase the scope of global learning of entailment-rule graphs.
Identifying Relations for Open Information Extraction
Two simple syntactic and lexical constraints on binary relations expressed by verbs are introduced in the ReVerb Open IE system, which more than doubles the area under the precision-recall curve relative to previous extractors such as TextRunner and WOEpos.
Global Learning of Focused Entailment Graphs
A graph structure over predicates is defined that represents entailment relations as directed edges, and a global transitivity constraint on the graph is used to learn the optimal set of edges, by formulating the optimization problem as an Integer Linear Program.
WordNet: An Electronic Lexical Database
Topics presented include the lexical database (nouns in WordNet, a semantic network of English verbs) and applications of WordNet (building semantic concordances).
Coupled semi-supervised learning for information extraction
This paper characterizes several ways in which the training of category and relation extractors can be coupled, and presents experimental results demonstrating significantly improved accuracy as a result.
Reranking Bilingually Extracted Paraphrases Using Monolingual Distributional Similarity
The experiments reveal that monolingual scoring of bilingually extracted paraphrases has a significantly stronger correlation with human judgment for grammaticality than the probabilities assigned by the bilingual pivot-based method do.