Deploying Semantic Resources for Open Domain Question Answering

Abstract

This thesis investigates how semantic resources can be deployed to improve the accuracy of an open domain question answering (QA) system. In particular, two types of semantic resources are used to answer factoid questions: (1) semantic parsing techniques analyze questions for semantic structures and find phrases in the knowledge source that match these structures; (2) ontologies are used to extract terms from questions and corpus sentences and to enrich these terms with semantically similar concepts. These resources have been integrated into the Ephyra QA framework and compared with previously developed syntactic answer extraction approaches.

A semantic extractor for factoid answers was devised that generates semantic representations of the question and of phrases in the corpus, and extracts answer candidates from phrases that are similar to the question. Different query generation techniques are used to retrieve relevant text passages from the corpus, ranging from simple keyword queries, through compound terms expanded with synonyms, to specific query strings built from predicate-argument structures. A fuzzy similarity metric compares semantic structures at the level of key terms: it measures the pairwise syntactic and semantic similarities of the terms and aggregates these term similarities into an overall similarity score. This mechanism is flexible and robust to parsing errors, and it maximizes the recall of the semantic answer extractor. Score normalization and combination techniques allow answer candidates found with different semantic and syntactic extraction strategies to be merged.

Several ontologies are used to extract compound terms from questions and answer sentences and to expand terms with alternative representations: (1) a framework for domain-specific ontologies allows expert knowledge on restricted domains to be integrated; (2) WordNet is used as an open-domain resource of ontological knowledge; (3) a new approach for automatically learning semantic relations between entities and events in a textual corpus is introduced. In this approach, semantic structures are extracted from the corpus with a semantic parser and subsequently transformed into a semantic network that reveals relations between the entities and events in the corpus.

The semantic query generation and answer extraction techniques were evaluated on factoid questions from past TREC evaluations, using both the Web as a large open domain corpus and a local domain-specific document collection. The results show that the semantic extraction approach has a higher precision than Ephyra’s syntactic answer extractors and that a hybrid of semantic and syntactic answer extractors outperforms each individual technique. Furthermore, the query expansion techniques can be combined with the existing syntactic extractors to boost their accuracy.
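The fuzzy similarity metric mentioned in the abstract (pairwise syntactic and semantic term similarities aggregated into one score) can be illustrated with a small Python sketch. This is not Ephyra's implementation. Everything concrete here is an assumption made for illustration: the dictionary representation of predicate-argument structures with PropBank-style role labels (`V`, `ARG0`, `ARG1`), `difflib.SequenceMatcher` as the syntactic measure, the toy `SYNONYMS` table standing in for WordNet or another ontology, averaging the best pairwise matches, and the hypothetical 0.8 threshold.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple

# Hypothetical representation: a semantic structure maps role labels to key
# terms; None marks the role the question asks about.
Structure = Dict[str, Optional[str]]

# Hypothetical toy synonym table standing in for an ontology such as WordNet.
SYNONYMS = {
    frozenset({"invent", "devise"}),
    frozenset({"telephone", "phone"}),
}


def syntactic_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], a stand-in for a syntactic measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def semantic_similarity(a: str, b: str) -> float:
    """1.0 if the terms are identical or listed as synonyms, else 0.0."""
    a, b = a.lower(), b.lower()
    return 1.0 if a == b or frozenset({a, b}) in SYNONYMS else 0.0


def term_similarity(a: str, b: str) -> float:
    # Keep the stronger of the two signals for each term pair.
    return max(syntactic_similarity(a, b), semantic_similarity(a, b))


def structure_similarity(question: Structure, candidate: Structure) -> float:
    """Pair each known question term with its most similar candidate term
    (ignoring role labels, so a mislabeled argument can still match) and
    average these best scores into one overall score."""
    q_terms = [t for t in question.values() if t is not None]
    c_terms = [t for t in candidate.values() if t is not None]
    if not q_terms or not c_terms:
        return 0.0
    best = [max(term_similarity(q, c) for c in c_terms) for q in q_terms]
    return sum(best) / len(best)


def extract_answer(question: Structure, candidate: Structure,
                   threshold: float = 0.8) -> Optional[Tuple[str, float]]:
    """Return (answer, score) if the candidate is similar enough; the answer is
    the candidate's filler for the role left open in the question."""
    open_roles = [role for role, term in question.items() if term is None]
    if not open_roles:
        return None
    score = structure_similarity(question, candidate)
    answer = candidate.get(open_roles[0])
    if score < threshold or answer is None:
        return None
    return answer, score


question = {"V": "invent", "ARG0": None, "ARG1": "telephone"}
match = {"V": "devise", "ARG0": "Alexander Graham Bell", "ARG1": "phone"}
miss = {"V": "sell", "ARG0": "a shop", "ARG1": "bicycles"}
print(extract_answer(question, match))  # ('Alexander Graham Bell', 1.0)
print(extract_answer(question, miss))   # None
```

Matching terms regardless of role label is one way to keep the comparison tolerant of parser mistakes, in the spirit of the robustness claim in the abstract; a stricter variant would only compare terms that fill the same role.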

29 Figures and Tables


@inproceedings{Schlaefer2007DeployingSR, title={Deploying Semantic Resources for Open Domain Question Answering}, author={Nico Schlaefer and Petra Gieselmann}, year={2007} }