Similarity Measures for Query Expansion in TopX


TopX is a top-k retrieval engine for text and XML data. Unlike some other engines, TopX includes an ontology. This ontology lets TopX apply techniques such as word sense disambiguation and query expansion to search for words similar to the original query terms. These techniques make it possible to find data items that the original query would miss because it lacks words similar to the query terms. The similarity of two words is given by the weights of the relations connecting them. The underlying ontology of TopX is the WordNet ontology; in 2007 a second ontology, the YAGO ontology, was integrated.

This thesis has three main focuses:
• Import of a new version of the YAGO ontology.
• Similarity computation for YAGO relations.
• Adaptation of the TopX procedures for word sense disambiguation and query expansion to the differences between the WordNet ontology and the YAGO ontology.

We demonstrate the improvement our approach brings to TopX, focusing on the newly available YAGO ontology.

