QCS: An Information Retrieval System for Improving Efficiency in Scientific Literature Searches Final Report for Version 1.0

Abstract

Conducting scientific research most often involves a search through existing literature in order to avoid repeating research efforts, review methods already developed for solving a problem, gain a better understanding of a problem, etc. Typically, this search is performed using the Internet, which is a convenient portal to various databases of books, journal articles, technical reports, preprints, etc. The Query, Cluster, Summarize (QCS) information retrieval system is presented in an attempt to improve efficiency in these literature searches. Given a query, QCS retrieves documents relevant to the query, separates the retrieved documents into topic clusters, and creates a single summary for each of the topic clusters. Latent Semantic Indexing is used for retrieval, generalized spherical k-means (gmeans) is used for the document clustering, and a hidden Markov model coupled with a pivoted QR decomposition is used to create a single extract summary for each topic cluster. Algorithm and implementation details of the current version of the QCS system, QCS v1.0, are presented, and the user interface to the system is described. Examples of the use of QCS v1.0 are presented using data from the Document Understanding Conferences, a conference series dedicated to furthering progress in the area of automatic summarization.
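To make the role of the pivoted QR decomposition in extract summarization more concrete, the following is a minimal sketch, not the QCS v1.0 implementation. It assumes each candidate sentence is already represented as a vector of term weights (the function name `pivoted_qr_select`, the toy vectors, and the weighting are hypothetical) and leaves out the hidden Markov model step entirely. The idea illustrated is the greedy column selection behind QR with column pivoting: pick the sentence with the largest remaining norm, then remove its direction from all other sentences so that near-duplicates are unlikely to be picked next.

```python
import math


def pivoted_qr_select(columns, k):
    """Greedily pick up to k column indices, as in QR with column pivoting.

    columns: list of equal-length lists of numbers, one per sentence
             (hypothetical term-weight vectors, an assumption of this sketch).
    """
    residual = [list(col) for col in columns]  # work on a copy
    chosen = []
    for _ in range(min(k, len(residual))):
        # Pivot: the not-yet-chosen column with the largest remaining norm.
        norms = [
            -1.0 if j in chosen else math.sqrt(sum(x * x for x in col))
            for j, col in enumerate(residual)
        ]
        pivot = max(range(len(norms)), key=norms.__getitem__)
        if norms[pivot] <= 1e-12:
            break  # the remaining sentences add no new content
        chosen.append(pivot)
        q = [x / norms[pivot] for x in residual[pivot]]
        # Project the chosen direction out of every column, so sentences
        # that repeat the chosen content lose most of their weight.
        for col in residual:
            dot = sum(a * b for a, b in zip(q, col))
            for i, qi in enumerate(q):
                col[i] -= dot * qi
    return chosen


# Toy example: sentence 1 nearly repeats sentence 0, so it is skipped.
sentences = [
    [2.0, 0.0, 0.0],
    [1.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.5],
]
print(pivoted_qr_select(sentences, 2))  # [0, 2]
```

In this toy input the second pick is sentence 2 rather than the higher-norm sentence 1, because almost all of sentence 1's weight lies in the direction already covered by sentence 0.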


Cite this paper

@inproceedings{DunlavyQCS, title={QCS: An Information Retrieval System for Improving Efficiency in Scientific Literature Searches Final Report for Version 1.0}, author={Daniel M. Dunlavy} }