Analysis of Text Based Data Retrieval System

Abstract

This paper analyzes a text-based data retrieval system and describes an implementation built on Apache Lucene. It explains the significance of term frequency (TF) and inverse document frequency (IDF) in Lucene’s scoring formula. The analysis is useful for implementing a system that maintains an index of local files as well as data from email accounts. Lucene is a Java library that provides token scanning, token parsing, frequency counting, and document inversion, and it can be easily added to any application. We present techniques for improving Lucene’s performance by modifying certain parameters of the document scoring formula. Performance can also be improved by modifying the algorithms for incremental indexing and parallel processing. The purpose of such a system is to greatly reduce the manual effort of searching. Its strengths are portability and security.
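To make the roles of document inversion, term frequency, and inverse document frequency concrete, the following is a minimal Python sketch of an inverted index with TF-IDF ranking. It is an illustration under stated assumptions, not Lucene's code and not the implementation described in the paper: the function names (`tokenize`, `build_index`, `score`) are hypothetical, and the weighting (square-root TF, logarithmic IDF, IDF squared per term) only loosely follows the general shape of classic TF-IDF similarity, omitting length normalization and boosts.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Invert the documents: map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in Counter(tokenize(text)).items():
            index[term][doc_id] = freq
    return index


def score(query, index, num_docs):
    """Rank documents for a query with a simplified TF-IDF weighting.

    Assumptions (illustrative only):
      tf  = sqrt(term frequency in the document)
      idf = 1 + ln(num_docs / (doc_freq + 1))
    Each matching term contributes tf * idf^2 to the document's score.
    """
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any indexed document
        idf = 1.0 + math.log(num_docs / (len(postings) + 1))
        for doc_id, freq in postings.items():
            scores[doc_id] += math.sqrt(freq) * idf * idf
    # Highest score first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical mini-corpus mixing local-file and email-like text
    docs = {
        "file1": "Lucene index search with Lucene",
        "mail1": "email search",
        "file2": "local files",
    }
    index = build_index(docs)
    for doc_id, value in score("lucene search", index, len(docs)):
        print(doc_id, round(value, 3))
```

In this sketch, changing how `tf` or `idf` is computed inside `score` is the kind of scoring-parameter adjustment the abstract refers to; the effect on ranking can then be compared across a fixed set of queries.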

Cite this paper

@inproceedings{Wani2014AnalysisOT, title={Analysis of Text Based Data Retrieval System}, author={Saurabh S. Wani}, year={2014} }