The Effectiveness of Stemming for Natural-Language Access to Slovene Textual Data

Abstract

There have been several studies of the use of stemming algorithms for conflating morphological variants in free-text retrieval systems. Comparison of stemmed and nonconflated searches suggests that there are no significant increases in the effectiveness of retrieval when stemming is applied to English-language documents and queries. This article reports the use of stemming on Slovene-language documents and queries, and demonstrates that the use of an appropriate stemming algorithm results in a large, and statistically significant, increase in retrieval effectiveness when compared with nonconflated processing; similar comments apply to the use of manual, right-hand truncation. A comparison is made with stemming of English versions of the same documents and queries, and it is concluded that the effectiveness of a stemming algorithm is determined by the morphological complexity of the language that it is designed to process.
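As a rough illustration of what conflation means here, the sketch below contrasts a toy suffix-stripping stemmer with manual right-hand truncation on four inflected forms of the Slovene noun "knjiga" (book). It is not the stemming algorithm evaluated in the article: the suffix list, the `min_stem` threshold, and the function names are all hypothetical, chosen only to show why unconflated matching misses variants in a highly inflected language.

```python
# Toy illustration only: this is NOT the stemmer evaluated in the article.
# The suffix list is a hypothetical, deliberately tiny sample.
TOY_SUFFIXES = ("a", "e", "i", "o")


def toy_stem(word, min_stem=3):
    """Strip the longest matching suffix, keeping at least min_stem characters."""
    for suffix in sorted(TOY_SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def truncation_match(query, term):
    """Manual right-hand truncation: 'knjig*' matches any term starting with 'knjig'."""
    if query.endswith("*"):
        return term.startswith(query[:-1])
    return term == query  # nonconflated (exact) matching


# Four case forms of the same noun.
forms = ["knjiga", "knjige", "knjigi", "knjigo"]

stems = {toy_stem(w) for w in forms}                       # conflation by stemming
truncated = [truncation_match("knjig*", w) for w in forms]  # conflation by truncation
exact = [truncation_match("knjiga", w) for w in forms]      # no conflation
```

In this toy case both stemming and truncation map all four forms to one search key, whereas the exact-match query retrieves only the form that happens to appear in the query.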

DOI: 10.1002/(SICI)1097-4571(199206)43:5<384::AID-ASI6>3.0.CO;2-L

Cite this paper

@article{Popovic1992TheEO,
  title={The Effectiveness of Stemming for Natural-Language Access to Slovene Textual Data},
  author={Mirko Popovic and Peter Willett},
  journal={JASIS},
  year={1992},
  volume={43},
  pages={384-390}
}