Stemming Algorithms: A Case Study for Detailed Evaluation

@article{Hull1996StemmingAA,
  title={Stemming Algorithms: A Case Study for Detailed Evaluation},
  author={David A. Hull},
  journal={J. Am. Soc. Inf. Sci.},
  year={1996},
  volume={47},
  pages={70--84}
}
The majority of information retrieval experiments are evaluated by measures such as average precision and average recall. Fundamental decisions about the superiority of one retrieval technique over another are made solely on the basis of these measures. We claim that average performance figures need to be validated with a careful statistical analysis and that there is a great deal of additional information that can be uncovered by looking closely at the results of individual queries. This paper…
Analysis of performance variation using query expansion
TLDR
A case study shows the potential of a statistical repeated measures analysis of variance for testing the significance of factors in retrieval performance variation in the TREC-9 Query Track data.
Conflation-based Comparison of Stemming Algorithms
TLDR
This paper investigates several stemming algorithms, measuring their ability to correctly conflate terms from a large text collection, and shows that stemming is indeed worthwhile, but that each of the stemming algorithms it considers has distinct advantages and disadvantages.
A Detailed Analysis of English Stemming
We present a study comparing the performance of traditional stemming algorithms based on suffix removal to linguistic methods performing morphological analysis. The results indicate that most conflation…
A Detailed Analysis of English Stemming Algorithms
We present a study comparing the performance of traditional stemming algorithms based on suffix removal to linguistic methods performing morphological analysis. The results indicate that most…
Improving Precision in Information Retrieval for Swedish using Stemming
TLDR
An evaluation of how much stemming improves precision in information retrieval for Swedish texts by building an information retrieval tool with optional stemming and creating a tagged corpus in Swedish found that stemming improved both precision and recall.
A Survey of Automatic Query Expansion in Information Retrieval
TLDR
This survey presents a unified view of a large number of recent approaches to AQE that leverage various data sources and employ very different principles and techniques.
GRAS: An effective and efficient stemming algorithm for information retrieval
TLDR
Significant performance improvement over plain word-based retrieval, three other language-independent morphological normalizers, as well as rule-based stemmers is demonstrated.
Variations on language modeling for information retrieval
TLDR
This dissertation makes a contribution to the field of language modeling (LM) for IR, which views both queries and documents as instances of a unigram language model and defines the matching function between a query and each document as the probability that the query terms are generated by the document language model.
Context sensitive stemming for web search
TLDR
A context-sensitive stemming method is proposed that performs context-sensitive document matching for expanded variants of words in documents; this matching serves as a safeguard against spurious stemming and turns out to be very important for improving precision.
Statistical inference in retrieval effectiveness evaluation
  • J. Savoy
  • Computer Science
  • Inf. Process. Manag.
  • 1997
TLDR
This study suggests applying another statistical inference methodology, the bootstrap, which requires no particular assumption about the distribution of the observations. It may be used to assess the accuracy of virtually any statistic, to build approximate confidence intervals, and to verify whether a statistically significant difference exists between two retrieval schemes, even when dealing with a relatively small sample size.

References

SHOWING 1-10 OF 22 REFERENCES
Using statistical testing in the evaluation of retrieval experiments
TLDR
It is suggested that relevance feedback be evaluated from the perspective of the user and a number of different statistical tests are described for determining if differences in performance between retrieval methods are significant.
An evaluation of some conflation algorithms for information retrieval
TLDR
Comparative experiments with a range of keyword dictionaries and with the Cranfield document test collection suggest that there is relatively little difference in the performance of conflation algorithms despite the widely disparate means by which they have been developed and by which they operate.
An Association Thesaurus for Information Retrieval
TLDR
An approach, called PhraseFinder, is proposed to construct collection-dependent association thesauri automatically using large full-text document collections, and can be accessed through natural language queries in INQUERY, an information retrieval system based on the probabilistic inference network.
Query expansion using lexical-semantic relations
TLDR
Examination of the utility of lexical query expansion in the large, diverse TREC collection shows that this query expansion technique makes little difference in retrieval effectiveness if the original queries are relatively complete descriptions of the information being sought, even when the concepts to be expanded are selected by hand.
Development of a stemming algorithm
  • J. B. Lovins
  • Computer Science
  • Mech. Transl. Comput. Linguistics
  • 1968
TLDR
A new version of a context-sensitive, longest-match stemming algorithm for English is proposed; though developed for use in a library information transfer system, it is of general application.
In information retrieval: data structures and algorithms
TLDR
As one of the products to see in internet, this website becomes a very available place to look for countless information retrieval data structures and algorithms sources.
Information Retrieval: Data Structures and Algorithms
TLDR
For programmers and students interested in parsing text and automated indexing, it's the first collection in book form of the basic data structures and algorithms that are critical to the storage and retrieval of documents.
Viewing morphology as an inference process
TLDR
The role of morphological analysis in word sense disambiguation, and in identifying lexical semantic relationships in a machine-readable dictionary, is described.
A Statistical Analysis of the TREC-3 Data
A statistical analysis of the TREC-3 data shows that performance differences across queries are greater than performance differences across participants' runs. Generally, groups of runs which do not…