An algorithm for suffix stripping
@article{Porter1997AnAF, title={An algorithm for suffix stripping}, author={Martin F. Porter}, journal={Program}, year={1997}, volume={40}, pages={211-218} }
The automatic removal of suffixes from words in English is of particular interest in the field of information retrieval. An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL. Although simple, it performs slightly better than a much more elaborate system with which it has been compared. It effectively works by treating complex suffixes as compounds made up of simple suffixes, and removing the simple suffixes in a number of steps. In each…
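The idea in the abstract, treating a complex suffix as a compound of simple suffixes and removing them over several steps, can be illustrated with a minimal sketch. The rule table `STEPS`, the function `strip_suffixes`, and the `min_stem` length check are hypothetical simplifications introduced here for illustration; they are not the paper's rule set or its conditions on the stem.

```python
# Hypothetical, heavily simplified rule table (an assumption for illustration,
# not the paper's rules). Each step lists (suffix, replacement) pairs;
# within a step, only the first matching suffix is considered.
STEPS = [
    [("ss", "ss"), ("s", "")],                  # step 1: plurals, keeping "-ss"
    [("ization", "ize"), ("fulness", "ful")],   # step 2: compound -> simpler suffix
    [("ize", ""), ("ful", "")],                 # step 3: remove the simple suffix
]


def strip_suffixes(word, steps=STEPS, min_stem=3):
    """Return the list of intermediate forms produced step by step.

    min_stem is a crude stand-in (an assumption) for a condition on the
    remaining stem: a rule only applies if the stem is long enough.
    """
    trace = [word]
    for rules in steps:
        for suffix, replacement in rules:
            if word.endswith(suffix):
                stem = word[: len(word) - len(suffix)]
                if len(stem) >= min_stem:
                    word = stem + replacement
                    if trace[-1] != word:
                        trace.append(word)
                break  # the first matching suffix decides; stop this step
    return trace


if __name__ == "__main__":
    for w in ("generalizations", "hopefulness", "boss"):
        print(" -> ".join(strip_suffixes(w)))
```

With this toy table, the compound ending of "generalizations" is peeled off one simple suffix at a time rather than matched as a single long suffix, which is the design choice the abstract describes.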
9,346 Citations
Suffix Stripping Problem as an Optimization Problem
- Computer Science; ArXiv
- 2013
This work defines stemming as an optimization problem for the very first time in the literature and exhibits its approach by applying it to clusters of English and Spanish words.
Recursive Suffix Stripping to Augment Bangla Stemmer
- Computer Science
- 2016
In the proposed method, an inflectional word is stemmed in all possible ways by the recursive suffix stripping algorithm before identifying the final stem using the conservative, the aggressive and the rule-based approaches.
An Iterative Suffix Stripping Tamil Stemmer
- Computer Science
- 2012
A stemmer for Tamil, a Dravidian language, is presented, with the main objective of enhancing the recall factor.
A simple algorithm for the problem of suffix stripping
- Computer Science
- 2015
A simple algorithm that is free from linguistic or morphological knowledge is developed, and its superiority over an established technique for the English language is demonstrated.
Efficient multi-word expressions extractor using suffix arrays and related structures
- Computer Science; iNEWS '08
- 2008
The choice of Suffix Arrays and the construction of auxiliary structures enabled a clear minimization of the time for extracting multi-word expressions, with linear complexity by the introduction of a limitation on the number of words.
Stemming of French Words Based on Grammatical Categories
- Linguistics, Computer Science; J. Am. Soc. Inf. Sci.
- 1993
A suffixing algorithm which uses grammatical categories to enhance the stemming process and always returns a linguistically correct lemma, but not necessarily the “right” one.
Automatic lemmatization of Persian words*
- Linguistics, Computer Science; J. Quant. Linguistics
- 2006
The main application of this algorithm is in the field of information retrieval; it can be used in a machine translation system from Persian into any other language, and a stem dictionary for morphological analysis should be used.
A failure analysis of the limitation of suffixing in an online environment
- Computer Science; SIGIR '87
- 1987
The interaction of suffixing algorithms and ranking techniques in retrieval performance, particularly in an online environment, was investigated and two modifications to ranking techniques were suggested: variable weighting of word variants and selective stemming depending on query length.
A rule-based approach of stemming for inflectional and derivational words in Bengali
- Linguistics; IEEE Technology Students' Symposium
- 2011
This paper presents an approach for finding out the stems from text in Bengali by stripping off the suffix part from Bengali words using some suffix stripping rules, depending upon the type of suffixes.
Development of a Manipuri stemmer: A hybrid approach
- Computer Science; 2015 International Symposium on Advanced Computing and Communication (ISACC)
- 2015
The paper presents a stemmer for Manipuri that uses a brute-force algorithm together with a suffix stripping technique, and that can be used as an important tool in information retrieval systems for the Manipuri language.
References
Development of a stemming algorithm
- Computer Science; Mech. Transl. Comput. Linguistics
- 1968
A new version of a context-sensitive, longest-match stemming algorithm for English is proposed; though developed for use in a library information transfer system, it is of general application.
FIRST: Flexible Information Retrieval System for Text
- Computer Science; J. Am. Soc. Inf. Sci.
- 1979
An on‐line document retrieval system is described which combines a data base management system with automatic processing of natural language queries and abstracts, providing direct access to documents with specified bibliographic or descriptor items.
Aslib Cranfield research project - Factors determining the performance of indexing systems; Volume 1, Design; Part 2, Appendices
- Education
- 1966
An essential requirement of the project involved cooperation of a large number of research scientists, and the response to the request was most satisfactory, and I acknowledge with thanks the generous assistance of some two hundred scientists.
“The development of a fast conflation algorithm for English”, dissertation for the Diploma in Computer Science, Computer Laboratory, University of Cambridge
- 1971
Suffix removal and word conflation
- ALLC Bulletin, Michaelmas
- 1974
Use of an automatically generated authority list to eliminate scattering caused by some singular and plural main index terms
- Proceedings of the American Society for Information Science
- 1969
Final report on improved access to scientific and technical information through automated vocabulary switching
- NSF Grant No. SIS75-12924 to the National Science Foundation