An algorithm for suffix stripping

@article{Porter1997AnAF,
  title={An algorithm for suffix stripping},
  author={Martin F. Porter},
  journal={Program},
  year={1997},
  volume={40},
  pages={211--218}
}
  • M. Porter
  • Published 1 December 1997
  • Linguistics
  • Program
The automatic removal of suffixes from words in English is of particular interest in the field of information retrieval. An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL. Although simple, it performs slightly better than a much more elaborate system with which it has been compared. It effectively works by treating complex suffixes as compounds made up of simple suffixes, and removing the simple suffixes in a number of steps. In each… 
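The step-wise removal of simple suffixes described above can be sketched in Python. This is a partial sketch, not Porter's full algorithm. It implements his measure m of a stem (the m in the form [C](VC)^m[V]), Step 1a, and only a subset of the rules in Steps 2 to 4. Steps 1b, 1c and 5, and the extra condition on -ION, are omitted. The function and table names are our own.

```python
VOWELS = frozenset("aeiou")

def is_consonant(word: str, i: int) -> bool:
    """Porter's consonant: not a, e, i, o, u, and not a 'y' preceded by a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True

def measure(stem: str) -> int:
    """m in [C](VC)^m[V]: count vowel-to-consonant transitions."""
    m, prev_vowel = 0, False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m

def apply_rules(word, rules, condition):
    """Use only the longest matching suffix; replace it if the stem meets condition."""
    for suffix, repl in sorted(rules, key=lambda r: len(r[0]), reverse=True):
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)]
            return stem + repl if condition(stem) else word
    return word

STEP1A = [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")]
# Subsets of Porter's rule lists for Steps 2 to 4 (deliberately incomplete).
STEP2 = [("ational", "ate"), ("tional", "tion"), ("ization", "ize"),
         ("ation", "ate"), ("ator", "ate"), ("alism", "al"),
         ("iveness", "ive"), ("fulness", "ful"), ("ousness", "ous")]
STEP3 = [("icate", "ic"), ("ative", ""), ("alize", "al"),
         ("ical", "ic"), ("ful", ""), ("ness", "")]
STEP4 = [(s, "") for s in ("al", "ance", "ence", "er", "ic", "able", "ible",
                           "ant", "ement", "ment", "ent", "ism", "ate",
                           "iti", "ous", "ive", "ize")]

def stem(word: str) -> str:
    word = word.lower()
    word = apply_rules(word, STEP1A, lambda s: True)           # plurals
    word = apply_rules(word, STEP2, lambda s: measure(s) > 0)  # e.g. -ization -> -ize
    word = apply_rules(word, STEP3, lambda s: measure(s) > 0)  # e.g. -alize -> -al
    word = apply_rules(word, STEP4, lambda s: measure(s) > 1)  # e.g. drop -al
    return word
```

With these rules, `stem("generalizations")` passes through generalization, generalize and general before reaching gener: the compound ending is peeled off one simple suffix per step, each removal guarded by a condition on the measure of what remains.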

Suffix Stripping Problem as an Optimization Problem

This work defines stemming as an optimization problem for the first time in the literature and demonstrates the approach by applying it to clusters of English and Spanish words.

Recursive Suffix Stripping to Augment Bangla Stemmer

In the proposed method, an inflectional word is stemmed in all possible ways by the recursive suffix stripping algorithm before identifying the final stem using the conservative, the aggressive and the rule-based approaches.
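Stemming a word "in all possible ways" can be sketched as a recursive enumeration of candidate stems. In this minimal Python sketch the suffix list is a hypothetical English stand-in (a real Bangla stemmer would use Bangla suffixes), the minimum stem length is an assumed parameter, and "conservative" and "aggressive" are interpreted, as an assumption, as choosing the longest and the shortest stripped candidate. The rule-based selection is not reproduced.

```python
# Hypothetical English suffixes standing in for a Bangla suffix list.
SUFFIXES = ("ness", "less", "ful")

def all_stems(word: str, suffixes=SUFFIXES, min_len: int = 2) -> set:
    """Recursively strip every matching suffix, collecting all intermediate stems."""
    stems = {word}
    for suf in suffixes:
        if word.endswith(suf) and len(word) - len(suf) >= min_len:
            stems |= all_stems(word[: len(word) - len(suf)], suffixes, min_len)
    return stems

def _candidates(word: str) -> set:
    # Prefer stems that actually had a suffix removed; fall back to the word.
    stripped = all_stems(word) - {word}
    return stripped or {word}

def conservative_stem(word: str) -> str:
    # Assumed interpretation: strip as little as possible (longest candidate).
    return min(_candidates(word), key=lambda s: (-len(s), s))

def aggressive_stem(word: str) -> str:
    # Assumed interpretation: strip as much as possible (shortest candidate).
    return min(_candidates(word), key=lambda s: (len(s), s))

print(sorted(all_stems("carelessness")))  # ['care', 'careless', 'carelessness']
print(conservative_stem("carelessness"))  # careless
print(aggressive_stem("carelessness"))    # care
```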

An Iterative Suffix Stripping Tamil Stemmer

A stemmer for Tamil, a Dravidian language, is presented, with the main objective of improving recall.

A simple algorithm for the problem of suffix stripping

A simple algorithm that requires no linguistic or morphological knowledge is developed, and its superiority over an established technique for the English language is demonstrated.

Efficient multi-word expressions extractor using suffix arrays and related structures

The choice of suffix arrays and the construction of auxiliary structures substantially reduced the time needed to extract multi-word expressions, achieving linear complexity by introducing a limit on the number of words.
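Counting repeated word sequences with a suffix array can be sketched briefly. This Python sketch builds a word-level suffix array with a plain sort, truncating each suffix to `max_n` words (standing in for the limit on the number of words), and relies on the fact that suffixes sharing an n-word prefix are adjacent once sorted. The parameter names are hypothetical, the sort is not linear-time, and the paper's auxiliary structures are not reproduced.

```python
from itertools import groupby

def repeated_ngrams(tokens, max_n=3, min_n=2, min_count=2):
    """Count word n-grams (min_n..max_n words) occurring at least min_count times."""
    # Word-level suffix array: start positions sorted by their first max_n tokens.
    sa = sorted(range(len(tokens)), key=lambda i: tokens[i : i + max_n])
    counts = {}
    for n in range(min_n, max_n + 1):
        # Suffixes sharing the same n-word prefix form a contiguous run in sa.
        for prefix, group in groupby(sa, key=lambda i: tuple(tokens[i : i + n])):
            if len(prefix) < n:
                continue  # suffix too close to the end of the text
            c = sum(1 for _ in group)
            if c >= min_count:
                counts[prefix] = c
    return counts

print(repeated_ngrams("the suffix array of the suffix array".split()))
```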

Stemming of French Words Based on Grammatical Categories

  • J. Savoy
  • Linguistics, Computer Science
  • J. Am. Soc. Inf. Sci.
  • 1993
A suffixing algorithm is presented that uses grammatical categories to enhance the stemming process and always returns a linguistically correct lemma, though not necessarily the “right” one.

Automatic lemmatization of Persian words

The main application of this algorithm is in information retrieval; it can also be used in a machine translation system from Persian into any other language, and a stem dictionary for morphological analysis should be used.

A failure analysis of the limitation of suffixing in an online environment

The interaction of suffixing algorithms and ranking techniques in retrieval performance, particularly in an online environment, was investigated, and two modifications to ranking techniques were suggested: variable weighting of word variants and selective stemming depending on query length.
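The first suggested modification, variable weighting of word variants, can be sketched as a scoring rule that gives full credit to exact query-term matches and reduced credit to matches that share only a stem. The weight 0.5, the toy plural-stripping stemmer and the function names are hypothetical. Selective stemming would amount to deciding per query, by its length, whether to award variant credit at all; no threshold is given here, so it is left out.

```python
from typing import Callable, Iterable

def score(query_terms: Iterable[str], doc_terms: Iterable[str],
          stem: Callable[[str], str], variant_weight: float = 0.5) -> float:
    """Full credit for exact matches, variant_weight for stem-only matches."""
    doc_terms = list(doc_terms)
    exact = set(doc_terms)
    stems = {stem(t) for t in doc_terms}
    total = 0.0
    for q in query_terms:
        if q in exact:
            total += 1.0             # the exact word form occurs in the document
        elif stem(q) in stems:
            total += variant_weight  # only a variant (same stem) occurs
    return total

def toy_stem(w: str) -> str:
    # Hypothetical stemmer: strip a single trailing "s".
    return w[:-1] if w.endswith("s") else w

print(score(["retrieval", "systems"], ["retrieval", "system"], toy_stem))  # 1.5
```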

A rule-based approach of stemming for inflectional and derivational words in Bengali

This paper presents an approach for finding the stems of Bengali words by stripping their suffixes with a set of suffix-stripping rules that depend on the type of suffix.

Development of a Manipuri stemmer: A hybrid approach

The paper presents a stemmer for Manipuri that combines a brute-force algorithm with a suffix-stripping technique; it can serve as an important tool in information retrieval systems for the Manipuri language.
...

References


Development of a stemming algorithm

  • J. B. Lovins
  • Computer Science
  • Mech. Transl. Comput. Linguistics
  • 1968
A new version of a context-sensitive, longest-match stemming algorithm for English is proposed; though developed for use in a library information transfer system, it is of general application.
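A longest-match, context-sensitive scheme can be illustrated with a minimal Python sketch. The ending table and its conditions below are hypothetical stand-ins, far smaller than Lovins's published list, and the recoding (transformation) rules that follow ending removal are omitted. The sketch assumes that when the longest matching ending fails its context condition, the next-longest matching ending is tried.

```python
def min_stem(n):
    # Context condition: the remaining stem must be at least n letters long.
    return lambda stem: len(stem) >= n

# Hypothetical miniature ending table: (ending, context condition on the stem).
ENDINGS = [
    ("ational", min_stem(2)),
    ("ation", min_stem(2)),
    ("ness", min_stem(2)),
    ("ing", min_stem(3)),                        # hypothetical stricter condition
    ("s", lambda stem: not stem.endswith("s")),  # hypothetical: leave "-ss" alone
]

def longest_match_stem(word: str) -> str:
    # Try endings from longest to shortest; remove the first one whose
    # context condition on the remaining stem is satisfied.
    for ending, ok in sorted(ENDINGS, key=lambda e: len(e[0]), reverse=True):
        if word.endswith(ending):
            stem = word[: len(word) - len(ending)]
            if ok(stem):
                return stem
    return word

print(longest_match_stem("darkness"))  # dark
print(longest_match_stem("sing"))      # sing
```

Only one ending is removed per word, which keeps a single pass over the table fast; the context conditions are what stop short words such as "sing" from being over-stripped.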

FIRST: Flexible Information Retrieval System for Text

  • R. Dattola
  • Computer Science
  • J. Am. Soc. Inf. Sci.
  • 1979
An on‐line document retrieval system is described which combines a data base management system with automatic processing of natural language queries and abstracts, providing direct access to documents with specified bibliographic or descriptor items.

Aslib Cranfield research project - Factors determining the performance of indexing systems; Volume 1, Design; Part 2, Appendices

An essential requirement of the project was the cooperation of a large number of research scientists. The response to the request was most satisfactory, and I acknowledge with thanks the generous assistance of some two hundred scientists.

The development of a fast conflation algorithm for English

  • Dissertation for the Diploma in Computer Science, Computer Laboratory, University of Cambridge
  • 1971

Suffix removal and word conflation

  • ALLC Bulletin, Michaelmas
  • 1974

Use of an automatically generated authority list to eliminate scattering caused by some singular and plural main index terms

  • Proceedings of the American Society for Information Science
  • 1969

Final report on improved access to scientific and technical information through automated vocabulary switching

  • NSF Grant No. SIS75-12924 to the National Science Foundation