Stemming

Introduction to Stemming
– Stemming is the process of reducing inflected words to their word stem.
– Stemming is used in linguistic morphology and information retrieval.
– Stemming algorithms have been studied since the 1960s.
– Many search engines treat words with the same stem as synonyms, a form of query expansion called conflation.
– The first published stemmer was written by Julie Beth Lovins in 1968.

Types of Stemming Algorithms
– Simple stemmers use a lookup table to map inflected forms to their stems.
– The lookup approach may use part-of-speech tagging to avoid overstemming.
– Suffix-stripping algorithms find root forms using a set of rules.
– Prefix stripping can also be implemented in some languages.
– Suffix-stripping algorithms may differ from one another in results and performance.
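
The lookup and suffix-stripping approaches above can be sketched as follows. This is a minimal illustration, not a production stemmer: the table entries, the rule list, and the names `LOOKUP`, `SUFFIX_RULES`, `lookup_stem`, and `strip_suffix` are all hypothetical and chosen only for the example.

```python
# Hypothetical lookup table mapping inflected forms to stems.
LOOKUP = {"ran": "run", "running": "run", "runs": "run", "mice": "mouse"}

# Hypothetical suffix rules, tried in order (longer suffixes first).
SUFFIX_RULES = ["ing", "ed", "ly", "s"]


def lookup_stem(word):
    """Return the table entry for a word, or the word itself if it is not listed."""
    word = word.lower()
    return LOOKUP.get(word, word)


def strip_suffix(word, min_stem_len=3):
    """Remove the first matching suffix, keeping at least min_stem_len characters."""
    word = word.lower()
    for suffix in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: len(word) - len(suffix)]
    return word


print(lookup_stem("ran"))       # irregular form, handled by the table
print(strip_suffix("jumping"))  # regular form, handled by a rule
print(strip_suffix("running"))  # rules alone do not undo the doubled consonant
```

The table handles irregular forms that no rule would catch, while the rules cover words the table has never seen. The last call shows why different rule sets can give different results for the same word.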

Production Technique of Stemming Algorithms
– The lookup table used by a stemmer is produced semi-automatically.
– Inverted algorithms generate inflected forms from a given root form.
– Generated forms that are unlikely to occur in real text can be filtered out when the table is produced.
– The Paice-Husk Stemmer features an externally stored set of stemming rules.
– Chris D. Paice developed a direct measurement for comparing stemmers, based on counting over-stemming and under-stemming errors.
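
One way to picture the semi-automatic production of a lookup table is sketched below. It assumes that candidate forms are generated by appending suffixes to a root and that unlikely forms are dropped by checking them against a list of attested words. The filtering step, the suffix list, and the names `generate_forms` and `build_lookup` are assumptions made for illustration, not a description of any particular stemmer.

```python
# Hypothetical suffixes used to generate candidate inflected forms.
SUFFIXES = ["s", "ed", "ing", "ly"]


def generate_forms(root):
    """Inverted step: produce candidate inflected forms from a root form."""
    return [root + suffix for suffix in SUFFIXES]


def build_lookup(roots, attested):
    """Map each generated form to its root, keeping only attested forms."""
    table = {}
    for root in roots:
        for form in generate_forms(root):
            if form in attested:  # drops unlikely forms such as "walkly"
                table[form] = root
    return table


attested_words = {"walks", "walked", "walking"}  # assumed word list
table = build_lookup(["walk"], attested_words)
print(table)
```

In practice the attested word list would come from a dictionary or corpus, and a person would review the result, which is what makes the process semi-automatic rather than fully automatic.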

Lemmatisation Algorithms
– Lemmatisation involves determining the part of speech of a word.
– Different normalization rules are applied based on the part of speech.
– Correct identification of the lexical category is crucial for accurate lemmatisation.
– Lemmatisation provides more accurate normalization than suffix stripping.
– Lemmatisation algorithms can modify the stem based on additional information.
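
The dependence on part of speech can be illustrated with a small sketch. It assumes the part-of-speech tag is already known and passed in by the caller; the tag names, the rule table `RULES`, the exception table `EXCEPTIONS`, and the function `lemmatise` are hypothetical and cover only a handful of English patterns.

```python
# Hypothetical normalization rules, selected by part of speech.
RULES = {
    "NOUN": [("ies", "y"), ("s", "")],
    "VERB": [("ing", ""), ("ed", ""), ("s", "")],
    "ADJ": [("est", ""), ("er", "")],
}

# Hypothetical exceptions that no suffix rule would get right.
EXCEPTIONS = {
    ("better", "ADJ"): "good",
    ("meeting", "NOUN"): "meeting",
}


def lemmatise(word, pos):
    """Return a lemma for word, using the rules for its part of speech."""
    word = word.lower()
    if (word, pos) in EXCEPTIONS:
        return EXCEPTIONS[(word, pos)]
    for suffix, replacement in RULES.get(pos, []):
        # Require a few characters to remain so short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + replacement
    return word


print(lemmatise("meeting", "VERB"))  # treated as a form of the verb
print(lemmatise("meeting", "NOUN"))  # kept whole as a noun
```

The same surface form is normalized differently depending on its tag, which is why a wrong part-of-speech decision leads directly to a wrong lemma.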

Stochastic Algorithms and Other Techniques
– Stochastic algorithms use probability to identify the root form of a word.
– N-gram analysis uses the n-gram context of a word to determine the correct stem.
– Hybrid approaches combine two or more stemming techniques.
– Affix stemmers deal with both prefixes and suffixes.
– Matching algorithms use a stem database to identify stems.
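
Two of the techniques above, affix stripping and matching against a stem database, can be sketched briefly. The affix lists, the stem database `STEMS`, the length-ratio constraint `min_ratio`, and the function names are assumptions made for this example; real systems use much larger resources and more careful constraints.

```python
# Hypothetical affix lists and stem database.
PREFIXES = ["un", "re"]
SUFFIXES = ["ing", "ed", "s"]
STEMS = {"be", "brows", "walk"}


def affix_stem(word, min_len=3):
    """Strip at most one prefix and one suffix, keeping min_len characters."""
    word = word.lower()
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[: -len(suffix)]
            break
    return word


def match_stem(word, min_ratio=0.35):
    """Return the longest database stem that starts the word and is long
    enough relative to it; return the word unchanged if none qualifies."""
    word = word.lower()
    candidates = [
        stem for stem in STEMS
        if word.startswith(stem) and len(stem) / len(word) >= min_ratio
    ]
    return max(candidates, key=len) if candidates else word


print(affix_stem("unlocked"))
print(match_stem("being"))
print(match_stem("beside"))  # "be" is too short relative to the word
```

The relative-length constraint is one simple way to stop a short stem such as "be" from being matched inside an unrelated longer word.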

Stemming (Wikipedia)

In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form—generally a written word form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root. Algorithms for stemming have been studied in computer science since the 1960s. Many search engines treat words with the same stem as synonyms as a kind of query expansion, a process called conflation.
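
As a rough sketch of conflation in a search setting, the example below indexes a few documents by stem so that a query matches any document containing a word with the same stem. The stemmer `toy_stem`, the sample documents in `DOCS`, and the functions `build_index` and `search` are hypothetical and deliberately simplistic.

```python
from collections import defaultdict


def toy_stem(word, min_stem_len=3):
    """Very small suffix stripper used only for this illustration."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Map each stem to the set of document ids containing a matching word."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[toy_stem(token)].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that share a stem with the query word."""
    return sorted(index.get(toy_stem(query.lower()), set()))


DOCS = {1: "fishing boats", 2: "he fished all day", 3: "fresh fish market"}
index = build_index(DOCS)
print(search(index, "fishes"))
```

Because both the documents and the query are reduced to stems before comparison, the query does not need to use the same inflected form as the text it matches.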

Illustration of word stemming that is similar to tree pruning

A computer program or subroutine that stems words may be called a stemming program, stemming algorithm, or stemmer.
