History of stop words
– A predecessor of the concept was used in creating concordances
– Isaac Nathan ben Kalonymus’s Hebrew concordance, Me’ir Nativ, included a list of unindexed words
– Hans Peter Luhn is credited with coining the phrase ‘stop word’ and used the concept in his Keyword-in-Context indexing process
– The terms ‘stop word’, ‘stop list’, and ‘stoplist’ appeared shortly after Luhn’s presentation
– C.J. Van Rijsbergen proposed the first standardized stop list not based on word frequency
SEO terminology
– In SEO usage, stop words are common words that search engines skip to save space and time
– Some search engines remove short function words like ‘the’ and ‘and’
– Stop words can cause issues when searching for phrases that include them
– Some search engines also remove very common lexical (content) words from queries to improve performance
– SEO best practices have evolved with machine learning and natural language processing
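The phrase-search problem noted above can be illustrated with a minimal sketch. The stop list, the `tokenize` helper, and the `index_terms` function below are all hypothetical names chosen for this example; they do not reflect how any particular search engine works.

```python
import re

# Hypothetical stop list for illustration only; real systems use different lists.
STOP_WORDS = {"the", "and", "to", "be", "or", "not", "who", "a", "of"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def index_terms(text, stop_words=STOP_WORDS):
    """Return the tokens that survive stop-word removal, in order."""
    return [t for t in tokenize(text) if t not in stop_words]

# A query made up entirely of stop words leaves nothing to search for.
print(index_terms("The Who"))                       # []
print(index_terms("To be or not to be"))            # []
# A mixed query keeps only its content words.
print(index_terms("The history of the stop list"))  # ['history', 'stop', 'list']
```

Under this assumed stop list, the first two queries are reduced to nothing, which is one way a phrase containing stop words can become unsearchable.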
Related concepts
– Concept mining
– Filler (linguistics)
– Function words
– Index (search engine)
– Information extraction
References
– ‘Data Mining’ by A. Rajaraman and J. D. Ullman
– ‘Introduction to Information Retrieval’ by C. D. Manning, P. Raghavan, and H. Schütze
– ‘Predecessors of scientific indexing structures in the domain of religion’ by B. H. Weinberg
– ‘Keyword-in-Context Index for Technical Literature (KWIC Index)’ by H. P. Luhn
– ‘Historical note: The Start of a Stop List at Biological Abstracts’ by B. J. Flood
Evolution of stop words in SEO
– John Mueller, Webmaster Trends Analyst at Google, emphasized that stop words shouldn’t be a concern
– Search engines consider more than individual words
– Stop words alone do not provide sufficient context
– Mueller’s statement highlights the importance of writing naturally
– SEO practices have shifted focus towards broader factors like semantic understanding
Stop words are the words in a stop list (or stoplist or negative dictionary) that are filtered out (i.e., stopped) before or after processing of natural language data (text) because they are considered insignificant for the task at hand. There is no single universal list of stop words used by all natural language processing tools, nor are there any agreed-upon rules for identifying stop words, and not all tools even use such a list. Therefore, any group of words can be chosen as the stop words for a given purpose. The "general trend in [information retrieval] systems over time has been from standard use of quite large stop lists (200–300 terms) to very small stop lists (7–12 terms) to no stop list whatsoever".
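Because the stop list is simply a set of words chosen for a purpose, filtering can be sketched as a function that takes the list as a parameter. The function name `remove_stop_words` and both example lists below are assumptions made for illustration, not a standard list or a specific tool's API.

```python
def remove_stop_words(tokens, stop_list=frozenset()):
    """Drop every token whose lowercase form appears in stop_list."""
    return [t for t in tokens if t.lower() not in stop_list]

tokens = "There is no single universal list of stop words".split()

# Two hypothetical stop lists of different sizes.
SMALL_STOP_LIST = frozenset({"the", "a", "an", "and", "of", "is", "to"})
LARGER_STOP_LIST = SMALL_STOP_LIST | {"there", "no", "in", "on", "at", "which"}

print(remove_stop_words(tokens, SMALL_STOP_LIST))
# ['There', 'no', 'single', 'universal', 'list', 'stop', 'words']
print(remove_stop_words(tokens, LARGER_STOP_LIST))
# ['single', 'universal', 'list', 'stop', 'words']
print(remove_stop_words(tokens) == tokens)
# True: with no stop list, nothing is filtered
```

Passing the list as an argument mirrors the point that different tasks may call for a large list, a small one, or none at all.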