Part-of-speech tagging

Part-of-speech tagging basics and techniques
– Part-of-speech tagging is the process of marking up a word in a text as corresponding to a particular part of speech.
– It is based on both the definition and context of the word.
– A simplified form of POS tagging is commonly taught to school-age children, who learn to identify words as nouns, verbs, adjectives, adverbs, etc.
– POS tagging is now done using algorithms in computational linguistics.
– There are two groups of POS-tagging algorithms: rule-based and stochastic.
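– The percentages with which a noun, an adjective, or a number follows a given word (such as an article) can help determine the part of speech of the next word.
– More advanced HMMs learn the probabilities not only of pairs of tags but of triples or even larger sequences.
– When several ambiguous words occur together, every combination of tags can be enumerated and assigned a relative probability; the most probable combination is then chosen.
– CLAWS, an early HMM-based tagger, achieved 93–95% accuracy in part-of-speech tagging.
– Charniak pointed out that merely assigning the most common tag to each known word and ‘proper noun’ to all unknown words approaches 90% accuracy, because many words are unambiguous.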
– DeRose and Church developed dynamic programming algorithms for part-of-speech tagging.
– DeRose used a table of pairs, while Church used a table of triples.
– Both methods achieved over 95% accuracy.
– DeRose’s work was replicated for Greek and proved effective.
– Major algorithms for part-of-speech tagging include the Viterbi algorithm, Brill tagger, Constraint Grammar, and Baum-Welch algorithm.
– Hidden Markov model and visible Markov model taggers can both be implemented using the Viterbi algorithm.
– The rule-based Brill tagger applies learned rule patterns.
– Machine learning methods like SVM, maximum entropy classifier, perceptron, and nearest-neighbor have been applied to part-of-speech tagging.
– Direct comparisons of several methods have been published; one paper using the structure regularization method reported 97.36% accuracy on a standard benchmark dataset.
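
The idea of choosing the most probable tag combination, as the dynamic programming taggers and the Viterbi algorithm above do, can be sketched on a toy model. This is a minimal illustration, not any particular tagger's implementation: the tag set, the words, and every probability below are invented for the example, and the transition table is a table of pairs (one previous tag, one current tag). The input is assumed to be a non-empty list of words.

```python
import math

# Hypothetical toy model: tags, vocabulary, and probabilities are invented
# for illustration and are not estimated from any real corpus.
TAGS = ["DET", "NOUN", "VERB"]

# START[tag] = P(tag at the beginning of a sentence)
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}

# TRANS[prev][cur] = P(cur tag | prev tag), i.e. a table of pairs.
TRANS = {
    "DET":  {"DET": 0.05, "NOUN": 0.90, "VERB": 0.05},
    "NOUN": {"DET": 0.10, "NOUN": 0.30, "VERB": 0.60},
    "VERB": {"DET": 0.50, "NOUN": 0.40, "VERB": 0.10},
}

# EMIT[tag][word] = P(word | tag); unseen pairs fall back to a small floor.
EMIT = {
    "DET":  {"the": 0.9},
    "NOUN": {"dog": 0.4, "dogs": 0.3, "hatch": 0.2, "sailor": 0.1},
    "VERB": {"dogs": 0.5, "hatch": 0.3, "barks": 0.2},
}
FLOOR = 1e-6


def viterbi(words):
    """Return the most probable tag sequence for a non-empty word list."""
    def emit(tag, word):
        return math.log(EMIT[tag].get(word, FLOOR))

    # Each column maps tag -> (best log-probability of a path ending in
    # that tag, previous tag on that path).
    best = [{t: (math.log(START[t]) + emit(t, words[0]), None) for t in TAGS}]
    for word in words[1:]:
        column = {}
        for cur in TAGS:
            score, prev = max(
                (best[-1][p][0] + math.log(TRANS[p][cur]), p) for p in TAGS
            )
            column[cur] = (score + emit(cur, word), prev)
        best.append(column)

    # Start from the best final tag and follow the backpointers.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


print(viterbi("the sailor dogs the hatch".split()))
```

Under these made-up numbers, the word "dogs" is tagged as a verb after "sailor" but as a noun directly after "the", which shows how context, and not only the word itself, drives the choice. Working in log-probabilities avoids numeric underflow on longer sentences.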

Tag Sets and Variations
– English has 9 commonly taught parts of speech, but there are many more categories and sub-categories.
– Nouns can have plural, possessive, and singular forms, while verbs can be marked for tense and aspect.
– In some tagging systems, different inflections of the same root word receive different tags, which increases the number of tags.
– Tag sets used for computational POS tagging of English typically distinguish 50 to 150 separate parts of speech.
– Different languages have different tag sets, with heavily inflected languages having larger tag sets.
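
The relationship between a fine-grained tag set and the handful of school-grammar categories can be sketched as a simple lookup. The tag names below follow Penn Treebank-style conventions, which is an assumption made for this example since no specific tag set is named here; the mapping is a small illustrative excerpt, not a complete or official table, and `coarsen` is a hypothetical helper.

```python
# A few Penn Treebank-style fine-grained tags mapped to coarse categories.
# Illustrative excerpt only; real tag sets contain many more entries.
FINE_TO_COARSE = {
    "NN": "NOUN",   # singular noun
    "NNS": "NOUN",  # plural noun
    "VB": "VERB",   # base form
    "VBD": "VERB",  # past tense
    "VBG": "VERB",  # gerund or present participle
    "VBZ": "VERB",  # third-person singular present
    "JJ": "ADJ",    # adjective
    "JJR": "ADJ",   # comparative adjective
    "RB": "ADV",    # adverb
    "DT": "DET",    # determiner
}


def coarsen(tagged, default="X"):
    """Replace each fine-grained tag with its coarse category."""
    return [(word, FINE_TO_COARSE.get(tag, default)) for word, tag in tagged]


fine = [("the", "DT"), ("dogs", "NNS"), ("barked", "VBD"), ("loudly", "RB")]
coarse = coarsen(fine)
print(coarse)
```

Several fine-grained tags collapse into one coarse category, which is why a tag set that marks number, tense, and similar features grows much larger than the list of parts of speech taught in school.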

History and Development
– Research on part-of-speech tagging has been closely tied to corpus linguistics.
– The Brown Corpus, developed in the mid-1960s, was the first major corpus of English for computer analysis.
– The Brown Corpus was painstakingly tagged with part-of-speech markers over many years.
– The corpus has been used for numerous studies and inspired the development of similar tagged corpora in other languages.
– Part-of-speech tagging was considered an inseparable part of natural language processing for a long time.
– The Brown Corpus consists of about 1,000,000 words of running English prose text.
– It was tagged with part-of-speech markers using a program and later reviewed and corrected by hand.
– The tagging of the Brown Corpus formed the basis for many later part-of-speech tagging systems.
– Larger corpora, such as the 100-million-word British National Corpus, have since superseded the Brown Corpus.
– Tagging was considered essential because some words are ambiguous: in certain cases the correct part of speech cannot be decided without understanding the semantics or even the pragmatics of the context.
– In the mid-1980s, researchers began using hidden Markov models (HMMs) to disambiguate parts of speech.
– HMMs involve counting cases in a tagged corpus (such as the Brown Corpus) and building a table of the probabilities of certain sequences.
– HMMs were used to tag the Lancaster-Oslo-Bergen Corpus of British English.
– The use of HMMs improved part-of-speech tagging accuracy.
– Their success reduced the need to analyze higher levels of language for each word, so tagging could be treated as a step separate from deeper processing.
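
The counting step described above can be sketched as follows. The five hand-tagged sentences are hypothetical and stand in for a real tagged corpus, and `transition_table` is a name chosen for this example. The function counts how often each tag follows another and turns the counts into probabilities.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged sentences standing in for a tagged corpus.
TAGGED = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("old", "ADJ"), ("dog", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("three", "NUM"), ("cats", "NOUN"), ("sleep", "VERB")],
    [("the", "DET"), ("small", "ADJ"), ("cat", "NOUN"), ("purrs", "VERB")],
]


def transition_table(sentences):
    """Estimate P(current tag | previous tag) by counting adjacent tag pairs."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tags = [tag for _, tag in sentence]
        for prev, cur in zip(tags, tags[1:]):
            counts[prev][cur] += 1

    table = {}
    for prev, following in counts.items():
        total = sum(following.values())
        table[prev] = {cur: n / total for cur, n in following.items()}
    return table


table = transition_table(TAGGED)
print(table["DET"])
```

In this tiny sample a determiner is followed by a noun in two of five cases, by an adjective in two, and by a number in one, so the table records 0.4, 0.4, and 0.2. A real tagger would estimate such figures from a corpus of hundreds of thousands of words or more and would also need to handle pairs it has never seen.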

Unsupervised Tagging
– Unsupervised tagging techniques use untagged corpora to derive part-of-speech categories.
– With sufficient iteration, patterns in word use yield similarity classes of words that closely resemble the categories linguists would expect.
– Rule-based, stochastic, and neural approaches are used in unsupervised tagging.
– Unsupervised tagging can provide valuable new insights.
– Induction-based methods can achieve accuracy above 95%.
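
How patterns in word use can point to similarity classes, without any tags, can be sketched by comparing the contexts in which words occur. This is only a first step toward induction, not a complete unsupervised tagger: the six-sentence corpus is hypothetical, and representing each word by counts of its immediate left and right neighbours, compared with cosine similarity, is one simple choice assumed for this example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical untagged corpus; real induction needs far more text.
CORPUS = [
    "the dog sleeps", "the cat sleeps", "a dog barks",
    "a cat purrs", "the dog runs", "the cat runs",
]


def context_vectors(sentences):
    """Count the left and right neighbours of every word."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]  # boundary markers
        for i in range(1, len(tokens) - 1):
            vectors[tokens[i]]["L:" + tokens[i - 1]] += 1
            vectors[tokens[i]]["R:" + tokens[i + 1]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


vectors = context_vectors(CORPUS)
print(cosine(vectors["dog"], vectors["cat"]))     # words used in similar contexts
print(cosine(vectors["dog"], vectors["sleeps"]))  # words used in different contexts
```

Here "dog" and "cat" share most of their neighbours and score high, while "dog" and "sleeps" share none. Grouping words by such scores over many iterations and much more data is what lets categories emerge from untagged text.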

Related Topics and References
– See also: Semantic net, sliding window based part-of-speech tagging, trigram tagger, and word sense disambiguation.
– References include POS tags in Sketch Engine and A Universal Part-of-Speech Tagset.
– Works cited: Charniak’s ‘Statistical Techniques for Natural Language Parsing’ and DeRose’s ‘Stochastic Methods for Resolution of Grammatical Category Ambiguity in Inflected and Uninflected Languages.’

In corpus linguistics, part-of-speech tagging (POS tagging, PoS tagging, or POST), also called grammatical tagging, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context. A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc.

Once performed by hand, POS tagging is now done in the context of computational linguistics, using algorithms that associate discrete terms, as well as hidden parts of speech, with a set of descriptive tags. POS-tagging algorithms fall into two distinct groups: rule-based and stochastic. E. Brill's tagger, one of the first and most widely used English POS taggers, employs rule-based algorithms.
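
A rule-based approach in the style of Brill's tagger can be sketched in two steps: give every word its most frequent tag, then apply contextual rules that correct specific mistakes. The sketch below is a simplification under stated assumptions. The lexicon is hypothetical, the tags are Penn Treebank-style, and the single rule is written by hand for illustration, whereas Brill's tagger learns its rules from a tagged corpus.

```python
# Hypothetical lexicon giving each word's most frequent tag.
MOST_FREQUENT_TAG = {
    "i": "PRP", "want": "VBP", "to": "TO", "race": "NN",
    "the": "DT", "is": "VBZ", "long": "JJ",
}

# One hand-written transformation rule, as (from_tag, to_tag, previous_tag):
# change NN to VB when the previous tag is TO.
RULES = [("NN", "VB", "TO")]


def initial_tags(words, default="NN"):
    """First pass: tag every word with its most frequent tag."""
    return [MOST_FREQUENT_TAG.get(word.lower(), default) for word in words]


def apply_rules(tags, rules):
    """Second pass: apply each contextual rule from left to right."""
    tags = list(tags)
    for from_tag, to_tag, prev_tag in rules:
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == prev_tag:
                tags[i] = to_tag
    return tags


words = "I want to race".split()
tags = apply_rules(initial_tags(words), RULES)
print(list(zip(words, tags)))
```

The first pass tags "race" as a noun, and the rule then changes it to a verb because it follows "to". In "the race is long" the same word keeps its noun tag, since the rule's context does not match.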
