History and Evolution of NLP
– Natural language processing (NLP) has its roots in the 1950s.
– Alan Turing proposed the Turing test as a criterion of intelligence in 1950.
– The Georgetown experiment in 1954 demonstrated fully automatic translation of more than sixty Russian sentences into English.
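– Symbolic NLP (1950s to early 1990s) emulates natural language understanding by applying predefined, hand-written rules to the input.
– John Searle's Chinese room thought experiment summarizes the premise of symbolic NLP: given a collection of rules (for example, a phrasebook of questions and matching answers), a computer emulates understanding by applying those rules to the data it confronts.
– Progress in machine translation was limited until the late 1980s, when the first statistical machine translation systems were developed.
– Statistical NLP (1990s to 2010s) emerged between the late 1980s and the mid-1990s, largely replacing rule-based approaches.
– Neural NLP (present) became widespread in the 2010s because deep neural networks achieved state-of-the-art results on many tasks, including language modeling.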
Approaches in NLP
– The symbolic approach hand-codes a set of rules for manipulating symbols, coupled with a dictionary lookup.
– The statistical approach emerged between the late 1980s and the mid-1990s, largely replacing rule-based approaches.
– The neural network approach has largely replaced the statistical approach since about 2015.
– Machine learning approaches, both statistical and neural, have advantages over the symbolic approach.
– Statistical and neural methods can focus on the most common cases extracted from a corpus of texts, whereas a rule-based system must supply rules for rare and common cases alike.
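The contrast between the symbolic and statistical approaches can be made concrete with a small sketch. The example below is only illustrative: the task (assigning a part-of-speech tag to a word), the lexicon, the suffix rule, and the toy corpus are all assumptions introduced here, not part of any real system. The rule-based tagger knows only what was written into it by hand, while the statistical tagger picks the tag it saw most often for each word in the corpus.

```python
from collections import Counter, defaultdict

# Symbolic approach: a hand-written lexicon plus one hand-coded fallback rule.
# Both the lexicon entries and the suffix rule are illustrative assumptions.
HAND_LEXICON = {"the": "DET", "dog": "NOUN", "barks": "VERB"}


def rule_based_tag(word):
    """Tag a word by dictionary lookup, then by a hand-coded suffix rule."""
    word = word.lower()
    if word in HAND_LEXICON:
        return HAND_LEXICON[word]
    if word.endswith("ly"):
        return "ADV"
    return "UNK"  # no rule covers this word


# Statistical approach: learn the most frequent tag per word from a corpus.
# TOY_CORPUS is a made-up annotated sample, not real data.
TOY_CORPUS = [
    ("the", "DET"), ("dog", "NOUN"), ("barks", "VERB"),
    ("the", "DET"), ("run", "NOUN"), ("was", "VERB"), ("long", "ADJ"),
    ("they", "PRON"), ("run", "VERB"), ("daily", "ADV"),
    ("we", "PRON"), ("run", "VERB"),
]


def train_most_frequent_tag(tagged_words):
    """Count (word, tag) pairs and keep the most common tag for each word."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def statistical_tag(word, model):
    """Look up the learned tag; words never seen in the corpus stay unknown."""
    return model.get(word.lower(), "UNK")


if __name__ == "__main__":
    model = train_most_frequent_tag(TOY_CORPUS)
    for w in ["dog", "quickly", "run"]:
        print(w, rule_based_tag(w), statistical_tag(w, model))
```

In this sketch the word "run" is absent from the hand-written lexicon, so the rule-based tagger cannot label it, while the statistical tagger follows the majority usage in the corpus. Covering "run" symbolically would require someone to write another rule.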
Common NLP Tasks
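– Optical character recognition (OCR) determines the text contained in an image of printed text.
– Speech recognition converts a recording of spoken language into text.
– Speech segmentation separates a recording of spoken language into individual words.
– Text-to-speech transforms written text into spoken output.
– Word segmentation (tokenization) separates a chunk of continuous text into separate words. This is straightforward for languages that delimit words with spaces, and harder for languages such as Chinese, Japanese, and Thai that do not.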
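As a rough illustration of word segmentation, the sketch below shows two simple strategies. Neither is presented as the standard method: `tokenize` is a regular-expression splitter for space-delimited text, and `max_match` is a greedy longest-match segmenter for text written without spaces. The function names and the small vocabulary in the usage lines are hypothetical, chosen only for this example.

```python
import re


def tokenize(text):
    """Split space-delimited text into word tokens and punctuation tokens."""
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)


def max_match(text, vocabulary):
    """Greedy longest-match segmentation of text that has no spaces.

    At each position, take the longest prefix found in the vocabulary.
    If nothing matches, emit a single character so the loop always advances.
    """
    longest = max(map(len, vocabulary), default=1)
    words = []
    i = 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words


if __name__ == "__main__":
    print(tokenize("Hello, world!"))
    print(max_match("thecatsat", {"the", "cat", "sat"}))
    # A larger vocabulary exposes the weakness of greedy matching.
    print(max_match("thecatsat", {"the", "cat", "cats", "sat", "at"}))
```

The last call shows why greedy matching is only a baseline. Once "cats" is in the vocabulary, the segmenter commits to it and produces "the", "cats", "at" instead of "the", "cat", "sat". Statistical and neural segmenters are designed to resolve this kind of ambiguity using evidence from a corpus.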
NLP Applications
– Generating readable summaries of text
– Detecting and correcting grammatical errors
– Automatically translating text between human languages
– Understanding natural language and converting it into formal representations
– Generating natural language from structured information
Challenges in NLP
– Challenges in NLP include speech recognition, natural-language understanding, and natural-language generation.
– NLP aims to give computers the ability to support and manipulate human language.
– NLP involves processing natural language datasets, such as text or speech corpora, using rule-based or probabilistic machine learning approaches.
– The goal is for computers to understand the contextual nuances of language and extract information from documents.
– NLP technology can categorize, organize, and extract insights from documents.
Natural language processing (NLP) is an interdisciplinary subfield of computer science and linguistics. It is primarily concerned with giving computers the ability to support and manipulate human language. It involves processing natural language datasets, such as text corpora or speech corpora, using either rule-based or probabilistic (i.e. statistical and, most recently, neural network-based) machine learning approaches. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.
Challenges in natural language processing frequently involve speech recognition, natural-language understanding, and natural-language generation.