Named-entity recognition platforms and evaluation
– GATE, OpenNLP, spaCy, Transformers, Stanford NER, and NLTK are notable NER platforms
– Precision, recall, and F1 score are commonly used measures for evaluating NER systems
– State-of-the-art NER systems for English achieve near-human performance
– The best system in MUC-7 achieved an F-measure of 93.39%
– Human annotators scored 97.60% and 96.95%
– NER systems have made significant advancements in accuracy and efficiency
– Different types of errors and their importance need to be considered in evaluating NER systems
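The measures listed above can be illustrated with a minimal Python sketch. It assumes an exact-match criterion, where a predicted entity counts as correct only if both its boundaries and its type agree with the gold annotation; other scoring schemes exist, and this is not the scoring procedure of any particular evaluation campaign. The names `Entity` and `prf1` and the token indices are hypothetical, chosen for illustration.

```python
from typing import NamedTuple, Set, Tuple


class Entity(NamedTuple):
    """A labelled token span (hypothetical representation for this sketch)."""
    start: int   # index of the first token
    end: int     # index one past the last token
    label: str


def prf1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact span-and-label matching."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Tokens: Jim bought 300 shares of Acme Corp. in 2006 .
gold = {Entity(0, 1, "Person"), Entity(5, 7, "Organization"), Entity(8, 9, "Time")}
# Assumed system output that truncates "Acme Corp." to "Acme".
predicted = {Entity(0, 1, "Person"), Entity(5, 6, "Organization"), Entity(8, 9, "Time")}

precision, recall, f1 = prf1(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Under exact matching, the truncated organization span counts as both a false positive and a false negative, which is one reason different error types may deserve different weights in an evaluation.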
Named entity types and hierarchies
– BBN categories, Sekine's extended hierarchy, and Freebase entity types have been proposed as named entity type hierarchies
– BBN categories consist of 29 types and 64 subtypes
– Sekine's extended hierarchy includes 200 subtypes
– Freebase entity types have been used for NER over social media text
– Different hierarchies help organize and classify named entities in NER systems
Approaches to Named Entity Recognition
– NER systems use linguistic grammar-based techniques as well as statistical models such as machine learning
– Hand-crafted grammar-based systems typically achieve better precision, at the cost of lower recall and months of development work
– Statistical NER systems require a large amount of manually annotated training data
– Semisupervised approaches have been suggested to reduce annotation effort
– Conditional random fields are a typical choice for machine-learned NER
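The precision and recall trade-off of hand-crafted rules can be sketched in a few lines of Python. The two patterns below are illustrative assumptions, not rules taken from any real grammar-based system, and `rule_based_ner` is a hypothetical name. Real systems rely on far larger rule sets and gazetteers.

```python
import re

# Hypothetical hand-written rules: each pairs an entity type with a pattern.
RULES = [
    # One or more capitalized words followed by a company suffix.
    ("Organization", re.compile(r"\b(?:[A-Z][a-z]+ )+(?:Corp\.|Inc\.|Ltd\.)")),
    # A four-digit year in the 1900s or 2000s.
    ("Time", re.compile(r"\b(?:19|20)\d{2}\b")),
]


def rule_based_ner(text):
    """Return sorted (start, end, label) character spans matched by RULES."""
    found = []
    for label, pattern in RULES:
        for match in pattern.finditer(text):
            found.append((match.start(), match.end(), label))
    return sorted(found)


text = "Jim bought 300 shares of Acme Corp. in 2006."
for start, end, label in rule_based_ner(text):
    print(label, repr(text[start:end]))
```

On this sentence the rules find "Acme Corp." and "2006" but miss "Jim", since no rule covers person names. Narrow rules of this kind rarely fire incorrectly, but every uncovered case costs recall.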
Challenges, misconceptions, and research in Named Entity Recognition
– Named-entity recognition is far from being solved despite high reported F1 numbers
– Efforts are focused on reducing annotation labor through semi-supervised learning
– Robust performance across domains is a key challenge
– Scaling up to fine-grained entity types is another challenge
– Crowdsourcing is a promising solution to obtain high-quality human judgments for NER
– Researchers have compared NER performances from different statistical models and feature sets
– Graph-based semi-supervised learning models have been proposed for language-specific NER tasks
Applications and advances in Named Entity Recognition
– NER has applications in question answering, information retrieval, and open-domain search queries
– Chinese named entity recognition is relevant to language processing and intelligent information systems
– NER has been applied to Twitter messages
– Advances in NER include fine-grained recognition using conditional random fields and graph-based semi-supervised learning models
Named-entity recognition (NER), also known as (named) entity identification, entity chunking, and entity extraction, is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, and percentages.
Most research on NER/NEE systems has been structured as taking an unannotated block of text, such as this one:
Jim bought 300 shares of Acme Corp. in 2006.
And producing an annotated block of text that highlights the names of entities:
[Jim]Person bought 300 shares of [Acme Corp.]Organization in [2006]Time.
In this example, a person name consisting of one token, a two-token company name and a temporal expression have been detected and classified.
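The annotated form above can be reproduced from stand-off annotations, that is, character offsets plus a type label stored separately from the text. The following Python sketch assumes that representation; the function name `render` and the tuple layout are hypothetical choices for illustration, not a standard format.

```python
def render(text, spans):
    """Insert [entity]Label markup into text for non-overlapping character spans."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])             # text before the entity
        pieces.append(f"[{text[start:end]}]{label}")  # bracketed entity and its type
        cursor = end
    pieces.append(text[cursor:])                      # remaining text after the last entity
    return "".join(pieces)


text = "Jim bought 300 shares of Acme Corp. in 2006."
# (start, end, label) with end exclusive; offsets are for this sentence only.
spans = [(0, 3, "Person"), (25, 35, "Organization"), (39, 43, "Time")]

annotated = render(text, spans)
print(annotated)
```

Keeping offsets separate from the text leaves the original string untouched, which makes it straightforward to compare the output of several systems on the same input.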
State-of-the-art NER systems for English produce near-human performance. For example, the best system entering MUC-7 achieved an F-measure of 93.39%, while human annotators scored 97.60% and 96.95%.