Glossary Term
Information retrieval
Overview and History of Information Retrieval
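- An information retrieval process begins when a user enters a query into the system
- Queries are formal statements of information needs, such as search strings in a web search engine
- A query does not uniquely identify a single object; several objects may match it with different degrees of relevance
- Objects can be text documents, images, audio, mind maps, or videos
- Most IR systems compute a numeric score for how well each object matches the query and rank the results by that score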
- Emanuel Goldberg filed patents for a "Statistical Machine" for document search in the 1920s and 1930s
- Vannevar Bush popularized the idea of using computers for information retrieval in 1945
- Holmstrom gave the first description of a computer searching for information in 1948
- Automated information retrieval systems were introduced in the 1950s
- Large-scale retrieval systems like Lockheed Dialog came into use in the 1970s
Mathematical Models in Information Retrieval
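- Set-theoretic models represent documents as sets of words or phrases and derive similarity from set operations (for example, the standard Boolean model)
- Algebraic models represent documents and queries as vectors, matrices, or tuples and express similarity as a scalar value (for example, the vector space model)
- Probabilistic models treat document retrieval as probabilistic inference and compute similarity as the probability that a document is relevant to a query
- Feature-based retrieval models view documents as vectors of feature values and combine those features into a single relevance score, typically with learning-to-rank methods
- The categories differ in how the match between a query and a document is computed: set operations, a scalar comparison of vectors, a probability of relevance, or a combination of features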
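As a rough illustration of the first two categories, the sketch below contrasts a set-theoretic match (Boolean AND over term sets) with an algebraic one (cosine similarity over term-count vectors). The tokenizer, the function names, and the three sample documents are assumptions made for this example, not part of any particular IR system.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def boolean_and_match(query, doc):
    """Set-theoretic view: a document matches if it contains every query term."""
    return set(tokenize(query)) <= set(tokenize(doc))

def cosine_similarity(query, doc):
    """Algebraic view: cosine of the angle between term-count vectors."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    dot = sum(q[t] * d[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0

# Hypothetical toy collection
docs = [
    "information retrieval ranks documents",
    "retrieval of images and audio",
    "cooking with cast iron",
]
query = "information retrieval"

# Boolean model: a yes/no decision per document, no ordering
boolean_hits = [d for d in docs if boolean_and_match(query, d)]

# Vector model: a score per document, so results can be ranked
ranked = sorted(docs, key=lambda d: cosine_similarity(query, d), reverse=True)
```

With this toy data only the first document passes the Boolean test, while the cosine score also gives the second document a nonzero (lower) score, which is what makes partial matches rankable.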
Term Interdependencies in Information Retrieval Models
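- Models without term interdependencies treat different terms as independent; vector space models express this as orthogonal term vectors, probabilistic models as independent term variables
- Models with immanent term interdependencies allow interdependencies between terms to be represented
- In immanent models the degree of interdependency between two terms is defined by the model itself, usually derived directly or indirectly (e.g. by dimensionality reduction) from the co-occurrence of those terms across the whole document set
- Models with transcendent term interdependencies also allow interdependencies to be represented, but rely on an external source, such as a human or a separate algorithm, to define them
- The categories therefore differ in where the interdependency information comes from: nowhere, the collection itself, or an external source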
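The difference between immanent and transcendent interdependencies can be sketched with a toy example. Here the "immanent" association is a raw count of how often two terms co-occur in the same document, and the "transcendent" one is a lookup in a hand-written thesaurus. The documents, the thesaurus, and both function names are hypothetical, and real models use far more refined measures than a raw count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy collection
docs = [
    "car engine repair",
    "car engine parts",
    "automobile dealer",
]

# Immanent: association is derived from co-occurrence within the collection
cooccur = Counter()
for doc in docs:
    terms = sorted(set(doc.split()))
    for a, b in combinations(terms, 2):
        cooccur[(a, b)] += 1

def association(a, b):
    """Number of documents in which both terms appear."""
    return cooccur[tuple(sorted((a, b)))]

# Transcendent: the relationship is supplied from outside the collection
# (an assumed, hand-written thesaurus)
thesaurus = {"car": {"automobile"}, "automobile": {"car"}}

def related_externally(a, b):
    """True if the external source declares the two terms related."""
    return b in thesaurus.get(a, set())
```

In this collection "car" and "engine" co-occur twice, while "car" and "automobile" never do, so only the external source links the two synonyms.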
Performance and Correctness Measures in Information Retrieval
- Evaluation measures assess how well an IR system meets the information needs of its users
- Traditional metrics for Boolean or top-k retrieval include precision (the fraction of retrieved documents that are relevant) and recall (the fraction of relevant documents that are retrieved)
- These measures assume a ground-truth notion of relevance: every document is known to be either relevant or non-relevant to a given query
- In practice, queries may be ill-posed and relevance may come in different shades
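A minimal sketch of precision and recall over a single query, assuming binary ground-truth relevance judgments. The document identifiers and the helper names precision_recall and precision_at_k are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked results that are relevant (k must be > 0)."""
    return sum(1 for doc_id in ranking[:k] if doc_id in relevant) / k

# Hypothetical system output (best first) and ground-truth judgments
ranking = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2", "d5"}

p, r = precision_recall(ranking, relevant)   # p = 2/4, r = 2/3
p_at_2 = precision_at_k(ranking, relevant, 2)  # 1 of the top 2 is relevant
```

Two of the four retrieved documents are relevant, and two of the three relevant documents were retrieved; the missing "d5" lowers recall but not precision.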
Key Contributors, Development, and Awards in Information Retrieval
- Eugene Garfield: Invented the citation index
- Calvin Mooers: Coined the term 'information retrieval'
- Philip Bagley: Conducted the earliest experiment in computerized document retrieval
- Allen Kent: Published a paper describing precision and recall measures and proposed an evaluation framework for IR systems
- Hans Peter Luhn: Published a paper on auto-encoding of documents for information retrieval
- Gerard Salton: Began work on IR at Harvard, later moved to Cornell, and outlined the vector model
- Karen Spärck Jones: Worked on computational linguistics as it applies to IR and introduced the concept of inverse document frequency
- Major conferences in information retrieval include SIGIR, ECIR, CIKM, WWW, and WSDM
- Awards in the field of information retrieval include the Tony Kent Strix Award, the Gerard Salton Award, and the Karen Spärck Jones Award