Information retrieval

Overview and History of Information Retrieval
- The information retrieval (IR) process begins when a user enters a query into the system
- Queries are formal statements of information needs, for example search strings in web search engines
- A query does not uniquely identify a single object; several objects may match it, each with a different degree of relevance
- Objects can be text documents, images, audio, mind maps, or videos
- IR systems compute a score for how well each object matches the query and rank the results by that score
- Vannevar Bush popularized the idea of using computers to search for information in his 1945 article "As We May Think"
- In the 1920s and 1930s, Emanuel Goldberg filed patents for a "statistical machine" that searched for documents stored on film
- In 1948, Holmstrom gave the first description of a computer searching for information
- Automated information retrieval systems were introduced in the 1950s
- Large-scale retrieval systems such as Lockheed Dialog came into use in the 1970s

Mathematical Models in Information Retrieval
- Set-theoretic models represent documents as sets of words or phrases; similarities are usually derived from set operations on those sets
- Algebraic models represent documents and queries as vectors, matrices, or tuples, and express the similarity of a query and a document as a scalar value
- Probabilistic models treat document retrieval as probabilistic inference, scoring each document by the probability that it is relevant to the query
- Feature-based retrieval models view documents as vectors of feature-function values and seek the best way to combine these features into a single relevance score
- Each model rests on different mathematical foundations and has different properties

Term Interdependencies in Information Retrieval Models
- Models without term interdependencies treat different terms as independent of one another
- Models with immanent term interdependencies represent interdependencies between terms internally
- Models with transcendent term interdependencies also represent interdependencies, but do not define them themselves; they rely on an external source, such as a human expert or a separate algorithm
- In immanent models, the degree of interdependency between two terms is defined by the model itself, usually derived directly or indirectly (e.g., by dimensionality reduction) from how often the terms co-occur across the document collection
- Different models therefore take different approaches to representing term interdependencies

Performance and Correctness Measures in Information Retrieval
- Evaluation measures assess how well an IR system's results meet users' information needs
- Traditional metrics, designed for Boolean or top-k retrieval, include precision and recall
- Evaluation assumes a ground-truth notion of relevance: every document is known to be either relevant or non-relevant to a given query
- In practice, queries may be ill-posed, and relevance may come in different shades rather than a strict yes/no judgment
- Despite these caveats, evaluation measures help assess the performance and correctness of an IR system

Key Contributors, Development, and Awards in Information Retrieval
- Eugene Garfield: Invented the citation index
- Calvin Mooers: Coined the term "information retrieval"
- Philip Bagley: Conducted the earliest experiment in computerized document retrieval
- Allen Kent: Co-authored a paper describing the precision and recall measures and proposing a framework for evaluating IR systems
- Hans Peter Luhn: Published a paper on auto-encoding of documents for information retrieval
- Gerard Salton: Began work on IR at Harvard, later moved to Cornell, and outlined the vector model
- Karen Spärck Jones: Continued work on computational linguistics as it applies to IR
- Major conferences in information retrieval include SIGIR, ECIR, CIKM, WWW, and WSDM
- Awards in the field of information retrieval include the Tony Kent Strix Award, the Gerard Salton Award, and the Karen Spärck Jones Award
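To make the set-theoretic models listed under Mathematical Models in Information Retrieval more concrete, here is a minimal Python sketch of Boolean retrieval over an inverted index. It assumes a deliberately simple tokenizer (lowercase, keep runs of letters and digits), and the names `build_index`, `query_and`, `query_or`, and `query_not` are hypothetical, chosen only for illustration. AND, OR, and NOT map directly onto set intersection, union, and difference.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase, then keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Inverted index: each term maps to the set of IDs of documents containing it.
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def query_and(index: dict[str, set[str]], *terms: str) -> set[str]:
    # Conjunction: documents that contain every term (set intersection).
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def query_or(index: dict[str, set[str]], *terms: str) -> set[str]:
    # Disjunction: documents that contain at least one term (set union).
    return set().union(*(index.get(term, set()) for term in terms))


def query_not(index: dict[str, set[str]], all_ids: set[str], term: str) -> set[str]:
    # Negation: documents that do not contain the term (set difference).
    return all_ids - index.get(term, set())


docs = {
    "d1": "Information retrieval ranks documents",
    "d2": "A citation index for documents",
    "d3": "Retrieval of images and audio",
}
index = build_index(docs)
print(query_and(index, "retrieval", "documents"))  # {'d1'}
print(sorted(query_or(index, "citation", "images")))  # ['d2', 'd3']
print(query_not(index, set(docs), "retrieval"))  # {'d2'}
```

The result of each query is an unordered set: the pure Boolean model returns every matching document without ranking them by degree of relevance.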
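The algebraic models, including the vector model outlined by Gerard Salton, can be sketched by representing the query and each document as term-weight vectors and ranking documents by the cosine of the angle between them. This sketch assumes TF-IDF weighting with idf = log(N / df), which is one common choice among many rather than the only one; the helper names `tfidf_vectors`, `cosine`, and `rank` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase, then keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: dict[str, str]) -> tuple[dict[str, dict[str, float]], dict[str, float]]:
    # Raw term frequencies per document.
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tf in counts.values():
        df.update(tf.keys())
    n = len(docs)
    # Assumed weighting: idf = log(N / df); a term found in every document gets 0.
    idf = {term: math.log(n / d) for term, d in df.items()}
    vectors = {
        doc_id: {term: freq * idf[term] for term, freq in tf.items()}
        for doc_id, tf in counts.items()
    }
    return vectors, idf


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    # Cosine similarity of two sparse vectors; 0.0 if either has zero length.
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    vectors, idf = tfidf_vectors(docs)
    # Weight query terms the same way; terms unseen in the collection are dropped.
    q_vec = {
        term: freq * idf[term]
        for term, freq in Counter(tokenize(query)).items()
        if term in idf
    }
    scores = [(doc_id, cosine(q_vec, vec)) for doc_id, vec in vectors.items()]
    # Highest similarity first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = {
    "d1": "vector model for information retrieval",
    "d2": "probabilistic model for retrieval",
    "d3": "citation index",
}
for doc_id, score in rank("vector retrieval", docs):
    print(doc_id, round(score, 3))
```

Here d1 ranks first because it shares the rarer, more heavily weighted term "vector" with the query, while d3 shares no terms and scores zero. Unlike the Boolean sketch style of retrieval, every document receives a graded score.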
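For models with immanent term interdependencies, the degree of interdependency between two terms is usually derived from how often they co-occur. A minimal sketch, assuming document-level co-occurrence and the Jaccard coefficient as an illustrative interdependency measure; the function names are hypothetical, and real models may use other statistics or dimensionality reduction instead.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase, then keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def cooccurrence_counts(docs: list[str]) -> tuple[Counter, Counter]:
    # term_df[t]: number of documents containing term t.
    # pair_df[(a, b)]: number of documents containing both a and b, with a < b.
    term_df: Counter = Counter()
    pair_df: Counter = Counter()
    for text in docs:
        terms = sorted(set(tokenize(text)))
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))
    return term_df, pair_df


def jaccard(a: str, b: str, term_df: Counter, pair_df: Counter) -> float:
    # Assumed measure: |docs(a) & docs(b)| / |docs(a) | docs(b)|.
    a, b = sorted((a, b))
    both = pair_df[(a, b)]
    either = term_df[a] + term_df[b] - both
    return both / either if either else 0.0


docs = [
    "information retrieval system",
    "retrieval model",
    "information retrieval evaluation",
    "citation index",
]
term_df, pair_df = cooccurrence_counts(docs)
print(jaccard("retrieval", "information", term_df, pair_df))
print(jaccard("citation", "index", term_df, pair_df))  # 1.0
print(jaccard("information", "model", term_df, pair_df))  # 0.0
```

"information" and "retrieval" appear together in two of the three documents that contain either term, so their coefficient is 2/3; terms that never co-occur score zero.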
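The precision and recall measures listed under Performance and Correctness Measures (and described in Allen Kent's paper) can be computed for a single query once a ground-truth set of relevant documents is available. A minimal sketch; the function names are hypothetical, and returning 0.0 when a denominator is empty is a convention assumed here, since both measures are undefined in that case.

```python
def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    # Assumes binary ground-truth relevance: a document is relevant or it is not.
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Precision: fraction of retrieved documents that are relevant.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: fraction of relevant documents that were retrieved.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def precision_recall_at_k(ranking: list[str], relevant: set[str], k: int) -> tuple[float, float]:
    # Top-k retrieval: evaluate only the first k results of a ranked list.
    return precision_recall(ranking[:k], relevant)


ranking = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d4"}
print(precision_recall_at_k(ranking, relevant, 2))  # (0.5, 0.3333333333333333)
print(precision_recall_at_k(ranking, relevant, 5))  # (0.4, 0.6666666666666666)
```

Looking deeper into the ranking raises recall (more relevant documents found) while precision here drops, which illustrates the usual trade-off between the two measures.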