Overview and History of Information Retrieval
– Information retrieval process begins with a user query
– Queries are formal statements of information needs
– A query does not uniquely identify a single object; several objects may match it, with different degrees of relevance
– Objects can be text documents, images, audio, mind maps, or videos
– Most IR systems compute a numeric score for how well each object matches the query and rank the objects by that score
– Vannevar Bush popularized the idea of using computers for information retrieval in 1945
– Emanuel Goldberg filed patents for a statistical machine for document search in the 1920s and 1930s
– Holmstrom gave the first description of a computer searching for information in 1948
– Automated information retrieval systems were introduced in the 1950s
– Large-scale retrieval systems like Lockheed Dialog came into use in the 1970s
Mathematical Models in Information Retrieval
– Set-theoretic models represent documents as sets of words or phrases
– Algebraic models represent documents and queries as vectors, matrices, or tuples
– Probabilistic models treat document retrieval as probabilistic inference
– Feature-based retrieval models view documents as vectors of feature values
– Each model has different mathematical foundations and properties
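To make the contrast between set-theoretic and algebraic models concrete, here is a minimal Python sketch. It assumes whitespace tokenization and raw term counts as vector weights; the function names (`tokenize`, `boolean_and`, `cosine`, `rank`) and the sample documents are illustrative choices, not part of any particular IR system.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace splitting stands in for real tokenization.
    return text.lower().split()


def boolean_and(query, docs):
    # Set-theoretic view: a document is a set of words, and it matches
    # only if it contains every query term.
    terms = set(tokenize(query))
    return [i for i, d in enumerate(docs) if terms <= set(tokenize(d))]


def cosine(a, b):
    # Algebraic view: a and b are sparse term-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    # Score every document against the query, then sort by descending score
    # (ties broken by document index).
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), i) for i, d in enumerate(docs)]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


docs = [
    "information retrieval ranks documents",
    "cooking pasta at home",
    "retrieval of images and audio",
]
matches = boolean_and("information retrieval", docs)
ranking = rank("information retrieval", docs)
```

The Boolean function returns only exact matches, while the vector version also gives partial credit to the third document, which shares one query term. Practical systems typically replace raw counts with weighted schemes such as TF-IDF.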
Term Interdependencies in Information Retrieval Models
– Models without term-interdependencies treat terms as independent
– Models with immanent term interdependencies allow representation of interdependencies between terms
– Models with transcendent term interdependencies rely on external sources for interdependency definition
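– In models with immanent term interdependencies, the degree of interdependency between two terms is defined by the model itself, usually derived from how often those terms co-occur across the document collection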
– Different models have different approaches to representing term interdependencies
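As one illustration of deriving term interdependency from co-occurrence, the sketch below counts how often pairs of terms appear in the same document and turns that into a Jaccard-style association score. The scoring formula, the function names, and the toy collection are assumptions chosen for brevity; real models with immanent interdependencies often derive these relationships indirectly, for example through dimensionality reduction.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    # Count, per term and per term pair, the number of documents containing them.
    doc_freq = Counter()
    pair_freq = Counter()
    for d in docs:
        terms = sorted(set(d.lower().split()))
        doc_freq.update(terms)
        # Pairs are stored in sorted order so (a, b) and (b, a) share one key.
        pair_freq.update(combinations(terms, 2))
    return doc_freq, pair_freq


def association(t1, t2, doc_freq, pair_freq):
    # Jaccard-style score: documents containing both terms divided by
    # documents containing at least one of them.
    if t1 == t2:
        return 1.0 if doc_freq[t1] else 0.0
    a, b = sorted((t1, t2))
    both = pair_freq[(a, b)]
    either = doc_freq[a] + doc_freq[b] - both
    return both / either if either else 0.0


docs = [
    "car engine repair",
    "automobile engine repair",
    "car automobile dealer",
    "pasta recipe",
]
doc_freq, pair_freq = cooccurrence(docs)
engine_repair = association("engine", "repair", doc_freq, pair_freq)
car_automobile = association("car", "automobile", doc_freq, pair_freq)
car_pasta = association("car", "pasta", doc_freq, pair_freq)
```

A model with transcendent term interdependencies would instead take such scores from an external source, such as a human-curated thesaurus, rather than computing them from the collection.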
Performance and Correctness Measures in Information Retrieval
– Evaluation measures assess how well an IR system meets user information needs
– Traditional metrics include precision and recall for Boolean or top-k retrieval
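– Evaluation assumes a ground-truth notion of relevance: each document is known to be either relevant or non-relevant to a given query
– In practice, queries may be ill-posed and relevance may come in different shades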
– Evaluation measures help assess the performance and correctness of an IR system
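Precision and recall can be computed directly once a set of relevant documents is assumed to be known. The following sketch uses made-up document identifiers and hypothetical function names; it treats relevance as binary, in line with the ground-truth assumption above.

```python
def precision_recall(retrieved, relevant):
    # Precision: fraction of retrieved documents that are relevant.
    # Recall: fraction of relevant documents that were retrieved.
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def precision_at_k(ranking, relevant, k):
    # Precision computed over only the top k positions of a ranked list.
    if k <= 0:
        raise ValueError("k must be positive")
    relevant_set = set(relevant)
    hits = sum(1 for doc_id in ranking[:k] if doc_id in relevant_set)
    return hits / k


# Hypothetical ranked output and ground-truth judgments.
ranking = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d4"}

precision, recall = precision_recall(ranking, relevant)
p_at_2 = precision_at_k(ranking, relevant, 2)
```

Here two of the five retrieved documents are relevant (precision 0.4), and two of the three relevant documents were found (recall of about 0.67). Graded relevance would call for different measures than these binary ones.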
Key Contributors, Development, and Awards in Information Retrieval
– Eugene Garfield: Invented the citation index
– Calvin Mooers: Coined the term ‘information retrieval’
– Philip Bagley: Conducted the earliest experiment in computerized document retrieval
– Allen Kent: Published a paper describing precision and recall measures and proposed an evaluation framework for IR systems
– Hans Peter Luhn: Published a paper on auto-encoding of documents for information retrieval
– Gerard Salton: Began work on IR at Harvard, later moved to Cornell, and outlined the vector model
– Karen Spärck Jones: Continued work on computational linguistics as it applies to IR
– Major conferences in information retrieval include SIGIR, ECIR, CIKM, WWW, and WSDM
– Awards in the field of information retrieval include the Tony Kent Strix Award, the Gerard Salton Award, and the Karen Spärck Jones Award
Information retrieval (IR) in computing and information science is the task of identifying and retrieving information system resources that are relevant to an information need. The information need can be specified in the form of a search query. In the case of document retrieval, queries can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and searching for the metadata that describes data and for databases of texts, images, or sounds.
Automated information retrieval systems are used to reduce what has been called information overload. An IR system is a software system that provides access to books, journals and other documents; it also stores and manages those documents. Web search engines are the most visible IR applications.