Applications of Inverted Index
– Inverted index is a central component of search engine indexing algorithms.
– It allows for fast full-text searches.
– It optimizes query speed by listing the documents per word.
– Inverted index is used in bioinformatics for DNA sequence assembly.
– It is used to search for fragments of sequenced DNA against a reference DNA sequence.
Compression Techniques for Inverted Index
– Inverted list compression and bitmap compression solve the same problem.
– Initially developed as separate lines of research.
– Both methods are used for compressing inverted indexes.
– Compression reduces storage requirements.
– Bitmap compression and inverted list compression are related techniques.
Related Concepts to Inverted Index
– Index (search engine) is related to inverted index.
– Reverse index is another type of index.
– Vector space model is used in information retrieval.
– These concepts are related to inverted index.
– They are important in search engine technology.
References
– Knuth, D. E. (1997) [1973]. ‘Retrieval on Secondary Keys’ in The Art of Computer Programming.
– Salton, Gerard; Fox, Edward A.; Wu, Harry (November 1983). ‘Extended Boolean information retrieval’ in Communications of the ACM.
– Zobel, Justin; Moffat, Alistair; Ramamohanarao, Kotagiri (December 1998). ‘Inverted files versus signature files for text indexing’ in ACM Transactions on Database Systems.
– Baeza-Yates, Ricardo; Ribeiro-Neto, Berthier (1999). Modern information retrieval.
– Zobel, Justin; Moffat, Alistair (July 2006). ‘Inverted Files for Text Search Engines’ in ACM Computing Surveys.
External Resources
– NISTs Dictionary of Algorithms and Data Structures: inverted index.
– Managing Gigabytes for Java: a free full-text search engine for large document collections written in Java.
– Lucene: a full-featured text search engine library written in Java.
– Sphinx Search: an open-source high-performance text search engine library employing an inverted index.
– Example implementations on Rosetta Code.
– Caltech Large Scale Image Search Toolbox: a Matlab toolbox implementing Inverted File Bag-of-Words image search.
In computer science, an inverted index (also referred to as a postings list, postings file, or inverted file) is a database index storing a mapping from content, such as words or numbers, to its locations in a table, or in a document or a set of documents (named in contrast to a forward index, which maps from documents to content). The purpose of an inverted index is to allow fast full-text searches, at a cost of increased processing when a document is added to the database. The inverted file may be the database file itself, rather than its index. It is the most popular data structure used in document retrieval systems, used on a large scale for example in search engines. Additionally, several significant general-purpose mainframe-based database management systems have used inverted list architectures, including ADABAS, DATACOM/DB, and Model 204.
There are two main variants of inverted indexes: A record-level inverted index (or inverted file index or just inverted file) contains a list of references to documents for each word. A word-level inverted index (or full inverted index or inverted list) additionally contains the positions of each word within a document. The latter form offers more functionality (like phrase searches), but needs more processing power and space to be created.