Glossary Term

Full-text search

Full-text search and indexing
- When the number of documents is large or the volume of search queries is substantial, full-text search is divided into two tasks: indexing and searching.
- The indexing stage scans the text of all documents and builds a list of search terms (the index).
- Stop words, which are common words that carry little meaning on their own (such as "the" and "and"), are ignored during indexing.
- Language-specific stemming records related word forms under a single index entry; for example, "drives", "drove", and "driven" can be indexed under "drive".

Precision vs. recall tradeoff
- Recall is the proportion of all relevant documents that a search returns, while precision is the proportion of returned results that are relevant.
- A low-precision, low-recall search returns few of the relevant documents, and many of the results it does return are irrelevant.
- Full-text search systems typically offer options such as stop words to increase precision and stemming to increase recall.
- Controlled-vocabulary searching helps eliminate ambiguities and improve precision.
- There is a trade-off between precision and recall: increasing precision may lower recall, and vice versa.

False-positive problem
- Full-text searching often retrieves irrelevant documents, called false positives.
- False positives are caused by the inherent ambiguity of natural language.
- Clustering techniques based on Bayesian algorithms can reduce false positives.
- Clustering places documents into categories according to the category-relevant words they contain, which improves search results.
- This technique is used extensively in the e-discovery domain.

Performance improvements and improved querying tools
- The deficiencies of full-text searching are addressed by giving users improved querying tools.
- Keywords supplied to describe a document's subject, including synonyms, improve recall.
- Field-restricted search limits a search to a specific field within a data record.
- Boolean queries using operators such as AND, NOT, and OR can increase precision.
- Phrase search matches documents that contain a specified phrase.
- Concept search matches multi-word concepts, for example by using compound term processing.
- Concordance search produces an alphabetical list of the principal words in a text, each with its context.
- Proximity search matches documents in which two or more words occur within a specified number of words of each other.
- Regular expression search uses a complex querying syntax to specify precise retrieval conditions.
- Fuzzy search matches documents that contain the given terms or close variations of them.

Software and references
- Thunderstone Software LLC
- Vespa
- Vivísimo

- In practice, it may be difficult to determine how a given search engine works.
- The search algorithms employed by web-search services are seldom fully disclosed.
- Capabilities of Full Text Search System (archived from the original on December 23, 2010).
- Coles, Michael (2008). Pro Full-Text Search in SQL Server 2008 (1st ed.). Apress Publishing Company. ISBN 978-1-4302-1594-3.
- Yuwono, B.; Lee, D. L. (1996). Search and ranking algorithms for locating resources on the World Wide Web. 12th International Conference on Data Engineering (ICDE96). p. 164.
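
The indexing stage and the Boolean queries described above can be illustrated with a small sketch. It is not how any particular search engine works: the stop-word list, the suffix-stripping `stem` function, and the helper names `tokenize`, `build_index`, and `lookup` are all assumptions made for illustration, and a real system would use a proper language-specific stemmer.

```python
import re
from collections import defaultdict

# Assumption: a tiny illustrative stop-word list, not a standard one.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}


def stem(word: str) -> str:
    """Crude suffix stripping; a stand-in for a real language-specific stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text: str) -> list[str]:
    """Lowercase, split on non-letters, drop stop words, then stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each index term to the set of document ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def lookup(index: dict[str, set[int]], word: str) -> set[int]:
    """Return the ids of documents containing the stemmed form of `word`."""
    return set(index.get(stem(word.lower()), set()))


docs = {
    1: "The online encyclopedia is indexed",
    2: "Indexing an encyclopedia on paper",
    3: "Searching the online archives",
}
index = build_index(docs)

# Boolean operators map onto set operations over the posting sets.
both = lookup(index, "encyclopedia") & lookup(index, "online")     # AND
either = lookup(index, "encyclopedia") | lookup(index, "archive")  # OR
without = lookup(index, "encyclopedia") - lookup(index, "online")  # NOT
```

In this sketch "indexed" and "indexing" share the single entry `index`, and stop words such as "the" never reach the index, so a query only has to intersect, unite, or subtract small sets of document ids rather than rescan the text.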
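
The precision and recall measures discussed above can be computed directly once the set of relevant documents is known. The following is a minimal sketch; the function name `precision_recall` and the document ids are made up for illustration.

```python
def precision_recall(retrieved: set[int], relevant: set[int]) -> tuple[float, float]:
    """Return (precision, recall) for one query.

    precision = relevant results returned / all results returned
    recall    = relevant results returned / all relevant documents
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: four documents returned, five documents actually relevant.
retrieved = {1, 2, 3, 4}
relevant = {2, 4, 5, 6, 7}
precision, recall = precision_recall(retrieved, relevant)
```

Here two of the four returned documents are relevant (precision 0.5), and two of the five relevant documents were found (recall 0.4). Returning more documents would tend to raise recall while lowering precision, which is the trade-off noted above.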
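
Phrase search and proximity search both depend on word positions rather than on mere word presence. The sketch below shows one possible reading of each. The helper names are hypothetical, and "within N words" is interpreted here as a difference of at most N word positions, which is an assumption; real systems may count differently.

```python
import re


def words_of(text: str) -> list[str]:
    """Split text into lowercase words."""
    return re.findall(r"\w+", text.lower())


def contains_phrase(text: str, phrase: str) -> bool:
    """True if the words of `phrase` occur consecutively, in order, in `text`."""
    words, target = words_of(text), words_of(phrase)
    n = len(target)
    return n > 0 and any(
        words[k:k + n] == target for k in range(len(words) - n + 1)
    )


def within(text: str, first: str, second: str, max_distance: int) -> bool:
    """True if `first` and `second` occur at most `max_distance` positions apart."""
    words = words_of(text)
    first_pos = [i for i, w in enumerate(words) if w == first.lower()]
    second_pos = [j for j, w in enumerate(words) if w == second.lower()]
    return any(
        i != j and abs(i - j) <= max_distance
        for i in first_pos
        for j in second_pos
    )


text = "Wikipedia, the free encyclopedia"
phrase_found = contains_phrase(text, "free encyclopedia")
near = within(text, "Wikipedia", "free", 2)
too_far = within(text, "Wikipedia", "free", 1)
```

A production index would store these positions alongside each term at indexing time instead of re-tokenizing the text for every query.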
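
Fuzzy search needs some measure of how far a candidate word is from the query term. One common choice, assumed here rather than taken from the text above, is Levenshtein edit distance with a maximum number of allowed edits. The names `edit_distance` and `fuzzy_match` and the sample vocabulary are illustrative only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def fuzzy_match(term: str, vocabulary: set[str], max_edits: int = 1) -> list[str]:
    """Return vocabulary words within `max_edits` edits of `term`, sorted."""
    return sorted(
        w for w in vocabulary
        if edit_distance(term.lower(), w.lower()) <= max_edits
    )


# Hypothetical index vocabulary containing misspellings and unrelated words.
vocabulary = {"search", "searches", "serach", "seach", "march"}
close = fuzzy_match("search", vocabulary, max_edits=1)
looser = fuzzy_match("search", vocabulary, max_edits=2)
```

Raising `max_edits` admits more variants, which tends to improve recall at the cost of precision: with two edits allowed, the unrelated word "march" also matches "search".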