Glossary Term

Enterprise search

Phases of an enterprise search system

- Content passes through several phases on its way from the source repository to the search results.
- Content awareness is the first phase; it follows either a push or a pull model.
- Content processing and analysis is the second phase, where documents in different formats are processed and normalized.
- Indexing is the third phase, where the processed text is stored in an optimized index.
- Query processing is the fourth phase, where the user issues a query that may include navigational actions as well as search terms.

Content awareness

- Content awareness follows either a push or a pull model.
- In the push model, new content is pushed directly to the search engine's APIs.
- In the pull model, the search system gathers content through connectors such as web crawlers or database connectors.
- Connectors typically poll the source at intervals to find new, updated, or deleted content.
- The push model is used when real-time indexing is important.

Content processing and analysis

- Content from different sources may come in various formats and document types.
- The content processing phase uses filters to convert documents to plain text.
- Normalization of content is often necessary to improve recall or precision.
- Normalization techniques include stemming, lemmatization, synonym expansion, and entity extraction.
- Tokenization splits the content into basic matching units called tokens.

Indexing

- The resulting text is stored in an index optimized for quick lookups.
- The index may contain a dictionary of all unique words in the corpus.
- Information about ranking and term frequency is also stored in the index.
- Because it is built for lookups, the index does not need to store the full text of the documents.
- Indexing enables efficient retrieval of relevant documents.

Query processing and matching

- Users issue queries to the system; a query consists of search terms and may include navigational actions.
- The processed query is compared to the stored index.
- The search system returns results that reference the source documents matching the query.
- Some systems can present the document as it was indexed.
- Matching algorithms determine the relevance of the results.
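
The phases described above can be illustrated with a small in-memory example. This is only a minimal sketch under simplifying assumptions, not how any particular enterprise search product works: the names `tokenize`, `InvertedIndex`, `add`, `remove`, and `search` are hypothetical, normalization is reduced to lowercasing, matching requires every query term to be present, and relevance is approximated by summed term frequency. Calling `add` directly corresponds to the push model; a connector in the pull model would call `add` and `remove` after polling its source.

```python
import re
from collections import Counter, defaultdict

# Assumption: a token is any run of lowercase letters or digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Content processing: normalize by lowercasing, then split into tokens."""
    return TOKEN_RE.findall(text.lower())


class InvertedIndex:
    """Hypothetical index: maps each unique term to {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def add(self, doc_id, text):
        """Indexing: store term frequencies only, not the full text."""
        for term, freq in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = freq

    def remove(self, doc_id):
        """Drop a document, e.g. when a connector reports it as deleted."""
        for docs in self.postings.values():
            docs.pop(doc_id, None)

    def search(self, query):
        """Query processing and matching: return ids of documents that
        contain every query term, highest summed term frequency first."""
        terms = tokenize(query)
        if not terms:
            return []
        candidates = None
        for term in terms:
            docs = set(self.postings.get(term, {}))
            candidates = docs if candidates is None else candidates & docs
        scored = [(sum(self.postings[t][d] for t in terms), d) for d in candidates]
        # Sort by descending score, then by doc id for a stable order.
        return [d for _, d in sorted(scored, key=lambda p: (-p[0], p[1]))]


index = InvertedIndex()
index.add("doc1", "Enterprise search indexes content from many sources.")
index.add("doc2", "Connectors pull content; new content is polled at intervals.")
index.add("doc3", "Push models suit real-time indexing.")

print(index.search("content"))             # ['doc2', 'doc1']
print(index.search("real-time indexing"))  # ['doc3']
```

Results are document identifiers rather than text, which mirrors the point that the search system returns references to the source documents. Because there is no stemming in this sketch, a query for "indexing" does not match a document that only contains "indexes"; that gap is what the normalization techniques listed above are meant to close.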