Phases of an enterprise search system
– Content goes through various phases from source repository to search results.
– Content awareness is the first phase, in which the search system learns about content through either a push or a pull model.
– Content processing and analysis is the second phase, where different formats are processed and normalized.
– Indexing is the third phase, where the processed text is stored in an optimized index.
– Query processing is the fourth phase, where the user issues a query made up of search terms and any navigational actions.
Content awareness
– In the push model, new content is directly pushed to the search engine’s APIs.
– The pull model gathers content using connectors like web crawlers or database connectors.
– Connectors typically poll the source at intervals to find new, updated, or deleted content.
– The push model is used when real-time indexing is important.
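One polling cycle of a pull-model connector can be sketched as follows. This is a minimal illustration, not any particular product's connector: the `SourceDoc` type, its `version` field, and the `diff_snapshot` helper are hypothetical names, and the sketch assumes the source can report a change marker for each document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDoc:
    doc_id: str
    version: int  # hypothetical change marker, e.g. a modification counter


def diff_snapshot(known: dict[str, int], current: list[SourceDoc]):
    """Compare the previous snapshot with the source's current state."""
    current_versions = {d.doc_id: d.version for d in current}
    new = sorted(i for i in current_versions if i not in known)
    updated = sorted(i for i, v in current_versions.items()
                     if i in known and known[i] != v)
    deleted = sorted(i for i in known if i not in current_versions)
    return new, updated, deleted, current_versions


# The connector remembers what it saw on the previous poll...
known = {"a": 1, "b": 1, "c": 1}
# ...and the source now reports a changed "b", a missing "c", and a new "d".
current = [SourceDoc("a", 1), SourceDoc("b", 2), SourceDoc("d", 1)]

new, updated, deleted, known = diff_snapshot(known, current)
print(new, updated, deleted)  # → ['d'] ['b'] ['c']
```

In a push model this comparison is unnecessary, because the source calls the search engine's API as soon as a document changes.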
Content processing and analysis
– Content from different sources may have various formats and document types.
– The content processing phase converts documents to plain text using filters.
– Normalization of content is often necessary to improve recall or precision.
– Normalization techniques include stemming, lemmatization, synonym expansion, and entity extraction.
– Tokenization is applied to split the content into basic matching units called tokens.
Indexing
– The resulting text is stored in an index optimized for quick lookups.
– The index contains a dictionary of all unique words in the corpus.
– Information about ranking and term frequency is also stored in the index.
– The index typically holds terms and references to documents rather than the full text of the documents.
– Indexing enables efficient retrieval of relevant documents.
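The tokenization, normalization, and indexing steps above can be sketched in a few lines. This is a toy illustration under stated assumptions: `naive_stem` is a hypothetical suffix stripper standing in for a real stemmer or lemmatizer, and the index is a plain in-memory dictionary rather than an optimized on-disk structure.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def naive_stem(token: str) -> str:
    """Toy stemmer (an assumption for illustration): strip a few suffixes."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def build_index(docs: dict[str, str]) -> dict[str, dict[str, int]]:
    """Map each normalized term to {doc_id: term frequency}."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        counts = Counter(naive_stem(t) for t in tokenize(text))
        for term, tf in counts.items():
            index[term][doc_id] = tf
    return dict(index)


docs = {
    "d1": "Indexing documents keeps searching documents fast.",
    "d2": "Connectors crawl documents from file systems.",
}
index = build_index(docs)
print(sorted(index))       # the dictionary of unique terms
print(index["document"])   # → {'d1': 2, 'd2': 1}
```

Because "documents" is reduced to "document" at index time, a query for either form can match both documents, which is how normalization improves recall.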
Query processing and matching
– Users issue queries to the system, including search terms and navigational actions.
– The processed query is compared to the stored index.
– The search system returns results referencing source documents that match the query.
– Some systems can present the document as it was indexed.
– Matching algorithms determine the relevance of the results.
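The query path described above can be sketched as follows. The text does not name a scoring method, so this example assumes a simple TF-IDF score as one common choice. The `INDEX` contents, `process_query`, and `match` are hypothetical, and paging stands in for a navigational action.

```python
import math

# Hypothetical pre-built index: term -> {doc_id: term frequency}.
INDEX = {
    "search": {"d1": 3, "d2": 1},
    "enterprise": {"d1": 1, "d3": 2},
    "crawler": {"d2": 2},
}
NUM_DOCS = 3


def process_query(raw: str) -> list[str]:
    """Apply the same normalization used at index time (here: lowercase, split)."""
    return raw.lower().split()


def match(raw_query: str, page: int = 0, page_size: int = 10) -> list[tuple[str, float]]:
    """Score documents with TF-IDF and return one page of (doc_id, score)."""
    scores: dict[str, float] = {}
    for term in process_query(raw_query):
        postings = INDEX.get(term, {})
        if not postings:
            continue  # term not in the dictionary, so it matches nothing
        idf = math.log(NUM_DOCS / len(postings)) + 1.0
        for doc_id, tf in postings.items():
            scores[doc_id] = scores.get(doc_id, 0.0) + tf * idf
    # Highest score first; ties broken by doc_id for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    start = page * page_size
    return ranked[start:start + page_size]


results = match("Enterprise Search")
print([doc_id for doc_id, _ in results])  # → ['d1', 'd3', 'd2']
```

The results are references to source documents (`doc_id` values), not the documents themselves. Calling `match("Enterprise Search", page=1, page_size=2)` returns only the second page.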
"Enterprise search" is used to describe the software of search information within an enterprise (though the search function and its results may still be public). Enterprise search can be contrasted with web search, which applies search technology to documents on the open web, and desktop search, which applies search technology to the content on a single computer.
Enterprise search systems index data and documents from a variety of sources, such as file systems, intranets, document management systems, e-mail, and databases. Many enterprise search systems integrate structured and unstructured data in their collections. Enterprise search systems also use access controls to enforce a security policy on their users.
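One way to enforce access controls is to filter results against per-document permissions before showing them. The sketch below is only an illustration of that idea: the `ACL` table, the group names, and `visible_results` are all hypothetical, and real systems may instead apply permissions at index or query time.

```python
# Hypothetical ACLs: doc_id -> groups allowed to read the document.
ACL = {
    "hr-policy": {"hr", "management"},
    "eng-wiki": {"engineering"},
    "handbook": {"everyone"},
}


def visible_results(results: list[str], user_groups: set[str]) -> list[str]:
    """Keep only results that share at least one group with the user."""
    groups = user_groups | {"everyone"}
    # Documents without an ACL entry are hidden by default.
    return [doc_id for doc_id in results if ACL.get(doc_id, set()) & groups]


hits = ["hr-policy", "eng-wiki", "handbook"]
print(visible_results(hits, {"engineering"}))  # → ['eng-wiki', 'handbook']
```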
Enterprise search can be seen as a type of vertical search, scoped to a single enterprise.