Document retrieval


Document Retrieval Systems
– Document retrieval systems match text records against user queries
– They consist of a database of documents, a classification algorithm that builds a full-text index, and a user interface for accessing the database
– Their two main tasks are finding documents relevant to a user query and evaluating the matches, sorting them by relevance
– Internet search engines are classical applications of document retrieval
– Retrieval systems in use range from simple Boolean systems to systems using statistical or natural language processing techniques
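
A simple Boolean system of the kind mentioned above can be sketched as set operations over a mapping from terms to the documents that contain them (a small inverted index). This is a minimal illustration, not a description of any particular search engine: the function names, the whitespace tokenizer, and the three-document collection are all assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive whitespace tokenizer (assumption)
            index[term].add(doc_id)
    return index


def boolean_and(index, terms):
    """Return ids of documents that contain every query term."""
    postings = [index.get(t.lower(), set()) for t in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


def boolean_or(index, terms):
    """Return ids of documents that contain at least one query term."""
    result = set()
    for t in terms:
        result |= index.get(t.lower(), set())
    return result


# Hypothetical three-document collection.
DOCS = {
    1: "suffix trees index raw text",
    2: "inverted index maps terms to documents",
    3: "search engines rank documents",
}
INDEX = build_inverted_index(DOCS)

print(boolean_and(INDEX, ["index", "documents"]))
print(boolean_or(INDEX, ["suffix", "rank"]))
```

A Boolean system of this kind only decides whether a document matches; it does not order the matches, which is where the statistical and language-based techniques come in.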

Indexing Schemata
– Two main classes of indexing schemata: form-based and content-based

Form-based Indexing
– Addresses the exact syntactic properties of a text, comparable to substring matching in string searches
– The text is generally unstructured and not necessarily in a natural language
– Could be used, for example, to process large sets of chemical representations in molecular biology
– A suffix tree algorithm is an example of form-based indexing
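
To make the idea of exact, syntax-level matching concrete, here is a minimal sketch of substring search over sorted suffixes. It uses a suffix array, a simpler relative of the suffix tree named above, so it should be read as a stand-in and not as a suffix tree implementation. The function names and the short chemistry-style string are assumptions chosen for illustration, and the naive construction is only suitable for small inputs.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted suffix order."""
    # Naive construction: fine for short strings, too slow for large corpora.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions at which pattern occurs in text."""
    if not pattern:
        return []
    width = len(pattern)
    # Binary search for the first suffix whose leading characters are >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + width] < pattern:
            lo = mid + 1
        else:
            hi = mid
    # Suffixes that begin with the pattern sit next to each other from here on.
    hits = []
    while lo < len(suffix_array):
        start = suffix_array[lo]
        if text[start:start + width] != pattern:
            break
        hits.append(start)
        lo += 1
    return sorted(hits)


# Hypothetical linear notation string; not real data.
TEXT = "CCOCCNCCO"
SUFFIX_ARRAY = build_suffix_array(TEXT)

print(find_occurrences(TEXT, SUFFIX_ARRAY, "CCO"))
print(find_occurrences(TEXT, SUFFIX_ARRAY, "CN"))
```

Note that nothing here depends on words or meaning: the text is treated as a raw character sequence, which is why this style of indexing also works for input that is not natural language.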

Content-based Indexing
– Exploits semantic connections between documents and queries
– Most content-based systems use an inverted index algorithm
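– A signature file is a technique for creating a quick filter that keeps every document matching the query, possibly along with a few that do not
– It involves creating a signature for each file, typically a hash-coded version, to match against
– With proper parameters, signature files can beat inverted files in certain environments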
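
The signature file idea can be sketched as follows. Each document gets a fixed-width bit signature built by hashing its terms, the query is hashed the same way, and any document whose signature contains all of the query's bits is kept as a candidate. Candidates are then checked against the actual text to discard false matches. This is a minimal sketch under assumptions: the 64-bit width, three bits per term, the SHA-256-based hashing scheme, and all names are illustrative choices, not parameters taken from any real system.

```python
import hashlib

SIGNATURE_BITS = 64  # assumed signature width
BITS_PER_TERM = 3    # assumed number of bit positions hashed per term


def term_mask(term):
    """Hash a term to a bit mask with up to BITS_PER_TERM bits set."""
    mask = 0
    for seed in range(BITS_PER_TERM):
        digest = hashlib.sha256(f"{seed}:{term.lower()}".encode("utf-8")).digest()
        position = int.from_bytes(digest[:8], "big") % SIGNATURE_BITS
        mask |= 1 << position
    return mask


def signature(text):
    """OR together the masks of every term in the text."""
    sig = 0
    for term in text.split():
        sig |= term_mask(term)
    return sig


def search(docs, signatures, query):
    """Filter by signature first, then verify candidates against the text."""
    query_sig = signature(query)
    query_terms = set(query.lower().split())
    # Quick filter: keep documents whose signature covers every query bit.
    candidates = [d for d, s in signatures.items() if (s & query_sig) == query_sig]
    # Post-processing: drop candidates that only matched by hash collision.
    return [d for d in candidates if query_terms <= set(docs[d].lower().split())]


# Hypothetical collection.
DOCS = {
    1: "suffix trees index raw text",
    2: "inverted index maps terms to documents",
    3: "search engines rank documents",
}
SIGNATURES = {doc_id: signature(text) for doc_id, text in DOCS.items()}

print(search(DOCS, SIGNATURES, "inverted index"))
```

The filter can produce false matches but never misses a true one, because every bit set by a query term is also set in the signature of any document containing that term. The signature width and bits per term are the kind of parameters that determine how often false matches occur.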

Example: PubMed
– The PubMed form interface features a "related articles" search
– It works by comparing words from the documents' titles, abstracts, and MeSH terms
– The comparison uses a word-weighted algorithm for relevance ranking
– PubMed is a widely used document retrieval system
– Provides access to a vast collection of biomedical literature
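
The page does not describe the word-weighted algorithm itself, so the following is only a generic illustration of what word-weighted comparison can look like, using TF-IDF weights and cosine similarity as a stand-in. It is not PubMed's actual method. The records, the function names, and the assumption that each record is the concatenation of a title, abstract, and MeSH terms are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def idf_weights(records):
    """Give rarer words a higher weight using smoothed inverse document frequency."""
    total = len(records)
    doc_freq = Counter()
    for text in records.values():
        doc_freq.update(set(tokenize(text)))
    return {w: math.log((1 + total) / (1 + c)) + 1.0 for w, c in doc_freq.items()}


def weighted_vector(text, idf):
    """Multiply each word's count in the text by its weight."""
    counts = Counter(tokenize(text))
    return {w: c * idf.get(w, 0.0) for w, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse word-weight vectors."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def related(records, doc_id, top_k=2):
    """Return up to top_k other record ids, most similar first."""
    idf = idf_weights(records)
    vectors = {i: weighted_vector(t, idf) for i, t in records.items()}
    target = vectors[doc_id]
    scored = [(cosine(target, v), i) for i, v in vectors.items() if i != doc_id]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:top_k]]


# Hypothetical records: title, abstract, and subject terms joined into one string.
RECORDS = {
    "a": "insulin resistance in type 2 diabetes",
    "b": "insulin signaling and diabetes therapy",
    "c": "suffix trees for genome sequence search",
}

print(related(RECORDS, "a", top_k=1))
```

In this toy collection, record "b" ranks first for record "a" because the two share the words "insulin" and "diabetes", while record "c" shares none.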

Document retrieval (Wikipedia)

Document retrieval is defined as the matching of some stated user query against a set of free-text records. These records could be any type of mainly unstructured text, such as newspaper articles, real estate records or paragraphs in a manual. User queries can range from multi-sentence full descriptions of an information need to a few words.

Document retrieval is sometimes referred to as, or as a branch of, text retrieval. Text retrieval is a branch of information retrieval where the information is stored primarily in the form of text. Text databases became decentralized thanks to the personal computer. Text retrieval is a critical area of study today, since it is the fundamental basis of all internet search engines.
