Spider trap


Definition and Purpose of Spider Traps
– Spider traps are sets of web pages that can cause web crawlers to make an unbounded number of requests, or cause poorly built crawlers to crash.
– They can be intentionally or unintentionally created.
– Intentional spider traps are used to catch spambots or other crawlers that waste a website's bandwidth.
– Unintentional traps can arise from calendars whose links always point to the next day or year, or from algorithmically generated language poetry.
– There is no algorithm to detect all spider traps.

Impact on Web Crawlers
– Spider traps waste the resources of web crawlers.
– They lower the productivity of web crawlers.
– Poorly written crawlers can crash when encountering spider traps.
– Polite web crawlers are affected to a much lesser degree than impolite ones.
– Sites with intentional traps usually exclude them in robots.txt, so legitimate polite bots that obey it would not fall into those traps.
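Because a trap can expose an unbounded number of URLs, one common defence is to give every crawl a hard budget. The sketch below is a minimal breadth-first crawler, assuming a hypothetical `get_links(url)` callable that stands in for fetching a page and extracting its links; the `max_pages`, `max_depth` and `max_links_per_page` limits are illustrative values, not recommendations from the sources on this page.

```python
from collections import deque
from itertools import islice
from typing import Callable, Iterable, List


def bounded_crawl(
    seed: str,
    get_links: Callable[[str], Iterable[str]],  # hypothetical fetch-and-extract step
    max_pages: int = 1000,
    max_depth: int = 10,
    max_links_per_page: int = 500,
) -> List[str]:
    """Breadth-first crawl that always terminates, even inside a spider trap."""
    visited = {seed}                 # never queue the same URL twice
    crawled: List[str] = []          # URLs in the order they were "fetched"
    queue = deque([(seed, 0)])       # (url, depth) pairs
    while queue and len(crawled) < max_pages:
        url, depth = queue.popleft()
        crawled.append(url)
        if depth >= max_depth:
            continue                 # too deep: record the page but do not expand it
        # islice caps how many links are taken from a single page, so even a
        # page that yields links forever cannot stall the crawler.
        for link in islice(get_links(url), max_links_per_page):
            if link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return crawled
```

With a trap in which every page links to one brand-new URL (for example `lambda u: [u + "/next"]`), the loop stops after `max_pages` fetches or `max_depth` levels, whichever comes first. The visited set only protects against cycles; it is the page and depth limits that stop a trap that keeps minting new URLs.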

Politeness and Spider Traps
– Polite web crawlers alternate requests between different hosts.
– Polite crawlers do not request documents from the same server more than once every several seconds.
– Politeness helps reduce the impact of spider traps on web crawlers.
– The use of robots.txt can prevent polite bots from falling into traps.
– Impolite bots disregarding robots.txt settings are affected by traps.
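The politeness rules above can be combined into a single scheduling step. This sketch uses Python's standard `urllib.robotparser` to honour `Disallow` rules and a per-host timestamp dictionary to enforce a minimum delay, so consecutive requests naturally alternate between hosts. The function names `make_robots` and `next_polite_url`, the agent string `example-bot` and the 5-second default delay are assumptions for illustration; a real crawler would fetch each host's `/robots.txt` (for example with `set_url()` and `read()`) instead of parsing inline lines.

```python
from typing import Dict, List, Optional
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def make_robots(lines: List[str]) -> RobotFileParser:
    """Build a parser from robots.txt lines (a real crawler would call read())."""
    rp = RobotFileParser()
    rp.parse(lines)
    return rp


def next_polite_url(
    frontier: List[str],                  # queued URLs; mutated in place
    robots: Dict[str, RobotFileParser],   # host -> parsed robots.txt
    last_request: Dict[str, float],       # host -> time of the previous request
    now: float,
    delay: float = 5.0,                   # hypothetical minimum seconds per host
    agent: str = "example-bot",           # hypothetical user-agent string
) -> Optional[str]:
    """Pick the next URL that robots.txt allows and whose host is not cooling down.

    Returns None when the frontier is empty or every remaining host was
    contacted less than `delay` seconds ago; the caller would then wait.
    """
    for url in list(frontier):            # iterate over a copy so we can remove items
        host = urlsplit(url).netloc
        rp = robots.get(host)
        if rp is not None and not rp.can_fetch(agent, url):
            frontier.remove(url)          # disallowed (e.g. an excluded trap): drop it
            continue
        last = last_request.get(host)
        if last is None or now - last >= delay:
            frontier.remove(url)
            last_request[host] = now      # record the request for rate limiting
            return url
    return None
```

For example, with `robots = {"a.example": make_robots(["User-agent: *", "Disallow: /calendar/"])}` and a frontier holding a `/calendar/` URL, two other `a.example` pages and one `b.example` page, calls at t=0 and t=1 return the first allowed `a.example` page and then the `b.example` page, while the calendar URL is discarded without ever being requested.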

Related Concepts
– The robots exclusion standard (robots.txt) is related to spider traps.
– Web crawlers are closely associated with spider traps.
– Z39.50, Search/Retrieve Web Service, and Search/Retrieve via URL are related technologies.
– OpenSearch and Representational State Transfer (REST) are relevant concepts.
– Wide area information server (WAIS) is another related technology.

References
– Techopedia provides information on spider traps.
– Neil M Hennessy’s work discusses L=A=N=G=U=A=G=E poetry on the web.
– Portent is a source for spider trap information.
– Thesitewizard.com provides guidance on setting up robots.txt.
– The DEV Community offers insights on building a polite web crawler.

Spider trap (Wikipedia)

A spider trap (or crawler trap) is a set of web pages that may intentionally or unintentionally be used to cause a web crawler or search bot to make an infinite number of requests or cause a poorly constructed crawler to crash. Web crawlers are also called web spiders, from which the name is derived. Spider traps may be created to "catch" spambots or other crawlers that waste a website's bandwidth. They may also be created unintentionally by calendars that use dynamic pages with links that continually point to the next day or year.
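To see why a calendar becomes a trap, consider a dynamic page that always links one day ahead and roughly one year ahead. The URL scheme `/calendar/YYYY-MM-DD` and the helper names below are hypothetical; the point is only that every followed link produces a URL the crawler has never seen before, so a visited-URL set alone never stops it.

```python
from datetime import date, timedelta
from typing import List, Set


def calendar_links(day: date) -> List[str]:
    """Links a naive dynamic calendar page for `day` might render (hypothetical scheme)."""
    next_day = day + timedelta(days=1)
    next_year = day + timedelta(days=365)   # approximate "next year" link
    return [f"/calendar/{next_day.isoformat()}", f"/calendar/{next_year.isoformat()}"]


def follow_next_day(start: date, steps: int) -> Set[str]:
    """Follow only the 'next day' link `steps` times and collect every URL seen."""
    seen: Set[str] = set()
    day = start
    for _ in range(steps):
        url = calendar_links(day)[0]
        seen.add(url)
        day = date.fromisoformat(url.rsplit("/", 1)[1])   # parse YYYY-MM-DD back out
    return seen
```

Every step yields a new URL, so `len(follow_next_day(date(2024, 1, 1), 5000))` is 5000. Python's `date` type stops at the year 9999, but the millions of days before that limit are, for a crawler, effectively unbounded.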

Common techniques used are:

  • Creation of indefinitely deep directory structures like http://example.com/bar/foo/bar/foo/bar/foo/bar/...
  • Dynamic pages that produce an unbounded number of documents for a web crawler to follow. Examples include calendars and algorithmically generated language poetry.
  • Documents filled with a very large number of characters, which can crash the lexical analyzer that parses the document.
  • Documents with session IDs based on required cookies.
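Some of the techniques listed above leave recognizable fingerprints, which is why certain classes of trap can be caught automatically. The sketch below shows three heuristics a crawler might apply: flagging URLs whose path is very deep or repeats a segment many times, stripping session-id query parameters before comparing URLs, and capping how many bytes of a document are read before parsing. The parameter names in `SESSION_PARAMS` and all threshold values are assumptions, and none of these checks catches every trap.

```python
from collections import Counter
from typing import BinaryIO, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of query parameters treated as session identifiers.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def looks_like_trap(url: str, max_depth: int = 12, max_repeats: int = 3,
                    max_length: int = 2000) -> bool:
    """Heuristic check for overly long, deep, or self-repeating URL paths."""
    if len(url) > max_length:
        return True
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) > max_depth:
        return True
    # /bar/foo/bar/foo/... repeats the same segments over and over
    return any(n > max_repeats for n in Counter(segments).values())


def canonicalize(url: str) -> str:
    """Drop session-id parameters (and the fragment) so one page maps to one URL."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k.lower() not in SESSION_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))


def read_capped(stream: BinaryIO, max_bytes: int = 2_000_000,
                chunk_size: int = 65536) -> Tuple[bytes, bool]:
    """Read at most max_bytes from a file-like body; the flag is True if it was cut off."""
    buf = bytearray()
    while len(buf) <= max_bytes:
        chunk = stream.read(chunk_size)
        if not chunk:
            return bytes(buf), False     # reached the end within the cap
        buf += chunk
    return bytes(buf[:max_bytes]), True  # oversized document: keep only a prefix for the parser
```

For instance, `canonicalize` maps `http://example.com/page?id=7&PHPSESSID=abc` and `http://example.com/page?id=7&PHPSESSID=xyz` to the same `http://example.com/page?id=7`, so the crawler queues that page only once.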

There is no algorithm to detect all spider traps. Some classes of traps can be detected automatically, but new, unrecognized traps arise quickly.
