Glossary Term
Spider trap
Definition and Purpose of Spider Traps
- Spider traps are sets of web pages that cause a web crawler to make an effectively infinite number of requests or, in some cases, to crash.
- They can be created intentionally or unintentionally.
- Intentional traps are used to catch spambots or other crawlers that waste a website's bandwidth.
- Unintentional traps often arise from dynamically generated pages, such as calendars that always link to a next date, or from algorithmically generated L=A=N=G=U=A=G=E poetry (see the sketch after this list).
- There is no algorithm to detect all spider traps.
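To make the calendar case concrete, here is a minimal sketch of an unintentional trap: a tiny site whose every page links to the "next month," so its URL space is unbounded even though it holds no real content. It uses only the Python standard library; the /calendar path, port, and start date are illustrative assumptions rather than details from any real site.

```python
# Sketch of an unintentional calendar-style spider trap (illustrative).
# Every page links to the next month, so a naive crawler that follows
# every link will keep requesting pages forever.
from http.server import BaseHTTPRequestHandler, HTTPServer

class CalendarHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Expect paths like /calendar/2024/05; fall back to a start month.
        parts = [p for p in self.path.split("/") if p]
        try:
            year, month = int(parts[1]), int(parts[2])
        except (IndexError, ValueError):
            year, month = 2024, 1
        # There is always a "next month" link -- this is the trap.
        ny, nm = (year + 1, 1) if month == 12 else (year, month + 1)
        body = (f"<html><body><h1>{year}-{month:02d}</h1>"
                f'<a href="/calendar/{ny}/{nm:02d}">next month</a>'
                f"</body></html>").encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), CalendarHandler).serve_forever()
```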
Impact on Web Crawlers
- Spider traps waste a crawler's resources and lower its productivity, since requests are spent on pages with no useful content.
- Poorly written crawlers can crash outright when they encounter a trap.
- Polite web crawlers are affected to a lesser degree than impolite ones, because rate limiting bounds how quickly a trap can consume their time and bandwidth.
- A legitimate, polite bot avoids a trap entirely when the site marks it in robots.txt, as described under "Politeness and Spider Traps" below; a defensive heuristic is sketched after this list.
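Because no algorithm detects every trap, crawlers fall back on defensive heuristics. The sketch below shows one common combination, with every threshold an illustrative assumption: cap the link depth, cap the number of pages fetched per host, and reject suspiciously long URLs (a symptom of traps that grow the URL on each hop).

```python
# Defensive heuristics against spider traps (a sketch; the limits are
# illustrative assumptions, not values from any particular crawler).
from collections import defaultdict
from urllib.parse import urlparse

MAX_DEPTH = 10            # how many links deep we are willing to follow
MAX_PAGES_PER_HOST = 500  # budget of fetches per host
MAX_URL_LENGTH = 2000     # ever-growing URLs are a classic trap symptom

pages_per_host: defaultdict[str, int] = defaultdict(int)

def should_fetch(url: str, depth: int) -> bool:
    """Return False for URLs that look like they lead into a trap."""
    if depth > MAX_DEPTH:
        return False                 # suspiciously deep link chain
    if len(url) > MAX_URL_LENGTH:
        return False                 # URL keeps accreting path segments
    host = urlparse(url).netloc
    if pages_per_host[host] >= MAX_PAGES_PER_HOST:
        return False                 # this host has used up its budget
    pages_per_host[host] += 1
    return True
```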
Politeness and Spider Traps
- Polite web crawlers alternate requests between different hosts.
- They do not request documents from the same server more than once every few seconds.
- Politeness therefore limits how much of a crawler's time a single trap can consume.
- Sites commonly list trap URLs in robots.txt, which keeps polite bots that honor the file out of the trap altogether.
- Impolite bots that disregard robots.txt are still caught by the trap (see the sketch after this list).
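The sketch below puts both halves of that politeness story together, using Python's standard urllib.robotparser: check the host's robots.txt before fetching (so a trap listed under Disallow is never entered) and enforce a minimum delay between requests to the same host. The user-agent name ExampleBot and the five-second delay are hypothetical choices, not standard values.

```python
# A polite pre-fetch check: honor robots.txt and rate-limit per host.
# ExampleBot and the 5-second delay are illustrative assumptions.
import time
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleBot"
MIN_DELAY = 5.0  # seconds between requests to the same host

_robots: dict[str, RobotFileParser] = {}
_last_hit: dict[str, float] = {}

def polite_allowed(url: str) -> bool:
    """Check robots.txt, then wait out the per-host delay before fetching."""
    host = urlparse(url).netloc
    if host not in _robots:
        rp = RobotFileParser(f"https://{host}/robots.txt")
        rp.read()                    # fetch and parse the host's robots.txt
        _robots[host] = rp
    if not _robots[host].can_fetch(USER_AGENT, url):
        return False                 # the site excluded us, e.g. a trap path
    wait = MIN_DELAY - (time.monotonic() - _last_hit.get(host, 0.0))
    if wait > 0:
        time.sleep(wait)             # never hammer a single server
    _last_hit[host] = time.monotonic()
    return True
```

A crawler that gates every fetch through a check like this stays out of marked traps entirely and, even inside an unmarked one, wastes resources far more slowly than an impolite crawler would.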
Related Concepts
- The robots exclusion standard (robots.txt) is the mechanism sites use to steer well-behaved crawlers away from traps.
- Web crawlers are the programs that spider traps ensnare.
- Broader search-and-retrieval technologies sometimes listed alongside include Z39.50, Search/Retrieve Web Service (SRW), Search/Retrieve via URL (SRU), OpenSearch, Representational State Transfer (REST), and the wide area information server (WAIS).
References
- Techopedia provides information on spider traps.
- Neil M. Hennessy's work discusses L=A=N=G=U=A=G=E poetry on the web.
- Portent is a source for spider trap information.
- Thesitewizard.com provides guidance on setting up robots.txt.
- The DEV Community offers insights on building a polite web crawler.