Glossary Term

Scraper site

Examples of scraper websites
- Search engines like Google scrape content from other websites to present it to their users.
- Some dating websites use scraping techniques, often combined with facial recognition.
- Scraper websites are used for general image analysis and identifying images of crops with pests and diseases.
- Scraper sites made for advertising are called 'Made for AdSense' sites.
- Some scraper sites link to other sites to improve their search engine ranking.

Made for advertising
- 'Made for AdSense' sites have no value except to generate ad clicks.
- These sites are considered search engine spam and dilute search results.
- Some scraper sites use private blog networks to improve their ranking.
- Auto blogs, a type of scraper site, were common among black-hat marketers.
- Scraper sites can be used to manipulate search engine results.

Legality
- Scraper sites may violate copyright law, even when scraping from open content sites.
- Some licenses, like GFDL and CC-BY-SA, require republishers to inform readers and give credit to the original author.
- Scraping without respecting licenses is a copyright violation.
- Copyright infringement can occur even when scraping from Wikipedia.
- Violating copyright licenses can have legal consequences.

Techniques
- Different scraper sites target websites based on their objectives.
- Some scraper sites target sites with large amounts of content, like airlines or department stores, to gather pricing information.
- Other scraper sites pull snippets and text from high-ranking websites to improve their own search engine ranking.
- RSS feeds are vulnerable to scraping.
- Some scraper sites consist of advertisements and random paragraphs of words.

Domain hijacking
- Scraper sites may purchase recently expired domains to utilize their SEO power.
- Expired domains can be used to maintain backlinks and historical ranking ability.
- Spammers may match the topic or copy existing content from the Internet Archive to maintain authenticity.
- Some expired domain registration agents provide services to find and gather HTML from expired domains.
- Domain hijacking can be used to create new sites or power private blog networks.
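
The Techniques section notes that RSS feeds are vulnerable to scraping. One likely reason is that a feed already exposes titles, links, and text in a predictable, machine-readable structure. The following is a minimal sketch of that idea using only Python's standard library. The sample feed, the example.com URLs, and the function name `extract_items` are illustrative assumptions, not taken from any real site or tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed used for illustration only.
# A real feed would be fetched over HTTP, subject to the site's terms and license.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
      <description>Full text of the first post.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second</link>
      <description>Full text of the second post.</description>
    </item>
  </channel>
</rss>"""


def extract_items(rss_text):
    """Return one dict (title, link, description) per <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


if __name__ == "__main__":
    for entry in extract_items(SAMPLE_RSS):
        print(entry["title"], entry["link"])
```

Because every entry arrives already separated into fields, no HTML parsing or layout guessing is needed. As the Legality section points out, republishing content obtained this way without honoring its license may still be a copyright violation.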