Glossary Term
Scraper site
Examples of scraper websites
- Search engines like Google scrape content from other websites to present it to their users.
- Some dating websites use scraping techniques, often combined with facial recognition.
- Scraping is also used by general image recognition sites and by sites built to identify images of crops with pests and diseases.
- Scraper sites built to earn advertising revenue are called 'Made for AdSense' (MFA) sites.
- Some scraper sites link to other sites to improve their search engine ranking.
Made for advertising
- 'Made for AdSense' sites have no value except to generate ad clicks.
- These sites are considered search engine spam and dilute search results.
- Some scraper sites use private blog networks to improve their ranking.
- Auto blogs, a type of scraper site, were common among black-hat marketers.
- Scraper sites can be used to manipulate search engine results.
Legality
- Scraper sites may violate copyright law, even when scraping from open content sites.
- Some licenses, like GFDL and CC-BY-SA, require republishers to inform readers and give credit to the original author.
- Republishing scraped content without complying with its license terms is a copyright violation.
- For example, copying Wikipedia content without the notice and credit its license requires is copyright infringement.
- Violating copyright licenses can have legal consequences.
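
The credit requirement described above can be illustrated with a small helper that builds an attribution line for republished content. This is a minimal sketch in Python: the function name `attribution_notice`, its parameters, and the wording of the notice are all hypothetical, and the exact information a license such as GFDL or CC-BY-SA requires should be taken from the license text itself.

```python
def attribution_notice(title, author, source_url, license_name):
    """Build a one-line credit for republished content.

    The wording is a hypothetical format chosen for illustration;
    the license text defines what a notice must actually contain.
    """
    # Refuse to produce a notice that cannot credit the original author.
    if not author or not source_url:
        raise ValueError("author and source_url are required to give credit")
    return f'"{title}" by {author} ({source_url}), licensed under {license_name}.'


notice = attribution_notice(
    "Scraper site",
    "Wikipedia contributors",
    "https://en.wikipedia.org/wiki/Scraper_site",
    "CC BY-SA",
)
print(notice)
```

A scraper site that republishes content without any such notice is the case the bullets above describe as a license violation.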
Techniques
- Different scraper sites target websites based on their objectives.
- Some scraper sites target sites with large amounts of content, like airlines or department stores, to gather pricing information.
- Other scraper sites pull snippets and text from high-ranking websites to improve their own search engine ranking.
- RSS feeds are vulnerable to scraping.
- Some scraper sites consist of nothing but advertisements and paragraphs of randomly chosen words.
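
To illustrate why RSS feeds are easy to scrape: a feed already exposes each item's title, link, and text in a predictable XML structure, so a few lines of standard-library code can extract them. The following is a minimal sketch in Python, assuming an RSS 2.0 feed; the sample feed, its URLs, and the function name `extract_items` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
      <description>Full text of the first post.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second</link>
      <description>Full text of the second post.</description>
    </item>
  </channel>
</rss>"""


def extract_items(feed_xml):
    """Return the title, link, and description of every <item> in the feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


for entry in extract_items(SAMPLE_FEED):
    print(entry["title"], entry["link"])
```

Because the item text sits in a fixed element, a feed that publishes complete articles hands a scraper everything it needs without any page parsing.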
Domain hijacking
- Scraper site operators may purchase recently expired domains to reuse the search ranking those domains have built up.
- An expired domain keeps its existing backlinks, so a new owner can inherit its historical ranking ability.
- Spammers may match the topic of the old site, or copy its content from the Internet Archive, so that the revived domain appears authentic.
- Some expired domain registration agents provide services to find and gather HTML from expired domains.
- Domain hijacking can be used to create new sites or power private blog networks.