Web archiving


History and Development of Web Archiving
– The Internet Archive, founded in 1996, was one of the first large-scale web archiving projects.
– The Internet Archive released the Wayback Machine in 2001, a search engine for viewing archived web content.
– As of 2018, the Internet Archive stored 40 petabytes of data.
– Other web archiving projects launched around the same time include the National Library of Canada’s project, Australia’s Pandora, Tasmanian web archives, and Sweden’s Kulturarw3.
– The International Web Archiving Workshop (IWAW) provided a platform for sharing experiences and ideas from 2001 to 2010.

Methods of Collection for Web Archiving
– Web archivists collect various types of web content, including HTML web pages, style sheets, JavaScript, images, and video.
– Metadata about the collected resources, such as access time and content length, is archived to establish authenticity and provenance.
– Web crawlers, such as Heritrix, HTTrack, and Wget, are commonly used to automate the collection process.
– Services like the Wayback Machine and WebCite offer on-demand web archiving through web crawling techniques.
– Database archiving involves extracting the content of database-driven websites into a standard schema, allowing multiple databases to be accessed using a single system.
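
The metadata mentioned above can be pictured as a small record stored next to each captured resource. The following is a minimal sketch, not any archive's actual format: the function name describe_capture, the field names, and the SHA-256 checksum are assumptions added for illustration.

```python
import hashlib
from datetime import datetime, timezone


def describe_capture(url, body, content_type, accessed_at=None):
    """Build a metadata record for one captured resource (hypothetical layout)."""
    if accessed_at is None:
        accessed_at = datetime.now(timezone.utc)
    return {
        "url": url,
        "access_time": accessed_at.isoformat(),
        "content_type": content_type,
        "content_length": len(body),  # size of the captured body in bytes
        # The checksum is an addition here: it lets a later reader verify
        # that the stored bytes have not changed since capture.
        "sha256": hashlib.sha256(body).hexdigest(),
    }


record = describe_capture(
    "http://example.com/",
    b"<html><body>Hello</body></html>",
    "text/html",
    accessed_at=datetime(2020, 1, 1, tzinfo=timezone.utc),
)
```

Recording the access time and length at capture, rather than reconstructing them later, is what allows the record to support claims about provenance.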

Remote Harvesting for Web Archiving
– Web crawlers are commonly used for remote harvesting of web content.
– Web crawlers access web pages similarly to how users with browsers view the web.
– Examples of web crawlers used for web archiving include Heritrix, HTTrack, and Wget.
– The Wayback Machine and WebCite are free services that use web crawling techniques to archive web resources.
– Remote harvesting provides a simple method for collecting web content.
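
To make the crawling idea concrete, here is a minimal sketch of a breadth-first harvest that follows links within one host. It is not how Heritrix, HTTrack, or Wget are implemented. The names LinkExtractor and crawl are hypothetical, and fetch is a caller-supplied function (assumed here) so the example can run against an in-memory site instead of the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href values of <a> tags on one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first harvest of pages on the seed's host.

    `fetch` maps a URL to HTML text, or returns None on failure.
    Returns a dict mapping each captured URL to its HTML.
    """
    host = urlparse(seed).netloc
    queue = deque([seed])
    seen = {seed}
    captured = {}
    while queue and len(captured) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        captured[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return captured


# A two-page in-memory "site" standing in for real HTTP fetches.
site = {
    "http://example.org/": '<a href="/about">About</a> <a href="http://other.example/">Elsewhere</a>',
    "http://example.org/about": '<a href="/">Home</a> <a href="/about#team">Team</a>',
}
pages = crawl("http://example.org/", site.get)
```

A production crawler would also need to respect robots.txt, limit its request rate, and capture embedded resources such as images and style sheets; none of that is shown here.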

Database Archiving for Web Archiving
– Database archiving involves extracting database content into a standard schema, often using XML.
– Tools like DeepArc and Xinq enable the archiving and online delivery of database content.
– DeepArc maps a relational database to an XML schema, exporting the content into an XML document.
– Xinq allows basic querying and retrieval functionality for the archived content.
– While the original layout and behavior may not be preserved, database archiving facilitates access to multiple databases through a single system.
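
The export-then-query pattern described above can be sketched with the Python standard library. This is not DeepArc or Xinq: the function table_to_xml, the table/row/field element names, and the sample articles table are all assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn, table):
    """Export every row of `table` as <row> elements under a <table> root."""
    # The table name is interpolated into the SQL, so pass trusted names only.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [col[0] for col in cursor.description]
    root = ET.Element("table", name=table)
    for values in cursor:
        row = ET.SubElement(root, "row")
        for column, value in zip(columns, values):
            field = ET.SubElement(row, "field", name=column)
            field.text = "" if value is None else str(value)
    return root


# Sample database standing in for a database-driven website's backend.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO articles (title) VALUES (?)",
    [("First post",), ("Second post",)],
)

root = table_to_xml(conn, "articles")
xml_text = ET.tostring(root, encoding="unicode")

# Basic retrieval against the archived XML, independent of the database.
titles = [f.text for f in root.findall("./row/field[@name='title']")]
```

Once the content is in a common XML layout, the same query code can serve archives exported from different source databases, which is the access benefit the section describes. The site's original pages and behavior are not reproduced.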

Transactional Archiving for Web Archiving
– Transactional archiving collects actual transactions between web servers and web browsers.
– It is used to preserve evidence of viewed content for legal or regulatory compliance.
– A transactional archiving system intercepts every HTTP request and response, filters each response to eliminate duplicate content, and stores the responses as bitstreams.
– It captures the content viewed on a particular website on a given date.
– Transactional archiving is important for organizations that need to disclose and retain information.
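
The intercept, filter, and store steps can be sketched as a small in-memory class. This is an illustration under stated assumptions, not a real product's design: the class name TransactionArchive, its methods, and the use of SHA-256 to detect duplicate responses are all hypothetical, and a real system would sit in front of the web server rather than being called by hand.

```python
import hashlib
from datetime import datetime, timezone


class TransactionArchive:
    """Record request/response pairs, storing each distinct response body once."""

    def __init__(self):
        self.bodies = {}  # sha256 hex digest -> response bytes (the bitstream)
        self.log = []     # one entry per transaction, pointing at a stored body

    def record(self, method, url, status, body, when=None):
        """Log one transaction; return True if the body was not stored before."""
        digest = hashlib.sha256(body).hexdigest()
        is_new = digest not in self.bodies
        if is_new:
            self.bodies[digest] = body
        self.log.append({
            "time": (when or datetime.now(timezone.utc)).isoformat(),
            "method": method,
            "url": url,
            "status": status,
            "sha256": digest,
        })
        return is_new

    def served_on(self, url, date_prefix):
        """Return the distinct bodies served for `url` on a date like '2024-03-01'."""
        digests = []
        for entry in self.log:
            if entry["url"] == url and entry["time"].startswith(date_prefix):
                if entry["sha256"] not in digests:
                    digests.append(entry["sha256"])
        return [self.bodies[d] for d in digests]


archive = TransactionArchive()
day = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)
first = archive.record("GET", "http://example.com/terms", 200, b"Terms v1", when=day)
repeat = archive.record("GET", "http://example.com/terms", 200, b"Terms v1", when=day)
```

Every transaction is logged, but identical response bodies are stored only once, so the archive can still answer what was served at a given URL on a given date without keeping redundant copies.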

Web archiving (Wikipedia)

Web archiving is the process of collecting portions of the World Wide Web to ensure the information is preserved in an archive for future researchers, historians, and the public. Web archivists typically employ web crawlers for automated capture due to the massive size and amount of information on the Web. The largest web archiving organization based on a bulk crawling approach is the Wayback Machine, which strives to maintain an archive of the entire Web.

The growing portion of human culture created and recorded on the web makes it inevitable that more and more libraries and archives will have to face the challenges of web archiving. National libraries, national archives and various consortia of organizations are also involved in archiving culturally important Web content.

Commercial web archiving software and services are also available to organizations that need to archive their own web content for corporate heritage, regulatory, or legal purposes.
