Below is one of the screencasts from the recent Web Archiving Workshop. Please note there is no sound on the video. Read the synopsis below to understand what’s happening. You may also need to view the video in full screen mode.
This screencast shows some of the behaviours of the Wayback Machine at archive.org.
I’m going to the www.archive.org URL. First I’ll use the search engine on the front page, just doing a text search for my target, which is the Arts and Humanities Research Council.
The result is "Not in Archive". However, the Wayback Machine itself admits this isn't the best way to retrieve web content: "Using the Internet Archive Wayback Machine, it is possible to search for the names of sites contained in the Archive (URLs) and to specify date ranges for your search. We hope to implement a full text search engine at some point in the future."
Instead, let's find it by its URL. I go back, click on the Web tab and go to the front page of the Wayback Machine. The box here lets me type an address after the http:// prefix. I enter www.ahrc.ac.uk and click Take Me Back.
Here we see the results of the search: a collection of dated archived copies of the site from 2003 to 2008. Each harvest of the site is counted as a 'page', so the 32 pages listed under 2007, for example, mean the site was harvested 32 times that year.
Let’s follow the first link for a harvest dated 04 January 2006.
Pay attention to the address bar at the top of the screen. Note that the URL includes a datestamp in the path, followed by the original www.ahrc.ac.uk address.
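To make the address-bar structure concrete, here is a minimal sketch that splits a Wayback Machine address into its capture time and the original URL. It assumes the common form https://web.archive.org/web/<timestamp>/<original URL>, with a 14-digit YYYYMMDDhhmmss timestamp; the function name split_wayback_url and the exact time in the example are hypothetical, and real Wayback addresses can also use shorter timestamps or suffixes that this sketch does not handle.

```python
import re
from datetime import datetime

# Assumed replay URL shape: https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>.
# Shorter timestamps or modifier suffixes are deliberately not handled here.
WAYBACK_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def split_wayback_url(url: str) -> tuple[datetime, str]:
    """Return (capture time, original URL) for a plain Wayback replay URL."""
    match = WAYBACK_RE.match(url)
    if match is None:
        raise ValueError(f"not a recognised Wayback URL: {url}")
    stamp, original = match.groups()
    # The 14-digit datestamp encodes year, month, day, hour, minute, second.
    return datetime.strptime(stamp, "%Y%m%d%H%M%S"), original
```

For a hypothetical capture made at 12:30 on 4 January 2006, `split_wayback_url("https://web.archive.org/web/20060104123000/http://www.ahrc.ac.uk/")` returns `(datetime(2006, 1, 4, 12, 30), "http://www.ahrc.ac.uk/")`.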
I’m now just scrolling up and down to show you how their captured page has rendered.
Now I’ll compare their capture with a similar page which I captured and added to the UKWA archive. Mine is from 2005.
I'm now toggling between the two, and you can see that the layout of the Wayback version isn't the same. This is probably because the style sheet was not captured, has been lost, or is not being rendered properly.
Now I'm navigating within the archived version in Wayback and going to the Links page. I follow a link at random and get the result Data Retrieval Failure.
Notice another thing – we’ve now strayed slightly from 04 January 2006 and gone back to December 2005.
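One plausible way to picture this drift: when the linked page has no capture at exactly the requested date, the archive serves whichever capture is closest in time. The sketch below models that as a simple nearest-timestamp choice. It is an assumption for illustration, not the Wayback Machine's actual selection logic, and the capture dates in the example are hypothetical.

```python
from datetime import datetime


def nearest_capture(requested: datetime, captures: list[datetime]) -> datetime:
    """Pick the capture whose timestamp is closest to the requested time.

    Simplified model: ties go to whichever capture appears first in the list.
    """
    if not captures:
        raise ValueError("no captures available")
    return min(captures, key=lambda capture: abs(capture - requested))
```

With a request for 4 January 2006 and hypothetical captures on 20 December 2005 and 10 February 2006, the December copy wins because it is 15 days away rather than 37, which is the kind of backwards jump seen in the address bar here.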
Now to follow another link to the Postgraduate page.
I'm now clicking on www.prospects.csu.ac.uk, a site which lies outside the AHRC domain. There will be a pause while this loads, so feel free to take a closer look at the address bar.
Now I'm taken outside the AHRC site I was looking at, to another site altogether. It so happens that this site is also being harvested by the Internet Archive (IA). The Wayback Machine is effectively making these connections within its own collection, allowing the user to move from one archived site to another much as they would browse the live web.
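A rough sketch of how such connections can be made: each link on an archived page is resolved against the page's original address and then re-pointed into the archive, keeping the page's own timestamp. The function name rewrite_link and the relative path postgrad.asp are hypothetical, and the URL shape is the same assumed https://web.archive.org/web/<timestamp>/<original URL> form; this illustrates the idea rather than the Wayback Machine's actual rewriting code.

```python
from urllib.parse import urljoin

ARCHIVE_PREFIX = "https://web.archive.org/web"  # assumed replay prefix


def rewrite_link(page_timestamp: str, page_url: str, href: str) -> str:
    """Re-point a link found on an archived page at the archive, not the live web."""
    # Resolve relative links (e.g. "postgrad.asp") against the original page URL.
    absolute = urljoin(page_url, href)
    # Keep the current page's timestamp; the archive can then serve its nearest capture.
    return f"{ARCHIVE_PREFIX}/{page_timestamp}/{absolute}"
```

For example, `rewrite_link("20060104123000", "http://www.ahrc.ac.uk/", "http://www.prospects.csu.ac.uk/")` gives `https://web.archive.org/web/20060104123000/http://www.prospects.csu.ac.uk/`, so clicking the link stays inside the archive even though it leads to a different site.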