Over the 18 years that I’ve been writing in this space, I’ve written 7,779 posts.
In those posts I’ve included 24,765 links to other websites; there was, indeed, a link in the very first post, to http://www.digisle.net/.
What that link has in common with a lot of other links I’ve added over the years is that it’s a broken one: that web address, which was once a link to the Hawaii-based Digital Island, is a dead one, and leads nowhere.
When a web browser requests a page from a web server, a number of things can happen: the server can report back “here you go!”, “you don’t have access to that,” “that page no longer exists,” or “that page has moved.”
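Those replies map onto the numeric status codes that HTTP defines. As a rough sketch, and not part of the script I actually ran, a small shell function (the name `classify_status` is made up for illustration) could sort a code into its family. It assumes the code comes from curl's `%{http_code}`, which prints `000` when no HTTP response was received at all, for example when the host name no longer resolves.

```shell
#!/bin/sh
# Hypothetical helper: map an HTTP status code to a broad category.
# Assumes the code was produced by curl's --write-out '%{http_code}',
# which prints 000 when no HTTP response was received.
classify_status() {
  case "$1" in
    000) echo "no response" ;;   # DNS failure, refused connection, timeout
    2??) echo "success" ;;       # "here you go!"
    3??) echo "redirection" ;;   # "that page has moved"
    4??) echo "client error" ;;  # "no access" or "no longer exists"
    5??) echo "server error" ;;  # the server itself failed
    *)   echo "unknown" ;;
  esac
}
```

For example, `classify_status 404` prints `client error`.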
To see what had happened over the years to the pages I’d linked to from my posts, I ran the 24,765 links through a link checker, a simple command line script adapted from here, that looks like this:
xargs -n1 -P 50 curl -o /dev/null --silent --head --write-out '%{url_effective}: %{http_code}\n' < ruk-dot-ca-links.csv > stats.log
The result was a file called stats.log with one line per link and the status code that the web server returned when the script went looking for it.
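One way to boil a log like that down to a handful of categories is a short awk tally. This is only a sketch under assumptions: the function name `summarize_log` is invented here, and it assumes every line ends with the status code, as in `http://example.com/: 200`, which is what the `--write-out` format above should produce.

```shell
#!/bin/sh
# Hypothetical helper: count log lines per status family.
# Assumes each line ends with a three-digit code, e.g. "http://example.com/: 200",
# and that 000 means curl received no HTTP response.
summarize_log() {
  awk '
    NF == 0 { next }                      # skip blank lines
    {
      code = $NF                          # last whitespace-separated field
      if (code == "000")
        bucket = "no response"
      else if (code ~ /^[1-5][0-9][0-9]$/)
        bucket = substr(code, 1, 1) "xx"  # 200 -> 2xx, 404 -> 4xx
      else
        bucket = "other"
      count[bucket]++
    }
    END { for (b in count) printf "%s %d\n", b, count[b] }
  ' "$1" | sort
}
```

Running `summarize_log stats.log` would print one line per category followed by its count.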
Here’s a summary of what it found:
HTTP Status Code | Category | Number of Links | Per Cent |
---|---|---|---|
n/a | DNS error | 2,229 | 9% |
2xx | Success | 12,462 | 50% |
3xx | Redirection | 7,541 | 30% |
4xx | Client Errors | 2,335 | 9% |
5xx | Server Errors | 198 | 1% |
In plain language this means that:
- Half of the pages I linked to are still there, at the same address, and work as they always have.
- Just under one third of the pages I linked to now redirect to other addresses; sometimes this means the link still works, just at a changed address, and sometimes it means the link is effectively broken and simply redirects to the front page of a website or elsewhere.
- One fifth of the pages I linked to aren’t there anymore at all, either because the server that once served them has gone offline, or because the server’s still there, but is returning an error.
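The fractions in the list above are rounded from the counts in the table. As a quick check, here is a sketch of the arithmetic; the helper name `share` is made up for illustration, and the "gone" figure assumes the DNS errors, 4xx, and 5xx rows are added together (2,229 + 2,335 + 198 = 4,762).

```shell
#!/bin/sh
# Hypothetical helper: print PART as a percentage of TOTAL, to one decimal place.
share() {
  awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * part / total }'
}

share 12462 24765   # still there (2xx)
share 7541 24765    # redirected (3xx)
share 4762 24765    # gone: DNS errors + 4xx + 5xx
```

That works out to 50.3%, 30.5%, and 19.2% respectively, which is where "half," "one third," and "one fifth" come from.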
Visually, this looks like this:
That half of the links are still there, at the same address, after 18 years of writing is a better result than I expected; I’d assumed that more of the web had fallen off than that.
Still, though, it’s sad that roughly 20% of the pages I’ve linked to cannot be accessed directly any more. Some of those may have ended up in the Wayback Machine; others are simply lost to time.
Comments
I enjoy reading these technical explanations because I know very little of how it all works or how to make it work. It's something I have difficulty understanding. Sharing your knowledge is helpful.