It is currently some time in the 23rd century, and a scholar of the future wants to understand what was happening in 2024 in, say, Gaza, Ukraine, or Beijing. Surely she’ll be able to find what she needs. It’ll all be online, right?
We are used to thinking that the internet is forever. Sometimes this can seem like a bad thing. Every dumb remark, ill-conceived costume, or bad hot take will be fixed indefinitely in the digital firmament, waiting to be dug up as a cancellable offense. But it’s also a good thing: Every atrocity, corruption scandal, transformative artwork, or major scientific discovery will also be there — forever.
The trouble is, that’s not true. The internet is not forever. In fact, in many cases it isn’t even for 100 days, the average length of time before content is changed on a webpage.
Just how ephemeral is the internet? A recent Pew study found that nearly 40% of web pages viewable in 2013 simply do not exist any more. They’ve evaporated into the digital ether because their owners ran out of either the money or the interest needed to maintain them.
Meanwhile, hyperlinks, an essential part of the digital information experience, are also famously flighty. A Harvard study of more than 2.5 million links from New York Times articles published between 1996 and 2020 found that at least a quarter of them were dead. 404. RIP.
To be fair, none of this is necessarily terrible.
There is plenty of crap on the internet. A lot of blogs just aren’t good. And this video of Donald Trump as the Ramones, or this one in which two cats argue about a broken ice cream machine at a McDonald’s drive-thru, aren’t necessarily essential texts of our time (though, admittedly, I am on the fence about the cat one because it’s pretty amazing).
But this Great Digital Transience doesn’t just affect cat videos or bad blogs. It can also affect public records and, importantly, journalism.
A 2021 report by the Donald W. Reynolds Journalism Institute at the University of Missouri showed that out of two dozen major newspapers, only seven were fully archiving their material, and much of that was only final text, rather than all digital content that was part of each article. And that, of course, is for newspapers that are still in business. Many – in fact an increasing number, as we know – are not. When they go belly up, their digital content often vanishes.
I have some personal experience with this. If you go looking for any of the dozens of articles and profiles that I wrote for FT Tilt, a special project of the Financial Times, in Brazil in 2011, you will find nothing. Nada. When the FT cut the project in 2011, they also scrapped the website. Everything we had investigated, documented, or written – gone. Alas, my findings will be of no use to that future scholar passionately interested in Brazil’s early 21st-century “deindustrialization.”
Failing to adequately archive our material is only one problem. Another is that the platforms where we do store much of our content are highly concentrated in the hands of a few powerful companies and countries.
Consider the fact that three vendors – Amazon, Microsoft, and Google – account for two-thirds of all cloud storage. That concentration creates efficiency, sure, but also huge risks. What happens if any of those companies goes out of business, is attacked, or goes rogue? A lot could happen between now and the 23rd century.
For an extreme version of the risk here, look at China. There, the Communist Party all but owns the internet, and as we speak, whole swaths of history are being erased.
There have been efforts to address this problem of internet impermanence. The Internet Archive, with its popular Wayback Machine, is a heroic, decades-old project that aims to copy and store every single web page that has ever been created. It works with governments and media to record particularly important documents for the historical record.
But even that database isn’t capturing everything. And in the end, it too is just another website, beholden to the vagaries of money, space, and electricity like all the others.
To be clear, I am no Luddite. Having the internet and digital media, which more or less make the sum total of human knowledge instantly available to… anybody, is way better than not having them.
The problem is that having it depends on keeping it. And that means preserving it in formats that can be flighty, easily changed, or swiftly erased. In a world of polarization, cratering trust, and open lies, it is more essential than ever to care for the drafts of history that we are writing.
I don’t know if this idea or this article will still exist in 2224.
I’d like to think it will. But just in case — print out a hard copy.