Digital Librarian for and of the World

Ferose V R
Jun 14, 2023

“Pick a BIG project that you won’t finish, that lasts beyond your lifetime.” Brewster Kahle expanded on the thought with a reference to Herman Melville’s Moby Dick: “Because if you achieve your goal, you may go down with it.” That statement has remained with me since my meeting with him at the Internet Archive on 18 January 2023.

Brewster Kahle is a Silicon Valley maverick, a man on a mission, and founder of the non-profit Internet Archive. Inducted into the Internet Hall of Fame, he is an original and uncannily insightful deep thinker. Geeky and bespectacled, he never hesitates to speak his mind. It is not surprising that he is a voracious reader, loves books and wants to save them for generations.

Situated within a historic church in the Richmond District in San Francisco, the Internet Archive office may seem like yet another eccentric whim of a Silicon Valley native. Brewster chose the place because it resembles their logo: pillars that look like the Library of Alexandria, which was destroyed by fire in 48 BCE. As you walk inside the church, you see massive servers along the wall, their blinking lights an indication of data being accessed from the website (https://archive.org/ is one of the 100 most accessed websites on the internet). Unlike in other places, he wants the servers on public display to make people aware of what the organization is building. Then, along the sides of the church, you see hundreds of three-foot-high statues. Inspired by the Terracotta Warriors of Xi’an, China, Brewster started the tradition of making a terracotta statue of every employee who completes three years working at the Internet Archive.

It all started back in 1980, when Brewster was asked about the positive side of technology. He had two ideas in mind: protect people’s privacy and build a digital library for all. He has been working on the second idea since 1996, when he started the Internet Archive. Simply put, the Internet Archive is the Library of the Internet, on the Internet. Its mission is “Universal Access to All Knowledge”. It is a bold and audacious goal, but what he has achieved so far is nothing short of a miracle.

While the project is far from complete, the numbers are staggering. His team of 25 people has archived 790k software titles (e.g. you can play games from the Apple II), 5M moving images, 14M audio recordings (including radio recordings) and 2.2M TV programs (you can borrow the footage, take the rights and make your own documentary). They have digitized 6M e-books from libraries (which include 3M books published before 1925). He has made the content available to the blind and dyslexic. Journalists can use the Internet Archive to do data mining for their research. The Wayback Machine has archived web pages since 1996: 625,000,000,000 pages in all. What I was not aware of is that the average life of a website is only 100 days. What if you wanted to see what a page looked like a decade ago? The Internet Archive website lets you go back in time to see what websites looked like in the past, or what data a page displayed before its current version.

To make all this happen, the archive has a staggering 100 petabytes of storage space! Its annual budget of $25M is a fraction of the cost of running the San Francisco Public Library, which has a $180M budget.

Many of the innovations are in-house. The scanning machine, which requires every page to be turned by hand, is designed to produce high-quality scans of books without damaging them. With 35 scanning centers around the world, they are able to scan more than 1,000 books a day. When I gave my book The Invisible Majority to Brewster Kahle, I wanted to check if it was on the archive.org website. It was not! But by the time I had reached home after meeting Brewster, he had sent me the link to my book on archive.org. The archive has backup centers: one in Alexandria (no coincidence there!), and others in Canada and the Netherlands. His goal is to build the next-generation web, where people can make money publishing on the internet without being controlled by tech or publishing companies.

When the pandemic hit and libraries went into lockdown overnight, journalists lost their ability to research topics. One such journalist was Pulitzer Prize winner John Markoff, who was working on his book The Whole Earth. What came to his rescue was the Internet Archive, which let him search across thousands of books and texts online and borrow their digital versions.

It was the apocalyptic scenario Brewster had designed the Internet Archive for, and it played out as intended. Like TCP/IP, if you didn’t know where to look, you couldn’t tell anything had changed. But for people like John Markoff, everything had. Imagine if he had to wade through paywalls or find an alternate store of 36 million books! Only the Library of Congress, with 173 million items, has more, and it was under lockdown.

Can we imagine a world where access to knowledge is no longer free and no longer immutable? Where pages are metered, totally controlled by corporate enterprises, and only available through platforms of their choosing. Where the enterprise would decide whether you could purchase or read a certain book. A world where books are no longer immutable, and pages change, even as you turn them, based on what the enterprise considers ethical or unethical, fact or fiction. Where the enterprise has total control over the versions, and you only have one version — the latest.

Today if you buy a physical book, you own it; once it is sold to you, it cannot be tampered with. By comparison, whether you buy or rent a digital product, the medium is entirely controlled by the enterprise and its specific digital rights management philosophy. When you buy an e-book, you own nothing except the device you consume it on (e.g. a Kindle from Amazon). Is this a good or a bad thing? It’s good so long as you trust the enterprise. But if you don’t, your best bet is to convert the digital book back into a physical one and take control of the medium, as Brewster and his friends have done and, predictably, gotten into trouble for. He has been facing federal trials and fighting court cases brought by mega publishers (Harper, Hachette, Penguin and Wiley). He believes there should be more publishers, not fewer.

Publishers are now taking the Internet to court and digital ownership is under attack. Libraries were designed to buy, preserve and lend books. In the digital world, however, you are only allowed to license them. We often forget that libraries protect privacy. No one knows who is reading which book. But in the digital world every page is tracked.

It would take some time to comprehend the enormity of what Brewster Kahle has achieved. His work and legacy remind us that when we do something for everyone, with the right intentions, it can set into motion unexpected and long-range benefits for society.

Anyone can fix a problem after it is broken, but it takes a genius to build something before it is broken. The Internet Archive is the moonshot project to prevent another ‘Alexandria’ from happening, but in the digital world. And all it took was one unreasonable man!

Written by Ferose V R

Senior Vice President and Head of SAP Academy for Engineering. Inclusion Evangelist, Thought Leader, Speaker, Columnist and Author.