Before the Web there was Usenet: an internet-wide discussion system divided into tens of thousands of threaded conferences whose topics ranged from the dry and technical to the incredibly frivolous. Today, Usenet’s text-only messages might seem hopelessly dated and limited, but from 1979 until the advent of the Web it was the main way that internet users could share their interests with like-minded people. A good deal of historic information was first posted on Usenet: it carried the first mentions of MS-DOS and the Apple Mac, it was used by researchers to discuss their work on HIV and by engineers exploring the cause of the Challenger Space Shuttle disaster, and it was even used to announce a new technology called the World Wide Web.
Announcing the World Wide Web
On 6 August 1991, Tim Berners-Lee made a posting to Usenet announcing his work on the World Wide Web. The posting was sent to a group devoted to the development of hypertext systems, hosted inside the ‘alternative’ hierarchy of Usenet – hence its name, alt.hypertext. The posting is worth reading because you will recognise in it most of the attributes of the modern Web, although some of the terms we are now familiar with (such as URL) had not yet been coined.
Message from discussion WorldWideWeb: Summary
Aug 6 1991, 8:37 pm
In article <6…@cernvax.cern.ch> I promised to post a short summary of the WorldWideWeb project. Mail me with any queries.
WorldWideWeb – Executive Summary
The WWW project merges the techniques of information retrieval and hypertext to make an easy but powerful global information system.
The project started with the philosophy that much academic information should be freely available to anyone. It aims to allow information sharing within internationally dispersed teams, and the dissemination of information by support groups.
The WWW world consists of documents, and links. Indexes are special documents which, rather than being read, may be searched. The result of such a search is another (“virtual”) document containing links to the documents found. A simple protocol (“HTTP”) is used to allow a browser program to request a keyword search by a remote information server.
The web contains documents in many formats. Those documents which are hypertext, (real or virtual) contain links to other documents, or places within documents. All documents, whether real, virtual or indexes, look similar to the reader and are contained within the same addressing scheme.
To follow a link, a reader clicks with a mouse (or types in a number if he or she has no mouse). To search [an] index, a reader gives keywords (or other search criteria). These are the only operations necessary to access the entire world of data.
Information provider view
The WWW browsers can access many existing data systems via existing protocols (FTP, NNTP) or via HTTP and a gateway. In this way, the critical mass of data is quickly exceeded, and the increasing use of the system by readers and information suppliers encourage each other.
Making a web is as simple as writing a few SGML files which point to your existing data. Making it public involves running the FTP or HTTP daemon, and making at least one link into your web from another. In fact, any file available by anonymous FTP can be immediately linked into a web. The very small start-up effort is designed to allow small contributions. At the other end of the scale, large information providers may provide an HTTP server with full text or keyword indexing.
The WWW model gets over the frustrating incompatibilities of data format between suppliers and reader by allowing negotiation of format between a smart browser and a smart server. This should provide a basis for extension into multimedia, and allow those who share application standards to make full use of them across the web.
This summary does not describe the many exciting possibilities opened up by the WWW project, such as efficient document caching, the reduction of redundant out-of-date copies, and the use of knowledge daemons. There is more information in the online project documentation, including some background on hypertext and many technical notes.
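The ‘simple protocol (“HTTP”)’ mentioned in the summary was, in its earliest form, remarkably spare: a browser opened a TCP connection, sent a single GET line naming the document, and read back raw HTML until the server closed the connection; a keyword search of an index was expressed by appending the keywords to the document’s address. As a rough, hypothetical illustration only – the host and path below are placeholders, and most modern servers expect later versions of HTTP – such a request could be sketched in Python like this:

    import socket

    def early_http_get(host, path, port=80):
        """Send a bare 'GET <path>' request in the style of the earliest HTTP
        (later labelled HTTP/0.9) and read the reply until the server closes
        the connection."""
        with socket.create_connection((host, port)) as sock:
            # The earliest requests had no version string, no headers and no
            # status line: the exchange was one request line and one document.
            sock.sendall(f"GET {path}\r\n".encode("ascii"))
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
        return b"".join(chunks).decode("latin-1", errors="replace")

    # Hypothetical example: ask an index document to search for a keyword by
    # appending the keyword to its address (host and path are placeholders).
    print(early_http_get("www.example.org", "/FIND?hypertext"))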
The Usenet servers
ISPs and companies installed the Usenet server software on a central server that delivered conferences to individual subscribers, who read messages using a news reader application. Users could also post new articles to the Usenet conferences held on their local server. In turn, that server would push copies of new messages to nearby Usenet servers, each of which would then further propagate those messages. Over several hours or even days, messages ‘flooded’ across Usenet until every server had a copy.
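The ‘flooding’ described above is essentially a flood-fill across the graph of peered servers, with each article’s unique message-ID used to stop copies circulating forever. The sketch below is only a toy model of that behaviour – real servers exchange articles over NNTP, using commands such as IHAVE to offer an article to a peer – with made-up server names and message-IDs:

    from collections import deque

    class Server:
        """A toy Usenet server: it remembers which message-IDs it has seen
        and forwards anything new to its neighbouring servers."""
        def __init__(self, name):
            self.name = name
            self.neighbours = []
            self.articles = {}              # message-ID -> article body

        def offer(self, msg_id, body, queue):
            if msg_id in self.articles:     # already have it: ignore the offer
                return
            self.articles[msg_id] = body
            for peer in self.neighbours:    # queue a copy for every peer
                queue.append((peer, msg_id, body))

    def flood(origin, msg_id, body):
        """Post an article at one server and let it propagate hop by hop."""
        queue = deque()
        origin.offer(msg_id, body, queue)
        while queue:
            server, mid, text = queue.popleft()
            server.offer(mid, text, queue)

    # Build a tiny network: A - B - C, plus B - D.
    a, b, c, d = (Server(n) for n in "ABCD")
    a.neighbours, b.neighbours = [b], [a, c, d]
    c.neighbours, d.neighbours = [b], [b]

    flood(a, "<1@a.example>", "first post")
    # Every server now holds a copy of the article.
    print(all("<1@a.example>" in s.articles for s in (a, b, c, d)))  # True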
Very few companies considered Usenet to be an essential service, so the Usenet servers were almost never backed up. Worse still, to free up storage space for new messages, most Usenet servers deleted old messages after a few days. It appeared that large chunks of internet history might have been lost forever.
Rescuing Usenet for posterity
Fortunately, a group of zoologists at the University of Toronto hadn’t been quite so negligent. Their department had connected to Usenet as early as 1981, when the entire Usenet system probably had only a few hundred users. Henry Spencer, a highly regarded Unix programmer, was then a member of the department and relied on the Usenet community to help solve technical problems. Spencer was loath to delete old Usenet messages, reasoning that they might contain answers to problems he would encounter in his future work, so he began to back up the Zoology department’s Usenet feed onto magnetic tape.
The department continued to back up Usenet for the next ten years, with each tape being deposited in an archive. The archiving ended only when the growth of Usenet traffic began to demand more tapes than the department’s budget could justify. The backups then found their way to the University of Western Ontario where David Wiseman, a network administrator, began to transfer some 120 MB of data from the decaying tapes onto hard disk. Wiseman eventually recovered more than two million of the very earliest Usenet messages and came to an agreement with Google over hosting them on its servers.
A second partial set of backups existed in the form of the DejaNews Research Service. DejaNews was the first searchable index of Usenet, but in 2000 it ran out of money; its archives were acquired by Google in 2001 and rebranded as Google Groups. Google now owned a staggering 700 million Usenet postings, a collection that was largely complete except for the very earliest messages – most of which could be found in Spencer and Wiseman’s archive. The combined archive is thought to represent 95% of all the messages ever posted to Usenet.