Search 101: a brief history of search

By Greg Emin • Dec 1st, 2008 • Category: Research

The Riverdale High gang

While the concept of search (organizing and cataloguing information in a manageable format) is ancient, online search began in the early 1990s. At the time, most sites resided on university servers and were simply directories of stored files, since people shared most information through FTP (File Transfer Protocol) sites. As the amount of information grew and became more complex, so did the need to index these sites, files and folders in manageable formats.

Archie (the word “archive” without the “v”) was the first program developed to help users find and retrieve files based on their search criteria. It was developed in 1990 by Alan Emtage, a student at McGill University. The program helped users find publicly available files, so long as they knew the name of the file they were looking for.

On Archie’s coattails came the launch of Gopher, a network protocol for organizing and retrieving documents. Gopher presented the plain-text files and directories on a server as hierarchical menus. Veronica and Jughead, whose names played on Archie and its comic-book namesake, were search engines for the Gopher protocol: they searched the menu titles indexed from Gopher servers.

Spiders, bugs and bots

The development of search engine spiders really kick-started the search revolution. Essentially, spiders (also called “bots” and “crawlers”) are programs that visit web pages and follow the links on them, collecting content into an index that is then searched against a user’s query.
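
To make the idea concrete, here is a minimal sketch of a spider in Python. It is an illustration only, not the code of any engine mentioned in this post: the `PAGES` dictionary is a made-up, in-memory stand-in for the web (a real spider would fetch pages over HTTP), and the names `crawl` and `search` are hypothetical. Like the earliest spiders, it records only page titles.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web" (URL -> HTML); a real spider would fetch over HTTP.
PAGES = {
    "http://a.example/": '<title>Home</title>'
                         '<a href="http://a.example/about">About</a>'
                         '<a href="http://b.example/">B</a>',
    "http://a.example/about": '<title>About us</title>'
                              '<a href="http://a.example/">Home</a>',
    "http://b.example/": '<title>Site B</title>',
}


class LinkAndTitleParser(HTMLParser):
    """Collects the page title and every href found in <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(start_url, pages):
    """Breadth-first crawl from start_url; returns an index of URL -> title."""
    index = {}
    queue = deque([start_url])
    seen = {start_url}  # avoids revisiting pages that link to each other
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:  # unreachable page: skip it
            continue
        parser = LinkAndTitleParser()
        parser.feed(html)
        parser.close()
        index[url] = parser.title
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


def search(index, term):
    """Returns the URLs whose title contains the term (case-insensitive)."""
    term = term.lower()
    return [url for url, title in index.items() if term in title.lower()]
```

Calling `crawl("http://a.example/", PAGES)` builds the index once, and `search(index, "about")` then answers a query from that index without touching the pages again. That separation between crawling and querying is the point of the sketch.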

1993 saw several key firsts for search engines. The first spider, the World Wide Web Wanderer, developed by Matthew Gray, launched in June 1993. Its purpose was to count the active public web servers. By December 1993, three full-fledged search engine spiders had come into existence:

  1. JumpStation – collected information from the title and header of pages. As the number of sites continued to grow, JumpStation came to a stop: it listed results in the order it found them, so it couldn’t keep up with the growth of the web.
  2. World Wide Web Worm – collected titles and URLs, and met the same fate as JumpStation.
  3. Repository-Based Software Engineering (RBSE) spider – collected URLs and text information within a website, and ranked the relevant information.

The launch of Excite (developed by six Stanford University undergraduates) made a big splash. Excite was based on the notion of using statistical analysis of word relationships to make searching more efficient. On the strength of its success, Excite merged with @Home (a broadband provider) in 1999, but the combined company filed for Chapter 11 bankruptcy in October 2001.

Into the new millennium and beyond

1994 witnessed the launch of a slew of search engine programs, including:

  • Yahoo! Directory, which grew out of David Filo and Jerry Yang’s desire to collect all of their favorite websites. As the World Wide Web grew, this collection morphed into a searchable directory.
  • WebCrawler, which was released by Brian Pinkerton of the University of Washington and was the first crawler that indexed entire pages.
  • Lycos, a catalog of documents that returned results ranked by relevance, with prefix matching and bonuses for word proximity. Its index grew rapidly, and by 1996 Lycos had indexed over 60 million documents, more than any other search engine at the time.
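
The ranking ideas in the Lycos entry above (relevance scores, prefix matching and word proximity bonuses) can be sketched in a few lines of Python. This is not Lycos’s actual algorithm, which this post does not describe in detail. The scoring weights (one point per prefix match, plus a bonus of 1/gap for query terms that appear close together) and the names `tokenize`, `score` and `rank` are assumptions chosen for illustration.

```python
def tokenize(text):
    """Lowercases text and strips surrounding punctuation from each word."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w]


def score(query, document):
    """Assumed scoring scheme, for illustration only.

    - Prefix matching: a query term matches any word that starts with it,
      so "engine" also matches "engines". Each match is worth 1 point.
    - Proximity bonus: for each pair of consecutive matched query terms,
      add 1/gap, where gap is the smallest distance (in words) between them.
    """
    words = tokenize(document)
    total = 0.0
    positions = []  # one list of match positions per matched query term
    for term in tokenize(query):
        hits = [i for i, w in enumerate(words) if w.startswith(term)]
        if hits:
            total += len(hits)
            positions.append(hits)
    for first, second in zip(positions, positions[1:]):
        gap = min(abs(i - j) for i in first for j in second)
        if gap > 0:
            total += 1.0 / gap
    return total


def rank(query, documents):
    """Returns the matching documents, best score first."""
    scored = [(score(query, doc), doc) for doc in documents]
    matching = [pair for pair in scored if pair[0] > 0]
    matching.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in matching]
```

With the query “search engine”, a document that contains “search engines” side by side outranks one where “engine” and “search” sit several words apart, and a document containing neither word is dropped from the results.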

Throughout the growth of the dot-com bubble many other search engines entered the spotlight, including Inktomi, AltaVista, Ask Jeeves (Ask.com), AllTheWeb, Magellan, and Infoseek. During this time Google founders Larry Page and Sergey Brin started collaborating at Stanford University. By December 1998 the newly incorporated Google Inc. tasted the first fruits of its labour when PC Magazine recognized it as one of its Top 100 websites.

Since the turn of the millennium three prominent players have emerged in the North American search engine market:

  1. Google
  2. Yahoo (utilizing technologies from Inktomi, AltaVista, and Overture)
  3. Live Search (Live.com, formerly MSN Search)

In future posts we will focus on the successes and challenges these players have faced in the past four years or so. 


One Response

  1. Hey Greg, great summary of a very large, involved subject, the “History of Search”. Interestingly enough, Canadian companies were (and in many ways continue to be) at the forefront of the search industry. We would be remiss if we did not mention the significance of Gaston Gonnet’s contribution to the first indexing of the Oxford English Dictionary. (1)

    Why is this relevant? Well, that same technology was “put-to-the-Web” to create OpenText. At one point in time, OpenText’s search engine and Web crawlers were the backbone of Yahoo’s search engine. I even remember their (OpenText’s) jubilation when they claimed to have indexed every piece of content on the Web (around 1995, I believe).

    Maplesoft (as in Waterloo Maple, a company focused on math software) was long rumoured to have provided the foundational algorithms for many search engines (but sadly I can’t find a reference on the Web to back this up). Finally, Canada is home to Idée, whose image search technology will revolutionize how content owners manage their media assets on the Web (and how consumers like us find what we are looking for in an image search). Check out their consumer site http://www.tineye.com to “see” what I mean.

    Greg, I realize that your intent was not to do an exhaustive review of the history of search in a blog posting. I just wanted to share some additional, relevant tidbits about Canada’s leadership role in this very important industry. Thanks for your posting; feel free to follow up if you’d like.

    Jonathan

    (1) http://en.wikipedia.org/wiki/Gaston_Gonnet
