You Are Here

I want to know what good is a web search engine that returns 324,909,188 ‘matches’ to my keyword. That’s like saying, Good news, we’ve located the product you’re looking for. It’s on Earth.

W. Bruce Cameron

Building a Search Engine#

We all know the basic purpose and use of an internet search engine. But can we build our own, even if rudimentary?

We can. It won’t compete with Google’s, DuckDuckGo, or Brave. But we can, with some Python, build out an engine that uses the first implementation of the Google search engine circa 1998.

We’ll roughly follow the recipe laid out by Larry Page and Sergey Brin in their 1998 paper The PageRank Citation Ranking: Bringing Order to the Web

Using a simple form of PageRank, we’ll:

  • build a crawler, i.e., a program that, using a starter page will go from page to page following anchor tags (a.k.a., links) to find related pages

  • build an index, i.e., creating a database of the documents or pages found and mapping content to their urls

  • use that index, find pages - ranked by relevance - to any given search query

../_images/searchengine.png

Fig. 1 Steps to building a search engine#