

Introduction to SEO – Part 2 – The Anatomy of Search

What is a Search Engine?

A search engine is a sophisticated piece of software, accessed through a page on a website (such as Google or Yahoo!), that allows you to search the web by entering search queries into a search box. The search engine then tries to match your search query with the content of web pages that it has stored (or cached) and indexed on its powerful servers.

Early search engines held an index of a few hundred thousand pages and documents, and received maybe one or two thousand queries a day. Today, a top search engine will index hundreds of millions of pages, and respond to tens of millions of queries per day.

Types of Search Indexes on the Web:

Directories – Unlike search engines, which use special software to locate and index sites, directories are compiled and maintained by humans. Directories often consist of a categorised list of links to other sites to which you can add your own site. Editors sometimes review your site to see if it is fit for inclusion in the directory.

Crawler Based Search Engines – Crawler based search engines differ from directories in that they are not compiled and maintained by humans. Instead, crawler based search engines use sophisticated pieces of software called spiders or robots to search and index web pages.


  • Google is a prominent example of a crawler based search engine.
  • Some search systems are ‘hybrid’ systems that combine both forms, e.g. Yahoo! is both a directory and crawler based search engine.

For our purposes, we will focus on the Google search engine.

Based on statistics gathered by an online statistics portal, Google is by far the most used online search engine worldwide, with 1.17 billion people using Google search (as of December 2012). Yahoo! and Microsoft fall short with fewer than 300 million each!

Web Crawling:

How do search engines collect information about websites? With Crawling Spiders!

To find information on the hundreds of millions of Web pages that exist, a search engine employs special software robots, called spiders, to build lists of the words found on Web pages. When a spider is building its lists, the process is called web crawling.

Search Engine Spiders – a depiction of the web crawling process by SEM Expertise (a search engine marketing agency).

The spider will usually start with lists of heavily used servers and very popular web pages. It will then begin indexing the words on those pages and follow every link found within each page. In this way, the spider quickly spreads out across the most widely used portions of the Web.
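The crawl-and-follow-links process described above can be sketched as a breadth-first traversal. The sketch below runs against a tiny invented in-memory "web" (the page names, words, and links are all hypothetical) rather than real network requests, so it is an illustration of the idea, not a production crawler:

```python
from collections import deque

# A tiny in-memory "web": URL -> (words on the page, links it contains).
# The pages and links here are invented purely for illustration.
PAGES = {
    "home":  ("welcome to our seo guide", ["about", "blog"]),
    "about": ("about the authors", ["home"]),
    "blog":  ("seo tips and tricks", ["home", "about"]),
}

def crawl(seed):
    """Breadth-first crawl from a seed URL, visiting each page once."""
    seen, queue, index = set(), deque([seed]), {}
    while queue:
        url = queue.popleft()
        if url in seen:          # never re-index a page we have seen
            continue
        seen.add(url)
        words, links = PAGES[url]
        index[url] = words.split()   # "index" the words on the page
        queue.extend(links)          # follow every link found within the page
    return index

index = crawl("home")
```

A real spider would fetch pages over HTTP and extract links from the HTML, but the seed-then-spread pattern is the same.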

When the Google spider looks at a web page, it will note:

  • The words within the page – on page information.
  • Where the words were located on the page.
  • HTML tags, such as meta tags – information not visible on the rendered page.

The Google spider is built to index every significant word on a page – leaving out the articles “a”, “an” and “the” (other spiders take different approaches).
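Skipping the articles while keeping every other word is a simple filtering step. A minimal sketch, using only the three stop words the text mentions:

```python
# The articles the text says Google's spider leaves out.
STOP_WORDS = {"a", "an", "the"}

def significant_words(text):
    """Return every word on the page except 'a', 'an' and 'the'."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

words = significant_words("The spider indexes a page and an image")
# words -> ['spider', 'indexes', 'page', 'and', 'image']
```

Other spiders use longer stop-word lists (or none at all), which is one reason different engines index the same page differently.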

The Importance of Meta Tags and Keywords:

We were introduced to meta (HTML) tags, keywords and keyword phrases in Introduction to SEO – Part 1 – Anatomy of a Webpage.

Spiders crawl meta tags to determine the keywords and key phrases under which the page should be indexed.

This is why web page meta tags should always reflect the actual content on the page, and not be skewed with the use of (irrelevant) popular keywords just to try to gain a better search result ranking. To protect against this keyword stuffing, spiders will correlate meta tags with page content, and reject the meta tags that do not match the words on the page.
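One hypothetical way a spider might correlate meta tags with page content is to keep only the meta keywords that actually appear in the page text. This is a simplified sketch of the idea (real engines use far more sophisticated relevance checks):

```python
def filter_meta_keywords(meta_keywords, page_text):
    """Keep only the meta keywords that actually occur in the page content."""
    page_words = set(page_text.lower().split())
    return [kw for kw in meta_keywords if kw.lower() in page_words]

kept = filter_meta_keywords(
    ["seo", "spiders", "celebrities"],   # 'celebrities' is keyword stuffing
    "An introduction to SEO and search spiders",
)
# kept -> ['seo', 'spiders']  -- the stuffed keyword is rejected
```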

Google has terms and conditions by which users must abide. There are also best practices and bad practices (the latter can result in Google blacklisting a webpage).

In addition, the Google search algorithms (which you will learn more about shortly) are constantly evolving and trying to give users the most relevant and trustworthy search results.

The Search Engine Index:

Once the spiders have found information on web pages (since hundreds of new webpages are added to the internet every second, spiders are in fact always crawling), the search engine must store that information in a way that makes it useful.

There are two key components involved in making the gathered data accessible to users:

  • The information stored with the data
  • The method by which the information is indexed

If a search engine just stored individual words and the URLs where they are found, this would be of very limited use, because there would be no way of telling:

  • whether the word was used in an important or a trivial way on the page,
  • whether the word was used once or many times, or
  • whether the page contained links to other pages containing the word.

In other words, there would be no way of building the ranking list that search engines use to present the most useful pages at the top of the list of search results!

In order to create useful results, in addition to words and URLs:

  • Most search engines store the number of times that a word appears on a page.
  • The search engine might assign a weight to each entry, with increasing values assigned to words that appear near the top of the document, in sub-headings, in links, in the meta tags or in the title of the page.
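The two ideas above – storing occurrence counts and weighting words by where they appear – combine naturally into a weighted inverted index. The field names and weight values below are invented for illustration; real engines use far more signals:

```python
from collections import defaultdict

# Hypothetical weights: a word in the title counts for more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}

def build_index(pages):
    """Map each word to {url: weighted score}, counting every occurrence."""
    index = defaultdict(lambda: defaultdict(float))
    for url, fields in pages.items():
        for field, text in fields.items():
            for word in text.lower().split():
                index[word][url] += FIELD_WEIGHTS[field]
    return index

index = build_index({
    "page1": {"title": "seo basics", "body": "seo tips for seo beginners"},
    "page2": {"title": "cooking", "body": "recipes and seo"},
})
# index["seo"] -> {'page1': 5.0, 'page2': 1.0}
# page1 scores higher for "seo": once in the title (3.0) plus twice in the body.
```

Ranking a query then becomes a matter of sorting the matching URLs by their scores – the basis of the result list discussed in Part 3.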

All search engines use different formulas and result in different search results. (We will learn more about Google algorithms and PageRank in Introduction to SEO – Part 3).


Introduction to SEO – Part 1 introduced you to the various components that make up a webpage, as well as the importance of keywords and keyword research. We have now looked at how search engine spiders ‘read’ webpages and subsequently ‘index’ them for later retrieval.

The next article in the series, Introduction to SEO – Part 3, will look at how search engines determine how to rank results based on user queries.

Search Image courtesy of Stuart Miles at
