Archive for November, 2006

Search Engines 101 - Indexing (Part 2)

(Continued from Part 1)

As connection speeds increased and bandwidth and storage became more affordable, search engines were able to visit more pages on a site and record more information about each page. In addition, search engines began to move away from considering only on-page text and put more weight on off-page factors like inbound links that are not as easily manipulated by page owners.
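To make the idea of an off-page factor concrete, here is a minimal sketch that counts inbound links in a tiny, made-up link graph. The site names and the `inbound_link_counts` function are hypothetical, and real search engines use far more elaborate, proprietary methods; a raw count is only the simplest possible illustration.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "a.example": ["c.example"],
    "b.example": ["c.example", "a.example"],
    "c.example": ["c.example"],  # a self-link, which the sketch ignores
}

def inbound_link_counts(links):
    """Count how many *other* pages link to each page."""
    counts = Counter()
    for source, targets in links.items():
        # set() so that repeated links from one page count only once
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts

COUNTS = inbound_link_counts(LINKS)
print(COUNTS.most_common())
```

The point of the sketch is that a page owner controls none of the inputs that raise their own count: every inbound link comes from someone else’s page.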

The important thing to remember about search engines today is their continuing reliance on off-page factors to determine what a page is about. In the past, indexing the content of the page itself was enough to provide accurate data, but today there are simply too many ways for site owners to manipulate the text on their pages to artificially boost their rankings.

This continues to be one of the biggest misconceptions our clients have about search engines - that simply changing ‘META’ tags like description and keywords will make much difference to a search engine. In 1997 that may have been enough, but search algorithms are much more advanced today. In fact, many of my colleagues believe that including the keywords tag on a page can actually harm a page’s rankings (more on this in other posts).

So, what you should remember about indexing is that it’s how a search engine collects and stores information about web sites. Also remember that influencing a search engine by changing on-page factors like META tags, keyword density, or any other easy-to-manipulate metric is much more difficult than it was in the past and certainly not a strategy to base your search engine marketing upon…
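For readers who haven’t met the term, keyword density is usually taken to mean the share of a page’s words that match a given keyword. The sketch below assumes that simple definition (the `keyword_density` function is made up for illustration) and shows why the metric is so easy to game.

```python
def keyword_density(text, keyword):
    """Fraction of the words in `text` that are exactly `keyword`."""
    words = text.lower().split()
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

# An honest sentence versus one stuffed with the keyword.
honest = "we sell handmade widgets from peru"
stuffed = "widgets widgets widgets for sale"

print(keyword_density(honest, "widgets"))
print(keyword_density(stuffed, "widgets"))
```

The stuffed text scores far higher even though it tells a reader much less, which is exactly why a metric like this can’t be trusted on its own.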

Search Engines 101 - Indexing (Part 1)

When you go to Yahoo! or Google and do a web search for something, you’re not actually searching the internet. What you’re doing is searching a massive database that each company has built and filled with information about the billions of pages on the internet.

The terms you search for are compared with the information in this database, and the web pages that Yahoo! or Google thinks most closely match what you’re searching for are listed as results.
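As a rough illustration of searching a database rather than the live internet, here is a minimal sketch of a word-to-page lookup table (often called an inverted index). The pages, URLs, and scoring rule are all made up for the example; real engines rank results with far more sophisticated, proprietary methods.

```python
from collections import defaultdict

# Hypothetical pages standing in for a search engine's stored data.
PAGES = {
    "http://example.com/widgets": "buying widgets in peru",
    "http://example.com/history": "the history of widgets",
    "http://example.com/travel": "travel guide to peru",
}

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return URLs ordered by how many query terms they contain."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for url in index.get(term, ()):
            scores[url] += 1
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))

INDEX = build_index(PAGES)
print(search(INDEX, "widgets peru"))
```

Notice that `search` never touches the pages themselves. It only consults the index built ahead of time, which is what lets a lookup finish quickly.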

In addition to the actual data in their databases, what makes search engines like Yahoo!, Google, MSN/Live, and Ask.com different is the way each one of them ‘indexes’ web pages.

Indexing refers to the proprietary methods a search engine uses to find web pages on the internet and to add information about those pages to its database when it visits them. This database of information about web pages is called a search engine’s ‘index’.

Search engines fill their databases with information by sequentially visiting web pages, collecting information about the page’s content, and following links out of the site to discover new web pages. This procedure is called ‘spidering’ and will be covered in another post.
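The visit, collect, and follow-links loop described above can be sketched in a few lines. To keep the example self-contained it crawls a made-up, in-memory ‘web’ instead of making real network requests; the `spider` function, the `FAKE_WEB` data, and the `max_pages` limit are assumptions for illustration, not how any particular search engine works.

```python
from collections import deque

# Hypothetical miniature "web": each URL maps to (page text, outgoing links).
FAKE_WEB = {
    "a.example/": ("home page", ["a.example/about", "b.example/"]),
    "a.example/about": ("about us", ["a.example/"]),
    "b.example/": ("another site", []),
}

def spider(start_url, fetch, max_pages=100):
    """Visit pages one at a time, store their text, and queue their links."""
    index = {}
    queue = deque([start_url])
    while queue and len(index) < max_pages:
        url = queue.popleft()
        if url in index:
            continue  # already visited
        page = fetch(url)
        if page is None:
            continue  # dead link: nothing to record
        text, links = page
        index[url] = text
        queue.extend(link for link in links if link not in index)
    return index

crawled = spider("a.example/", FAKE_WEB.get)
print(sorted(crawled))
```

Starting from a single page, the spider reaches `b.example/` only because another page links to it.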

Spidering takes resources - both in the bandwidth necessary to traverse the internet and return data about web pages and in the storage capacity necessary to store whatever information is collected. Because a spider visit is just like any other visit to a web page, bandwidth fees are incurred by the page owner as well.

In the early days of the internet when connections were slow, bandwidth expensive, and data storage at a premium, search engines typically only saved the title and location (URL) of a web page and a list of keywords that described the content of the page. It was simply too expensive for a search engine to pay for the bandwidth and storage necessary to save more information about a web page, not to mention the additional time that indexing more content on a page would add to the already lengthy spidering process. These expense issues were also important to site owners, as more intensive or more frequent spider visits meant increased bandwidth costs for them as well.

So, a compromise was reached. Web designers would supply a list of keywords relevant to the content of the site (in the ‘keywords’ META tag hidden in the code of the page) on the main page of the site and search engines wouldn’t need to visit every web page on the site trying to find out what each was about.
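For anyone who has never looked at one, here is a minimal sketch of how a program might read the ‘keywords’ META tag, using Python’s standard `html.parser` module. The sample page and the `KeywordsParser` and `extract_keywords` names are made up for illustration.

```python
from html.parser import HTMLParser

class KeywordsParser(HTMLParser):
    """Collect the comma-separated terms from <meta name="keywords">."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            self.keywords.extend(
                term.strip() for term in content.split(",") if term.strip()
            )

# Hypothetical page: the keywords sit in the <head>, unseen by visitors.
SAMPLE_PAGE = """
<html><head>
<title>Widgets of Peru</title>
<meta name="keywords" content="widgets, peru, widget history">
</head><body>Welcome!</body></html>
"""

def extract_keywords(html):
    parser = KeywordsParser()
    parser.feed(html)
    return parser.keywords

print(extract_keywords(SAMPLE_PAGE))
```

The parser simply reports whatever the page author typed into the tag. Nothing checks those terms against the visible text of the page.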

This gentlemen’s agreement worked well until the web started to commercialize and web designers realized they could put anything they wanted in the ‘keywords’ tag, regardless of what their web site was actually about. This resulted in sites being found in search results for keywords that had little or nothing to do with their actual content, and it reduced the efficacy of the search engine as a useful way to find relevant content online.

In Part 2, how search engines responded.

Authority Content is King

You’ve likely heard the adage “Content is King” - meaning the best way to get traffic to your website is to fill it with quality content. Before optimizing your page structure for search engines, before trying to build links, and before pay-per-click ad campaigns - if you don’t have quality content on your site that people are willing to read, bookmark, talk about, and link to, those other forms of marketing won’t be nearly as effective.

I’ll leave discussions about the what, when, why, and how of quality content for other blog posts. I just wanted to share a post I read recently in a forum that nicely encapsulates the end goal of providing quality content.

I’m a professional writer and run an informational website so when i started up 2 years ago i assumed i must be onto a goldmine as everyone kept talking about how content is king.

So there i was, waiting for all those natural, organic links to come rolling in. I was waiting for a long time.

Recently I had an epiphany on the matter. It’s not good content alone that draws links, it’s authority content. If you have one really funny article about buying widgets in Peru then yes, it may be a good read but who’s going to link to it?

If on the other hand you get a critical mass of content together about widgets in Peru, their history, their production, the culture of the users, the types of widget available - in short, more information about widgets in Peru than a reader could digest in one visit - then people will consider your pages to be a resource and will bookmark and link to you.

Content needs to be well written but that isn’t enough. It needs to belong to a larger section of content that has authority status in the eyes of the user. Take every angle of your subject and write separate articles and guides about it. You might think you’re repeating yourself at times but to someone researching the topic, all good info available is manna.