OK, I’m going to cover some basics here, but time and time again I get asked about monitoring indexation levels and what factors affect them.
There are quite a few misconceptions in this area: a lot of people rely on the wrong set of data, and they also rely on old techniques to improve indexation. So let’s have a go at discussing some of the main areas.
HTML & XML Sitemaps
What does a sitemap really do? My opinion? Absolutely nothing in terms of getting a site indexed. It’s an auditing tool to monitor and test the architecture of your website. It doesn’t matter how many sitemaps you have; if the structure of your site is poor, you’re going to have low indexation levels.
This is especially true of XML sitemaps, but what about HTML sitemaps?
Again, in my opinion, HTML sitemaps are overrated: they are used to cover up poor navigation, or in the hope that they will create a magical ‘thumbs up’ signal to Google. The truth is that most HTML sitemaps are not user friendly and consist of a pile of links spread over many pages, especially on a large site. With Panda hitting websites hard, the last thing I would want on my site is a load of pages filled with HTML links tipping me over the low-quality threshold.
The only time HTML sitemaps are effective, in my opinion, is when they genuinely help users navigate their way through a site, and then they should include descriptive text as well as links to the various areas of the site. However, if you own a website with 100,000 product pages, the last thing you want to do is create an HTML sitemap linking to them all. Your site’s structure itself should help users and Google find all those products easily and effectively.
So, to round up: yes, use an XML sitemap to audit your site, but don’t create an HTML sitemap unless it is user friendly and in place to help users find their way to important parts of your site.
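Try using multiple XML sitemaps (for example, one per product category) to monitor the performance of different areas of your site. Google Webmaster Tools reports submitted and indexed URL counts for each sitemap, so you can see which sections are under-indexed.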
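If you want to try the multiple-sitemap approach, here’s a minimal Python sketch (one possible way of doing it, not the only one) that groups URLs by their first path segment, builds one sitemap per section and a sitemap index pointing at them. The section rule, the `sitemap-<section>.xml` file names and the helper names are all hypothetical; only the `urlset`/`sitemapindex` elements and the namespace come from the sitemaps.org protocol.

```python
from collections import defaultdict
from urllib.parse import urlparse
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def section_of(url: str) -> str:
    """Hypothetical rule: the first path segment (/shoes/...) is the section."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[0] if segments else "root"


def build_section_sitemaps(urls: list[str]) -> dict[str, str]:
    """Group URLs by section and return {section: <urlset> XML string}."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        groups[section_of(url)].append(url)
    sitemaps = {}
    for section, section_urls in sorted(groups.items()):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url in section_urls:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
        sitemaps[section] = ET.tostring(urlset, encoding="unicode")
    return sitemaps


def build_sitemap_index(base: str, sections: list[str]) -> str:
    """<sitemapindex> pointing at one (hypothetically named) file per section."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for section in sections:
        loc = ET.SubElement(ET.SubElement(index, "sitemap"), "loc")
        loc.text = f"{base}/sitemap-{section}.xml"
    return ET.tostring(index, encoding="unicode")


# Example with made-up URLs.
urls = [
    "https://www.example.com/shoes/red",
    "https://www.example.com/shoes/blue",
    "https://www.example.com/hats/cap",
]
maps = build_section_sitemaps(urls)
index_xml = build_sitemap_index("https://www.example.com", sorted(maps))
print(sorted(maps))  # ['hats', 'shoes']
```

Submitting the index rather than one big file is what lets you compare indexed counts section by section. Note that this sketch doesn’t handle the protocol’s 50,000-URL-per-sitemap limit, so a very large section would need splitting further.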
Performing a Site Search
We’ve all done it and probably still do, but this is a really inaccurate way of monitoring how many pages are actually indexed on your website. You can do one search, and the same search a minute later will fetch a different result.
However, if you know there are roughly 500,000 pages on your site, a quick search can give you a very broad idea of how well you are being indexed.
So go easy on the site: operator.
Google Analytics is the best way of understanding not only how well your website is indexed but also the quality of those pages. It’s been covered many times before, but let’s go over it again.
Log in to Google Analytics, go into Traffic Sources and select google / organic, then add a secondary dimension of ‘Landing Page’. This shows you how many pages Google sent traffic to over a given period (a page can only receive organic search traffic if it is indexed). Monitoring this figure every month gives you a really good indication of your website’s indexation levels; I would use and monitor this figure over anything else.
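If you export that report every month, a few lines of Python will turn it into the headline figure. This sketch assumes a hypothetical CSV export with `month`, `landing_page` and `visits` columns (the real export layout depends on how you set up and download the report), and the sample rows are made up purely to show the shape of the data.

```python
import csv
import io
from collections import defaultdict


def landing_pages_per_month(csv_text: str) -> dict[str, int]:
    """Count distinct landing pages that received organic visits, per month.

    Assumes hypothetical columns: month, landing_page, visits.
    """
    pages: dict[str, set[str]] = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Strip thousands separators such as "1,234" before converting.
        if int(row["visits"].replace(",", "")) > 0:
            pages[row["month"]].add(row["landing_page"])
    return {month: len(urls) for month, urls in sorted(pages.items())}


# Made-up sample rows, purely to show the shape of the data.
SAMPLE = """month,landing_page,visits
2011-06,/,120
2011-06,/shoes/red,8
2011-07,/,130
2011-07,/shoes/red,5
2011-07,/hats/cap,2
"""
print(landing_pages_per_month(SAMPLE))  # {'2011-06': 2, '2011-07': 3}
```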
So there we have a few ways of monitoring indexation, but how do we get more of our site indexed?
Sorry to go back to basics, but I want to cover this for the sake of completeness.
First thing: don’t confuse getting your site crawled with getting it indexed; these are two completely separate things. Once Google finds out your site exists, I have no doubt it will crawl your whole site at some point. However, in my experience, getting a good proportion of your site indexed comes down to one factor, and that is trust.
Having a flat architecture is all about creating the shortest possible route to the pages on your site. How many clicks away from the top level are your major pages?
(image from SEOmoz)
It’s simple really: the closer a page is to the top level, the more trusted it is going to be, and therefore the more chance it has of being indexed.
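To answer the ‘how many clicks’ question across a whole site, you can run a breadth-first search over your internal links starting from the homepage: the first time the search reaches a page is, by construction, the minimum number of clicks to it. This is a sketch assuming you already have a crawl of internal links as a hypothetical `{page: [linked pages]}` dictionary; the tiny site below and the cut-off of 3 clicks are illustrative, not a rule.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], home: str = "/") -> dict[str, int]:
    """Breadth-first search: minimum number of clicks from home to each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit = shortest route
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Made-up site: a product buried behind paginated category pages.
SITE = {
    "/": ["/shoes/", "/hats/"],
    "/shoes/": ["/shoes/page-2/"],
    "/shoes/page-2/": ["/shoes/page-3/"],
    "/shoes/page-3/": ["/shoes/red-trainers"],
    "/hats/": ["/hats/cap"],
}
depths = click_depths(SITE)
deep = sorted(page for page, d in depths.items() if d > 3)  # illustrative cut-off
print(deep)  # ['/shoes/red-trainers']
```

Pages that never appear in the result aren’t reachable from the homepage at all (orphan pages), which is worth checking too.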
It all comes down to links
It’s easy to get a page ranked by building lots of anchor-text-rich links to it; however, if you have done this before, you will also have realised it has very little impact on the overall trust of a website.
I speak with clients about this a lot. We point out an overall increase in traffic and indexation levels, and the client will say, ‘but you were trying to rank us for X, not the other keywords that drove traffic; this was just natural growth that would have occurred anyway’.
Trusted, quality links affect the overall organic traffic to a website and are absolutely essential for continual growth in organic traffic.
Yes, by all means build optimised links to your site and go after rankings, but you have to incorporate a strategy for getting links from the best, most trusted sites in your industry. Without them, your website will never perform to its full potential.
The above are the main factors, but since Panda, site speed and content quality are having more and more of an effect on the indexation of a website. Check your site’s speed and make any changes needed to fix speed issues, and check for duplicate pages or pages with low-quality content.
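For the duplicate check, here’s a rough Python sketch: it normalises whitespace and case, hashes each page’s text and groups URLs with identical hashes. It only catches exact duplicates after normalisation (not near-duplicates), it assumes you’ve already extracted the main body text of each page, and the word-count ‘thin page’ check with its hypothetical `min_words` threshold is nothing more than a crude proxy for low quality, not a measure Google has confirmed.

```python
import hashlib
import re
from collections import defaultdict


def normalise(text: str) -> str:
    """Collapse whitespace and lower-case so trivial differences don't matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose normalised body text is identical (exact duplicates only)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, text in pages.items():
        digest = hashlib.sha1(normalise(text).encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return sorted(sorted(urls) for urls in groups.values() if len(urls) > 1)


def find_thin(pages: dict[str, str], min_words: int = 150) -> list[str]:
    """Flag pages under a hypothetical word-count threshold (a crude proxy)."""
    return sorted(url for url, text in pages.items()
                  if len(normalise(text).split()) < min_words)


# Made-up pages: a parameter URL duplicating a product page, plus a thin page.
PAGES = {
    "/shoes/red": "Red trainers. Great for running.",
    "/shoes/red?sort=price": "Red  trainers.\nGreat for running.",
    "/hats/cap": "Cap",
}
print(find_duplicates(PAGES))  # [['/shoes/red', '/shoes/red?sort=price']]
print(find_thin(PAGES, min_words=3))  # ['/hats/cap']
```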
Remember, Panda is a ‘threshold’-based algorithm; removing as many low-quality pages from your site as possible could be all it takes to sort out Panda issues.
As ever, I would love to hear your thoughts on this in the comments.