Google Sitemaps and Site Migration

Recently I’ve been laying the groundwork for some future SEO projects: migrating sites onto content management systems, showing search spiders friendlier content, and cleaning up some URL structures. I’ve been using Google Sitemaps to help keep track of how the projects are progressing.

One of the problems I’ve been working to overcome is uncrawlable product/shopping cart URLs like this:

example.com/gifts/PSProductResults.aspx?SCAT_Id=3272&DPSV_Id=59659&CATY_Id=570&ASI=&WS=&LN=&pF=&pT=

Another is a hodgepodge of architecture techniques like:

example.com/foo/index.php
example.com/foo/index.html
example.com/foo.html

While it is completely possible to rank with a mix like that, it does make the site really hard to maintain and keep track of without losing stuff along the way.
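As a rough illustration of cleaning up a mix like that, here is a minimal Python sketch that maps the three URL styles above onto one form. The choice of a trailing-slash target (/foo/) and the function name `canonical_path` are assumptions for the example, not something the sites in question necessarily used.

```python
import re

# Hypothetical rule: treat /foo/index.php, /foo/index.html and /foo.html
# as the same page and pick /foo/ as the single canonical form.
INDEX_FILE = re.compile(r"/index\.(php|html?)$")
PAGE_FILE = re.compile(r"\.(php|html?)$")


def canonical_path(path: str) -> str:
    """Map mixed URL styles onto one trailing-slash form."""
    if INDEX_FILE.search(path):
        # /foo/index.php -> /foo/
        return INDEX_FILE.sub("/", path)
    if PAGE_FILE.search(path):
        # /foo.html -> /foo/
        return PAGE_FILE.sub("/", path)
    # Already extensionless: just make sure it ends with a slash.
    return path if path.endswith("/") else path + "/"


if __name__ == "__main__":
    for p in ["/foo/index.php", "/foo/index.html", "/foo.html"]:
        print(p, "->", canonical_path(p))
```

Having one function decide the canonical form makes it easier to build a list of old-to-new URLs before anything is actually moved.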

First a bit of a warning: IF YOUR SITE IS RANKING AND YOU ARE GETTING ORGANIC LISTINGS AND TRAFFIC, DON’T MUCK WITH IT! If you do decide to muck with it, do it very, very slowly. I’d recommend logging in and creating a sitemaps account before you actually make any changes. It gives you an idea of what’s going on and shows any existing problems right off the bat. Note to the sitemaps team: do you have to show the ‘submit a reinclusion request’ link on the login screen if someone isn’t banned? It scares non-SEO types, who think it’s there for a reason.

Another thing I learned is that many site owners really have no idea what’s going on on their sites. They know about a handful of pages, but by and large they don’t have a full understanding of what pages are on their sites. I didn’t find any hidden spam, but all 6 of the sites I worked on had old pages that were still linked to from somewhere and had duplicate, outdated, or missing information. For example, the information was originally on example.com/page-1.htm, and when a significant update was made the new version was put on example.com/page-2.htm. What invariably happened was that a few dangling links to page-1.htm remained and it stayed in the index. If it’s an update, keep it on the same page if at all possible. If not, delete the old page (keep a backup offline) and 301 the old page to the new page. If it’s for an annual event, build a date into the URL structure like:

example.com/event/2005/
example.com/event/2006/
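To make the 301 step concrete, here is a small Python sketch that turns an old-to-new page mapping into redirect rules. It assumes an Apache server with mod_alias, whose `Redirect 301` directive takes an old path and a new URL; the sites described here may well run on something else, and the mapping and the function name `apache_redirect_lines` are made up for illustration.

```python
# Hypothetical old -> new mapping, using the page names from the example above.
REDIRECTS = {
    "/page-1.htm": "/page-2.htm",
}


def apache_redirect_lines(base_url: str, redirects: dict[str, str]) -> list[str]:
    """Build one mod_alias 'Redirect 301' line per moved page."""
    lines = []
    for old, new in sorted(redirects.items()):
        if old == new:
            continue  # a page redirecting to itself would loop forever
        lines.append(f"Redirect 301 {old} {base_url}{new}")
    return lines


if __name__ == "__main__":
    print("\n".join(apache_redirect_lines("http://example.com", REDIRECTS)))
```

Keeping the mapping in one place also gives you a checklist to compare against the missing-page reports later.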

For the shopping cart site I had to roll up my sleeves and get into some programming. First I blocked the robots from the entire dynamic shopping cart structure. I then wrote some code to generate a flat, friendly URL structure for each of the departments and products. In addition, I wrote a script to automatically generate a new XML sitemap with links to each of the departments and products, and send a ping to Google. I wouldn’t consider these doorway pages per se, as they are perfectly readable for human visitors. I’d consider them teaser or jump pages leading directly to the products. (Note: some of you might ask why not use JavaScript to bring visitors directly to the product page instead of adding a page/click. While that would work, it’s a little too close to ‘sneaky redirects’ and ‘cloaking’ for an otherwise clean client website.)
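The script itself isn’t shown here, but the sitemap-generation part could look something like the Python sketch below. Everything in it is an assumption for illustration: the catalog data, the `slugify` and `build_sitemap` helpers, the /department/product/ URL layout, and the use of the sitemaps.org 0.9 namespace. The ping step is left out because the endpoint isn’t given in the post.

```python
import re
import xml.etree.ElementTree as ET

# Namespace from the sitemaps.org protocol (an assumption; use whatever
# schema your sitemap consumer expects).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def slugify(name: str) -> str:
    """Turn a department or product name into a flat URL segment."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def build_sitemap(base_url: str, catalog: dict[str, list[str]]) -> str:
    """Return sitemap XML listing every department and product page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for department, products in catalog.items():
        dept_url = f"{base_url}/{slugify(department)}/"
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = dept_url
        for product in products:
            loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
            loc.text = f"{dept_url}{slugify(product)}/"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical catalog; a real script would read this from the cart database.
    catalog = {"Gift Baskets": ["Fruit & Nut Basket", "Coffee Sampler"]}
    print(build_sitemap("http://example.com", catalog))
```

Generating the friendly URLs and the sitemap from the same catalog data keeps the two from drifting apart as products are added or removed.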

For heavily/frequently crawled sites you may get feedback sooner, but I’ve found it takes 2-4 weeks from when you start moving pages around for the new sitemaps data to start to show up. So go out and do all of the changes at once, ping sitemaps, wait, then check back to see what’s changed and make adjustments.

Now, above I recommended not changing things dramatically if you are ranking and getting traffic. I went completely against that advice here on this blog recently. I did this for two reasons. First, I want to see what can really go wrong when you do a large-scale site migration (1000+ pages). Second, I think it’s important to experiment and try new stuff from time to time; you can really learn a lot from the mistakes you make. Here’s a screenshot of my missing pages. You can see I’ve got over 180 pages that are missing that I need to 301 to the new locations.

[Screenshot: Sitemaps error pages]

While the experiment is clearly still in progress I haven’t seen a dramatic drop
