In the past few months all of the search engines, especially Google, have released or started supporting new ways of fixing site architecture issues. However, IMHO these are band-aid solutions for bad site architecture, and not something you should rely on … at all.
In 2009, some of the big advances the major search engines have announced support for are things like the “rel=canonical” tag, which allows you to tell a search engine that the content at a URL may exist in more than one place, but that it should all be credited to a single URL. Another new item is the ability to tell the search engines to ignore certain URL parameters. So if you’re using tracking parameters, session IDs or other such items, you can now tell the search engines to ignore them entirely. Many people are happy to have these new tools, as they allow them to fix “issues” they have had for years. However, I think these tools are crutches for lazy and incompetent programmers and developers, and should be avoided like the plague … and I’ll tell you why …
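To make the idea concrete, here’s a minimal sketch of what “canonicalizing” a URL means, in Python rather than whatever any given publisher actually runs. The list of tracking parameter names is a hypothetical example, not anything the search engines publish:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of parameters that exist only for tracking,
# not for selecting content.
TRACKING_PARAMS = {"feed", "sessionid", "utm_source"}

def canonical_url(url):
    """Strip tracking parameters, leaving the one 'true' URL for the content."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in TRACKING_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), ""))

def canonical_link_tag(url):
    # This is the tag a publisher would place in the page <head> to tell
    # search engines which URL to credit.
    return '<link rel="canonical" href="%s" />' % canonical_url(url)
```

So a request for `/story.html?feed=rss_news` would declare `/story.html` as its canonical form: `canonical_url("http://www.forbes.com/story.html?feed=rss_news")` returns `"http://www.forbes.com/story.html"`.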
In the dawn of the public internet there were dozens of search engines that webmasters and publishers had to deal with; combine this with a lack of standards, and the online publishing community had a lot of growing pains. When the dot-com bubble burst and the market consolidated, we were really left with four big search engines. As Google pursued its relentless market domination, under the guise of a garage start-up bathed in the light of primary-colored lava lamps, it stole the thunder of everyone else (coincidentally, of course) and established itself as the ruling organization and de facto standards-setting body, the rest of the world be damned.
This is bad, as sloppy publishers and slipshod developers now use Google’s band-aid solutions instead of building websites and applications that don’t introduce problems that never needed to exist in the first place. Case in point, look at this URL from Forbes Magazine:
Notice the [feed=rss_news] part at the end. That enables Forbes to track where visitors came from (in this case RSS, most likely a feed reader). But now there’s the problem of that same content existing at two URLs; remove the feed parameter and it still works:
Not to worry, we can use “rel=canonical” to point to the URL without the parameter, and we can also use Webmaster Tools to tell Google to ignore the “feed” parameter, and we’re good to go, right? … Wrong, junior: that’s two band-aids you needed instead of solving the problem properly. What you should have done is issue a 301 redirect at the server level to the correct URL, and not rely on the client or a bot to figure things out. Need that parameter for tracking? Drop it in a cookie.
Why does this matter? … As publishers, we want to foster and build an environment that’s friendly for more than one search engine. Here’s an experiment: append a meaningless imaginary parameter or two to a URL and submit it to a social site like Digg or StumbleUpon and see what happens. The simple fact is they aren’t sophisticated enough to parse it out as a tracking parameter. Lots of other social sites are trying to gain access to your content as well, and by using substandard architecture you aren’t helping yourself. Your story will get votes in two places, decreasing its ability to go “hot” or “popular” as its votes are spread over two URLs.
While social search may be in its infancy, and may never overtake traditional search, the easier you make your content to crawl and understand for everyone, and not just Google, the better off you’ll be in the long run. Get out of the habit of relying on the crawling and indexing band-aids of search engines for your survival; learn to write clean code that makes you self-reliant for your long-term livelihood and success.