Google Patent Analysis

I’ve spent a good deal of time over the past two days analyzing the Google patent. Part of me says: you spent the time doing the research, so you should keep it to yourself and not share. The other part of me says two heads are better than one, and someone could point out a flaw or oversight in your research, helping you reach a new conclusion. That part won the argument, so I’ll share my thinking.

Many people who have seen the document think it’s nothing more than a red herring to throw people off; others disagree. I believe the patent expresses three things:

  1. Factors Google thinks are important and may be in the current algorithm
  2. Factors Google thinks are important and wants to incorporate into the algorithm in the next 3-5 years
  3. Factors Google would like to stake an early claim to, so competitors don’t use them

If you read through the document you’ll see many very broad, sweeping and often contradictory statements, which causes people to dismiss the document as rubbish. However, I think they are missing the point. What Google is saying is that the actions and behaviors of search engine optimizers mimic those of real websites, but they differ in scale, intent, and relationship to other factors.

For example, if a website suddenly gains 500 new links in a week, is that good or bad? The answer is: it depends. If the links were for a breaking, hot or trendy search term, it’s probably fine; otherwise it’s probably a problem. So if a website had higher than average growth of inbound links for a particular term, yet there was no corresponding spike in search volume, it would be reasonable to assume that the growth in links is spam. From the algo’s point of view the relevant anchor text would be given a high score, but it would also carry a strong indication of being search engine spam. When you look at the website or document as a whole, if it has a lot of factors that strongly indicate spam, the likelihood of it not being a “natural” occurrence increases and your website will be filtered out (aka sandboxed).
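To make the link growth example concrete, here is a rough sketch in Python. The function name, the two ratio thresholds and the weekly numbers are all my own assumptions for illustration; the patent doesn’t spell out a formula like this.

```python
def link_growth_signal(links_per_week, searches_per_week,
                       link_spike_ratio=3.0, search_spike_ratio=1.5):
    """Judge the latest week's link growth for one search term.

    Both arguments are lists of weekly counts, oldest first; the last
    entry is the week being judged. The ratios are made-up thresholds.
    """
    if len(links_per_week) < 2 or len(searches_per_week) < 2:
        raise ValueError("need at least one week of history plus the latest week")

    def ratio_to_baseline(series):
        history, latest = series[:-1], series[-1]
        baseline = sum(history) / len(history)
        # Guard against a zero baseline for brand-new terms or sites.
        return latest / max(baseline, 1.0)

    link_ratio = ratio_to_baseline(links_per_week)
    search_ratio = ratio_to_baseline(searches_per_week)

    if link_ratio < link_spike_ratio:
        return "normal"
    if search_ratio >= search_spike_ratio:
        # Links spiked, but so did interest in the term (breaking news, a fad).
        return "spike matches search interest"
    return "possible spam"


# 500 new links in a week while search volume stays flat.
print(link_growth_signal([20, 25, 30, 500], [1000, 1100, 900, 1000]))
```

The same 500 links land in the "spike matches search interest" bucket if searches for the term jump that week too, which is the "it depends" part.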

Think of it this way: let’s say you’re driving a red Corvette down the street. You won’t attract too much attention. Now add in that you’re driving 10 miles per hour over the speed limit; still not that big a deal. Now add in a broken tail light, and it starts to look more suspicious. Next you’ve got the convertible top down, and the music is blasting. Finally the person in your passenger seat is hanging out of the car, flailing their arms and screaming. You will get pulled over. Other than your passenger hanging out of the car, none of the offenses would get you pulled over by itself, but the more of them you combine, the more likely you are a troublemaker.
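Here is the same idea as a small Python sketch, swapping the traffic offenses for website warning flags. The flag names, the point values and the cutoff are all hypothetical; I picked them only to show how several minor flags can add up when no single one would be enough.

```python
# Hypothetical warning flags and point values, for illustration only.
FLAG_POINTS = {
    "burst_link_growth": 3,
    "identical_anchor_text": 2,
    "short_domain_registration": 1,
    "sudden_topic_change": 2,
    "links_from_low_trust_sites": 3,
}

FILTER_THRESHOLD = 6  # assumed cutoff, not a number from the patent


def spam_score(flags):
    """Add up the points for every warning flag a site has set off."""
    return sum(FLAG_POINTS[flag] for flag in flags)


def is_filtered(flags):
    """A site is filtered only when its combined score reaches the cutoff."""
    return spam_score(flags) >= FILTER_THRESHOLD


one_flag = {"burst_link_growth"}
several_flags = {"burst_link_growth", "identical_anchor_text",
                 "short_domain_registration"}

print(is_filtered(one_flag))       # a single flag stays under the cutoff
print(is_filtered(several_flags))  # the combination reaches it
```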

Here’s a list of some of the factors mentioned in the patent. Again, there are perfectly normal reasons for any of these monitored factors to change; my point is that the more warning flags you set off at once, the more likely you are to be spamming the search engines. I’ve included the sections I drew my conclusions from for reference.

Domain Factors

  • Length of domain registration (section 0099)
  • Domains are monitored for changes in expiration (sections 38, 39)
  • Nameserver and Whois data are monitored for changes and valid physical addresses (same technology used in Google Maps)
  • Name servers and possibly class C networks should have a mix of Whois data, registrars, and keyword and non-keyword domains (section 0101)
  • Documents/websites are given a discovery date when they are discovered through any of the following means (sections 1, 2, 3, 4, 38):
    • external link
    • user gathered data
  • Websites must have more than one document (section 5)
  • Changes in the weighting of key terms for a domain are monitored (section 50)
  • Changes in a domain to topics that don’t match prior content are an indicator of a change of focus; existing prior links will be discounted (section 0084)

Documents and Pages

  • Documents are compared for changes in the following (sections 6, 7, 8, 9, 11, 12):
    • frequency (time frame)
    • amount of change
  • Number of new documents (internal?) linked to the document is recorded (sections 9, 13)
  • Change in the weighting of key terms for the document is recorded (sections 10, 14)
  • Documents are given a staleness (lack of change?) rating (section 19)
  • The rates at which the content of a document and its anchor text change are recorded (sections 31, 33)
  • Outbound links to low trust or affiliate websites may be an indicator of low quality (section 0089)
  • Don’t change the focus of many documents at once (section 0128)

Links

  • A link’s anchor text and discovery date are recorded (sections 54, 55, 56, 57, 58)
  • Links are given a discovery date and monitored for appearance and disappearance over time (sections 22, 26, 58)
  • Links and anchor text are monitored for growth rates (section 48)
  • Links are monitored for changes in anchor text over a given period of time (sections 27, 30, 54, 55, 56, 57, 58)
  • Links are weighted on the trust or authoritativeness of the linking document, as well as the newness or longevity of the link (sections 28, 58, 0074)
  • Link growth of independent peer documents (different class C networks?) is monitored
  • The rate at which new links to a document appear or disappear is monitored (sections 23, 24)
  • A freshness rating of new links is recorded (section 32)
  • It is determined whether a document has a trend of appearing or disappearing links (section 25)
  • A distribution rating for the age of all links is recorded (section 29)
  • Links that have a long lifespan are more valuable than links that have a shorter lifespan (section 59)
  • Links from stale pages are devalued, whereas links from fresh pages are given a boost (section 60)
  • Link churn is monitored and recorded (sections 61, 62)
  • New websites are not expected to have a large number of links (section 0038)
  • Link growth should remain constant and slow (sections 0069, 0077)
  • Burst link growth may be a strong indicator of search engine spam (section 0077)
  • If a document is stale (not changed) but is still acquiring new links, it will be considered fresh (section 0075)
  • If a document is stale and has no link growth or has a decrease in inbound links, its outbound links will be discounted (section 0080)
  • A spike in links would be acceptable if the document has one or more links from authority documents (section 0110)
  • Anchor text should be varied as much as possible (sections 0120, 0121)
  • The growth of variation in anchor text should remain consistent (sections 0120, 0121)
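Several of the link factors above boil down to keeping a dated record for every link and summarizing it over time. Here is a minimal sketch of what that bookkeeping could look like. The `LinkRecord` fields, the `link_stats` function and the sample links are my own invention; the patent describes what is monitored, not how it is stored.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class LinkRecord:
    """One inbound link to a document (hypothetical record layout)."""
    source_url: str
    anchor_text: str
    first_seen: date                  # discovery date of the link
    last_seen: Optional[date] = None  # None means the link is still live


def link_stats(links, start, end):
    """Summarize link activity for one document in the window [start, end]."""
    gained = sum(1 for link in links if start <= link.first_seen <= end)
    lost = sum(1 for link in links
               if link.last_seen is not None and start <= link.last_seen <= end)
    # Links that exist at the end of the window.
    live = [link for link in links
            if link.first_seen <= end
            and (link.last_seen is None or link.last_seen > end)]
    ages = [(end - link.first_seen).days for link in live]
    anchors = {link.anchor_text.lower() for link in live}
    return {
        "gained": gained,    # appearance rate
        "lost": lost,        # disappearance rate, i.e. churn
        "live": len(live),
        "average_age_days": sum(ages) / len(ages) if ages else 0.0,
        # 1.0 means every live link uses different anchor text.
        "distinct_anchor_ratio": len(anchors) / len(live) if live else 0.0,
    }


links = [
    LinkRecord("http://example.com/a", "widgets", date(2004, 1, 1)),
    LinkRecord("http://example.org/b", "Widgets", date(2005, 3, 5)),
    LinkRecord("http://example.net/c", "cheap widgets",
               date(2005, 3, 10), last_seen=date(2005, 3, 20)),
]
print(link_stats(links, date(2005, 3, 1), date(2005, 3, 31)))
```

Running this month after month would give you the trend data the patent keeps referring to: growth rate, churn, the age mix of links, and how varied the anchor text is.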

Search Results

  • Volume of searches over time is recorded and monitored for increases (sections 17, 18)
  • Information regarding a document’s rankings is recorded and monitored for changes (sections 41, 42, 43)
  • Click-through rates are monitored for changes in seasonality, burst increases, or other traffic spikes (sections 43, 44)
  • Click-through rates are monitored for increase or decrease trends (sections 51, 52, 53)
  • Click-through rates are monitored to see if stale or fresh documents are preferred for a search query (sections 20, 21)
  • Click-through rates for documents for a search term are recorded (sections 15, 16, 37, 43)

User Data

  • Traffic to a document is recorded and monitored for changes (possibly through the toolbar, or desktop searches of cache and history files) (sections 34, 35)
  • User behavior on websites is monitored and recorded for changes (click-throughs, back button, etc.) (sections 36, 37)
  • User behavior is monitored through bookmarks, cache, favorites, and temp files (possibly through the Google Toolbar or desktop search) (section 46)
  • Bookmarks and favorites are monitored for both additions and deletions (sections 0114, 0115)
  • User behavior for documents is monitored for trend changes (section 47)
  • The time a user spends on a website may be used to indicate a document’s quality or freshness (section 0094)

Miscellaneous

  • Documents that change frequently in ranking may be considered untrustworthy (section 0104)
  • Keywords with little or no change in results should match domains with stable rankings (sections 0105, 0106, 0107)
  • Keywords with high volatility should have domains with more volatile rankings (sections 0105, 0106, 0107)

Again, what and how much of this is actually in place is open for debate. If you think I interpreted something incorrectly, please let me know. If you have other ideas, let me know; I’d be glad to add them here.

