One of the more difficult problems to diagnose is whether your website has been banned or whether something else is wrong. I have the unfortunate task of diagnosing exactly that for one of my sites, and I thought it might be educational to let you in on the process.
The first indicator is a radical drop in traffic, especially from Google.
So it’s off to Google to check some rankings, and nothing is listed. Next we do some standard checks:
site:example.com – sites mentioning the site but nothing from the domain in question
site:www.example.com – sites mentioning the site but nothing from the domain in question
inurl:example.com – nothing
inurl:www.example.com – nothing
link:example.com – returns results ***
link:www.example.com – returns results ***
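If you run these checks often, it can help to generate the query strings rather than type them by hand. The following is only a small sketch: the function name `diagnostic_queries` is made up for illustration, and it simply builds the six queries listed above for a bare domain so you can paste them into the search box.

```python
def diagnostic_queries(domain):
    """Return the six checklist queries for a bare domain such as example.com."""
    hosts = [domain, "www." + domain]      # with and without the www prefix
    operators = ["site", "inurl", "link"]  # the three operators used above
    return [f"{op}:{host}" for op in operators for host in hosts]


for query in diagnostic_queries("example.com"):
    print(query)
```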
Now, the fact that the link command returns results gives me some indication the site isn’t banned. This site was not part of Google’s Sitemaps program, so I submitted it, wondering if I would actually get some indication of its status. The screen below is what I saw:
Not a lot of help there. It would really be helpful if there were some little message that said something like “this site has been removed from the index for violating Google quality guidelines”, or at least if you knew with 100% certainty that such a message would be there if you were in fact banned. I get why you can’t tell me why I was banned, but at least tell me that I was banned.
Looking at the new crawl rate graph, here’s what we get:
So it looks like Google can get in and is crawling. However, looking at my site reports, the only two pages crawled after October 6th are the robots.txt file, which gets checked every few days, and the main page for the blog, which gets checked every day or two. Both were last checked on the 15th. Now if it was just the robots.txt file again I’d lean towards a ban, but the blog page is giving some conflicting data. I open up the robots.txt file to check that it wasn’t hacked and changed, and that looks OK, and the htaccess file looks good too. I blow the dust off Internet Explorer with the Google Toolbar and I see a PageRank of three. Usually when you’re banned you’ll see 0 or a gray bar, so that’s more conflicting data. Lastly, I don’t see any of the Chilling Effects “some sites have been omitted” links at the bottom of the SERPs.
I haven’t told you much about the site up until now, and that’s actually on purpose. The site has been running since 2005. There’s maybe a little bit of dupe content and some slightly aggressive linking, but nothing too heinous. The site does run AdSense and was actually selected to be part of the Google Adsense Video Beta program, which got me a cool lava lamp. So that does mean some people looked at the site, and I don’t think they would have approved it for the program if it was spam central. I also don’t think there’s a connection between that and the site dropping (must resist tin foil hat).
So I don’t think the site was banned, though it’s hard to say for certain, and the lack of anything under the URL and the complete loss of rankings is odd. I’m going to make a sitemap, submit it, and see what happens. If I don’t see things start to return to normal next week, I’ll probably move hosts and see if the host is the issue.
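For anyone who wants to build a sitemap by hand, here is a minimal sketch of what that could look like. It assumes the sitemaps.org 0.9 XML format and uses placeholder example.com URLs; the `build_sitemap` helper is hypothetical and is not the tool I actually used for this site.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace; check what your search engine expects.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # <loc> is the only required child
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder pages, not the real site.
pages = ["http://www.example.com/", "http://www.example.com/blog/"]
print(build_sitemap(pages))
```

Save the output as sitemap.xml in the site root and submit that URL through the sitemaps interface.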
So I submitted the sitemap last night before going to bed, and here’s the new sitemaps diagnostic page:
And here’s the summary page:
Running my stats program, I can see they visited the homepage and the main blog page, again not the behavior I would expect to see on a banned site.
So I’ve gotten some data out of Google Sitemaps, which is very interesting …
They tell me 33 URLs are blocked in the robots.txt file. As I mentioned above, I checked the robots.txt file and htaccess file, both of which looked OK and had the original date and timestamps on them. I grabbed a handful of URLs and submitted them through the robots analysis tool, and the URLs that were blocked are now OK. At this stage my two best working theories are as follows:
a) There was some technical issue at the host and the wrong files somehow got served, creating an “issue” with the robots.txt file.
b) Someone hacked into my account, uploaded a bogus robots file, and submitted the site to the URL removal tool. Now if they were a clever chap they would have saved the original robots file and put it back once the damage was done. Unfortunately this host doesn’t give me access to the FTP logs, so I can’t verify that. I have, however, changed the login as a precaution.
If that was the case, it’s probably going to take quite a while to get all 400 pages back in the index, which is kind of a bummer as we are coming into “prime season” for this website.
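The robots.txt check can also be repeated offline, which is handy if you want to test a saved copy of the file against a list of URLs. This is a rough sketch using Python's standard `urllib.robotparser`; the rules and URLs below are invented for illustration and are not this site's real robots.txt, and Google's own parser may treat edge cases differently.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows for the agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network access
    return [url for url in urls if not parser.can_fetch(agent, url)]


# Invented example rules: a normal file and the kind of bogus file that blocks everything.
normal_rules = "User-agent: *\nDisallow: /private/\n"
bogus_rules = "User-agent: *\nDisallow: /\n"

urls = [
    "http://www.example.com/",
    "http://www.example.com/private/page.html",
]

print(blocked_urls(normal_rules, urls))
print(blocked_urls(bogus_rules, urls))
```

Running the same URL list against a known-good copy and against whatever the server is currently serving is a quick way to see whether the two disagree.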
The AdSense bot has been all over the popular pages all day. I have seen the standard Googlebot on site three times within the past 12 hours, twice to the home page and once to a deep page. It would be nice if the media bot fed the pages into the index. I don’t expect to see anything for 4-5 days, but I will keep this thread updated.
Well, I’ve made some progress: Google only thinks 16 pages are blocked via robots.txt, down from 33. Sadly, no pages are in the index yet.
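If your stats program doesn't separate the two crawlers, you can get a rough count straight from the raw access log. The sketch below assumes the common Apache "combined" log format and matches on the user-agent tokens `Googlebot` and `Mediapartners-Google` (the AdSense crawler). The sample log lines, IP addresses, and dates are made up for illustration, and user-agent strings can be spoofed, so treat the counts as a hint rather than proof.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format (request, status, size, referer, user agent).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# User-agent token -> label used in the report.
CRAWLERS = {"Mediapartners-Google": "adsense", "Googlebot": "googlebot"}


def crawler_hits(lines):
    """Count hits per (crawler label, path) for the crawlers listed in CRAWLERS."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that do not look like combined-format entries
        for token, label in CRAWLERS.items():
            if token in match.group("agent"):
                hits[(label, match.group("path"))] += 1
                break
    return hits


# Made-up sample entries, not real log data.
sample_log = [
    '192.0.2.1 - - [01/Jan/2000:08:01:02 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.2 - - [01/Jan/2000:08:05:10 +0000] "GET /blog/ HTTP/1.1" 200 7400 "-" '
    '"Mediapartners-Google/2.1"',
    '192.0.2.3 - - [01/Jan/2000:08:06:00 +0000] "GET /blog/ HTTP/1.1" 200 7400 "-" '
    '"Mozilla/4.0 (compatible; MSIE 6.0)"',
]

for (crawler, path), count in sorted(crawler_hits(sample_log).items()):
    print(crawler, path, count)
```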
FINAL UPDATE: see here for details of how I got back in the index.