Today’s post is an answer to a question I took a few weeks ago:
How do you organize archives/categories on WordPress for a news site/blog that publishes a lot of articles, around 30-50 daily?
This is what people mean when they mention a flat site architecture or a short crawling path: no post is more than 4 clicks from any other post. You also want to make sure your robots.txt and robots meta tags are configured properly so the spiders can crawl that path. If you are publishing a very high volume of posts/pages, you’re going to want to get as many links on each archive page as possible without becoming excessive. Google recommends no more than 100 links per page. In reality, how many links per page you can get away with depends on the trust and authority of your inbound links, in other words your link equity.
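To get a feel for how quickly a paginated archive gets deep, here is a rough sketch that measures click depth with a breadth-first search. The URL layout, the single chained archive, and the numbers (40 posts a day for 30 days, 10 versus 100 links per archive page) are all assumptions for illustration, not measurements from a real site.

```python
from collections import deque


def click_depths(links, start):
    """Breadth-first search: fewest clicks needed to reach each page from start."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def paginated_archive(total_posts, posts_per_page):
    """Hypothetical link graph: home -> archive page 1 -> page 2 -> ...

    Each archive page links to its own posts and to the next archive page.
    """
    links = {"/": ["/archive/page/1/"]}
    page_count = -(-total_posts // posts_per_page)  # ceiling division
    for page in range(1, page_count + 1):
        first = (page - 1) * posts_per_page + 1
        last = min(page * posts_per_page, total_posts)
        targets = [f"/post-{n}/" for n in range(first, last + 1)]
        if page < page_count:
            targets.append(f"/archive/page/{page + 1}/")
        links[f"/archive/page/{page}/"] = targets
    return links


# Assumed volume: 40 posts a day for 30 days.
total_posts = 40 * 30
for per_page in (10, 100):
    depths = click_depths(paginated_archive(total_posts, per_page), "/")
    print(f"{per_page} links per archive page: deepest page is "
          f"{max(depths.values())} clicks from the home page")
```

Even with 100 links per archive page, the oldest posts in this toy setup end up well past 4 clicks from the home page after a single month, which is why one long chain of archive pages is not enough on its own.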
If you aren’t familiar with the term link equity, you should read this article by Eric Ward. Basically, your site’s link equity will determine your crawling depth and crawling frequency. The more links you have and the stronger those links are, the deeper the search engines will crawl your site and the more frequently they will re-crawl it. This is a difficult problem for new websites: they need to add content, but if they add too much too soon, it won’t get crawled. So new sites need to balance content creation with link building.
Some other tools can help flatten out a website and increase crawling depth. Breadcrumbs are one: Joost de Valk makes an excellent breadcrumb plugin for WordPress websites. A related posts plugin that changes the related posts at the end of each post will also help; I like Yet Another Related Posts Plugin. Also make sure you are generating an HTML sitemap; I like the Dagon Design Sitemap Generator.
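If it helps to see why breadcrumbs count as an extra crawling path, here is a minimal sketch of what a breadcrumb trail boils down to. It is not how any particular plugin works; the category names, the `parents` mapping, and both function names are made up for the example.

```python
from html import escape


def breadcrumb_trail(category, parents):
    """Walk from a category up to the root and return the names top-down."""
    trail = []
    while category is not None:
        trail.append(category)
        category = parents.get(category)
    return trail[::-1]


def render_breadcrumbs(trail):
    """Render the trail as links; each link is one more path a spider can follow."""
    parts = ['<a href="/">Home</a>']
    path = ""
    for name in trail:
        path += "/" + name.lower()
        parts.append(f'<a href="/category{path}/">{escape(name)}</a>')
    return " &raquo; ".join(parts)


# Hypothetical category hierarchy: child -> parent (None marks a top-level category).
parents = {"Elections": "Politics", "Politics": "News", "News": None}

trail = breadcrumb_trail("Elections", parents)
html = render_breadcrumbs(trail)
print(html)
```

Every post in the Elections category would carry links back to each parent archive, so a spider landing on any single post can reach the category pages in one click.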
So, to wrap things up, here’s what you want to do:
- Provide one or more short crawling paths for spiders. Try to keep the path as short as possible because anything beyond 5 levels is a problem.
- Make sure your robots.txt and robots meta tags don’t accidentally block the crawling path.
- Provide alternate crawling paths or access to the website content with breadcrumbs, related posts, and sitemaps.
- Try to minimize maintenance by making sure as many of these solutions as possible update automatically.
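For the robots.txt point in the list above, a quick sanity check might look something like this. It uses Python’s standard `urllib.robotparser`; the robots.txt contents, the example.com URLs, and the crawl path are invented for illustration, and the standard-library parser only approximates how a real search engine reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the category archives by accident.
robots_txt = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /category/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Hypothetical crawling path: home page -> category archive -> post.
crawl_path = [
    "https://example.com/",
    "https://example.com/category/news/",
    "https://example.com/2024/some-post/",
]

# Any URL listed here breaks the crawling path for spiders.
blocked = [url for url in crawl_path if not parser.can_fetch("Googlebot", url)]
for url in blocked:
    print("Blocked by robots.txt:", url)
```

This only covers robots.txt. A robots meta tag with noindex or nofollow on an archive page would need a separate check of the page’s HTML.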