
Enterprise sites: 7 quick tips to optimise your crawl budget


When working with an enterprise-sized website, managing your crawl budget can be the difference between new and existing content being propelled up the rankings and huge sections of your site dropping like a lead balloon.

There are many ways to improve the utilisation of your crawl budget. 

Here, we’ve assembled 7 quick tips that you can apply today. These are: 

  1. Define robots.txt rules 
  2. Break redirect chains 
  3. Fix broken links 
  4. Manage dynamic content 
  5. Set sitemap priorities 
  6. Improve caching 
  7. Review thin content

1. Define crawl paths with robots.txt rules 

Use your robots.txt file to block URL paths that Google is crawling when it doesn’t need to.

For instance, you might have seen in your log file analysis that Google is hitting hundreds of internal site search pages. These don’t need to be crawled because all your content is discoverable elsewhere. Therefore, blocking all URLs with the search parameter makes sense. 
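As a quick sketch, assuming your internal search lives under /search and also accepts an s= query parameter (both hypothetical paths, so check your own log files for the real ones), the rules might look like this:

    # Keep crawlers out of internal search results (hypothetical paths)
    User-agent: *
    Disallow: /search
    Disallow: /*?s=

Remember that robots.txt controls crawling, not indexing, so reserve it for pages you genuinely never need Google to fetch.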

2. Identify & break redirect chains 

Find multiple redirect status codes (3XX) that chain together before finally resolving: page A redirects to page B, which redirects to page C, and so on. Every redirect in the chain counts against your crawl budget, so point the start URL straight at the final destination to make a saving and free that budget for Google to use elsewhere. Then update all internal links so they don’t trigger the redirects in the first place.
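As an illustration, here’s a minimal nginx sketch (the paths are hypothetical) that collapses a two-hop chain into single redirects pointing straight at the final URL:

    # Before: /old-page -> /new-page -> /current-page (two hops per crawl)
    # After: every legacy URL 301s directly to the final destination
    location = /old-page { return 301 /current-page; }
    location = /new-page { return 301 /current-page; }

The same principle applies whatever your server or CDN: each legacy URL should resolve in exactly one hop.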

3. Fix broken internal links 

Pick out client errors (4XX) in the same way as you did the redirects. Here, we want to put redirects in place to capture any benefit from inbound links pointing to those URLs, as well as repair the internal links that trigger them.
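Continuing the nginx sketch above (again with hypothetical URLs), a deleted page that still attracts inbound links can be 301’d to its closest live equivalent:

    # A retired product URL that still earns backlinks (hypothetical)
    location = /products/discontinued-widget { return 301 /products/widgets; }

Pair this with fixing the internal links themselves, so Google stops requesting the dead URL at all.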

4. Manage your dynamic URLs

If you’re noticing lots of dynamic URLs (those using parameters) in your log files, you need to decide whether those URLs are necessary. They’re most often created by internal search pages (as covered above) or by filtering systems that narrow or sort page content. Whilst you can handle any duplicate content issues for Google using the canonical tag, or keep the pages out of the index using the meta robots tag, that’s not always the best trade-off if your crawl budget is still compromised as a result.
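For reference, these are the two tags that paragraph refers to, shown here on a hypothetical filtered URL such as /shoes?sort=price. Note that neither stops Google from crawling the URL, which is why blocking may still be the better option:

    <!-- Option 1: consolidate duplicate variants onto the clean URL -->
    <link rel="canonical" href="https://www.example.com/shoes">

    <!-- Option 2: keep the variant out of the index entirely -->
    <meta name="robots" content="noindex, follow">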

5. Set your sitemap priorities 

XML sitemaps allow you to set different crawl priorities and update frequencies on a URL-by-URL basis. Use this to send a strong signal to Google about what you need crawled most often. Normally, this is the homepage and top-level product or news categories. De-prioritise pages within sections that don’t update often.
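Here’s a minimal sketch of what that looks like in the sitemap itself (the URLs and values are hypothetical, so tune them to your own site):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <!-- Homepage: changes often, crawl often -->
        <loc>https://www.example.com/</loc>
        <changefreq>daily</changefreq>
        <priority>1.0</priority>
      </url>
      <url>
        <!-- Old archive section: rarely changes, de-prioritise -->
        <loc>https://www.example.com/archive/old-press-releases</loc>
        <changefreq>yearly</changefreq>
        <priority>0.2</priority>
      </url>
    </urlset>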

6. Improve your caching 

As we’ve said, every request Google makes to your server eats up crawl budget. This means that pages that rely on lots of additional resources to load can take up more of your budget than is necessary. Identify which resources are shared across multiple URLs, then make sure that Google can cache these and only request them once. 
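One common way to do this (a minimal nginx sketch; adapt the file types and lifetimes to your own stack) is to serve shared static assets with a long-lived Cache-Control header:

    # Let crawlers and browsers reuse shared assets instead of re-fetching them
    location ~* \.(css|js|png|jpg|svg|woff2)$ {
        add_header Cache-Control "public, max-age=31536000, immutable";
    }

Versioned filenames (e.g. app.3f2a.css) make it safe to cache this aggressively, because any change ships under a new URL.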

7. Review thin content 

Sometimes the number of pages we publish gets out of hand, especially on large enterprise sites. Going through your content pages and identifying those that could be combined into a single page reduces the number of crawl requests search engines need to reach that content. A bonus in doing so: you’re also reducing the amount of competition in the index for the terms those pages target.

Need some guidance? 

If you’re not sure where to start in optimising crawl budget, get in touch with us today. Skittle Digital offers a limited number of free Acquisitions Workshops, which will uncover the strengths, weaknesses and opportunities for your website – including recommendations for crawl budget handling and actionable insights.

AUTHOR

James Newhouse

Agency Lead

James has worked in digital marketing since 2009. He has led successful technical, link building, digital PR and content teams, and shared SEO advice in national outlets like The Telegraph. There’s not a lot he doesn’t know about SEO strategy, having worked across most enterprise verticals including household name ecommerce giants and international law firms. In his spare time, you’ll find him fixing rusty old Land Rovers or playing tabletop board games with friends.
