Enterprise sites: 7 quick tips to optimise your crawl budget

When working with an enterprise-sized website, managing your crawl budget can be the difference between new and existing content being propelled up the rankings, and huge sections of your site dropping like a lead balloon.

There are many ways to improve the utilisation of your crawl budget.

Here, we’ve assembled 7 quick tips that you can apply today. These are:

Define robots.txt rules
Break redirect chains
Fix broken links
Manage dynamic content
Set sitemap priorities
Improve caching
Review thin content

1. Define crawl paths with robots.txt rules

Use your robots.txt file to block URL paths that Google is following when it doesn’t need to.

For instance, you might have seen in your log file analysis that Google is hitting hundreds of internal site search pages. These don’t need to be crawled because all your content is discoverable elsewhere. Therefore, blocking all URLs with the search parameter makes sense.
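Before deploying a rule like this, it helps to check it against a handful of real URLs. The sketch below uses Python's standard-library robots.txt parser. It assumes, purely for illustration, that internal search lives under a `/search` path on `www.example.com`; swap in your own paths. Google also documents wildcard patterns for matching parameters anywhere in a URL, but Python's parser is not guaranteed to interpret those the way Google does, so this example sticks to a simple path prefix.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: assumes internal search results live under /search.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if ROBOTS_TXT allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Example URLs are made up for illustration.
    for url in (
        "https://www.example.com/search?q=shoes",
        "https://www.example.com/products/shoes",
    ):
        print(url, "crawlable" if is_crawlable(url) else "blocked")
```

Running the same check over URLs pulled from your log files is a quick way to confirm the rule blocks what you intended and nothing else.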

2. Identify & break redirect chains

Find redirects (3XX status codes) that chain together before finally resolving. This is where page A redirects to page B, which redirects to page C and so on. Every redirect counts against your crawl budget, so point the start URL directly at the final destination URL, cutting out the hops in between, to make a saving and allow Google to use that crawl budget elsewhere. Then update all internal links to avoid triggering those redirects in the first place.
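As a rough sketch of what breaking a chain means in practice, the function below takes a redirect map (source URL to target URL) and rewrites every entry so it points straight at its final destination. The map, the paths and the `max_hops` limit are all assumptions made for illustration; in a real project the map would come from your server config or a crawl export.

```python
def resolve_final(url, redirects, max_hops=10):
    """Follow url through the redirect map and return (final_url, hops)."""
    seen = {url}
    hops = 0
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen or hops > max_hops:
            raise ValueError(f"redirect loop or overly long chain at {url}")
        seen.add(url)
    return url, hops


def flatten(redirects):
    """Return a new map where every source points at its final destination."""
    return {source: resolve_final(source, redirects)[0] for source in redirects}


# Hypothetical redirect map: /a -> /b -> /c is a two-hop chain.
REDIRECTS = {"/a": "/b", "/b": "/c", "/old": "/new"}

if __name__ == "__main__":
    for source, target in flatten(REDIRECTS).items():
        print(f"{source} -> {target}")
```

The hop count returned by `resolve_final` is also a handy way to rank chains by how much crawl budget each one wastes.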

3. Fix broken internal links

Pick out client errors (4XX) in your log files in the same way as redirects. Here, we want to put redirects in place to capture any benefit from inbound links pointing to those URLs, as well as repair the internal links that trigger them.
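One way to pick these out is to count 4XX responses per URL in your access logs. The sketch below assumes logs in the widely used combined log format and identifies Google by a simple user-agent substring match; the sample lines are invented for illustration, and a production check would also verify that the requests really come from Google.

```python
import re
from collections import Counter

# Matches the request and status portion of a combined-log-format line.
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')

# Invented sample lines, for illustration only.
SAMPLE_LOG = [
    '66.249.66.1 - - [10/Mar/2024:10:00:00 +0000] "GET /old-page HTTP/1.1" 404 512 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [10/Mar/2024:10:05:00 +0000] "GET /old-page HTTP/1.1" 404 512 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [10/Mar/2024:10:06:00 +0000] "GET /gone HTTP/1.1" 410 0 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [10/Mar/2024:10:07:00 +0000] "GET / HTTP/1.1" 200 9000 "-" "Googlebot/2.1"',
    '203.0.113.5 - - [10/Mar/2024:10:08:00 +0000] "GET /old-page HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
]


def client_errors(lines, bot_token="Googlebot"):
    """Count 4XX responses per path for requests whose line mentions bot_token."""
    hits = Counter()
    for line in lines:
        if bot_token not in line:
            continue
        match = LOG_LINE.search(line)
        if match and match["status"].startswith("4"):
            hits[match["path"]] += 1
    return hits


if __name__ == "__main__":
    for path, count in client_errors(SAMPLE_LOG).most_common():
        print(count, path)
```

Sorting by hit count puts the broken URLs that cost the most crawl requests at the top of your fix list.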

4. Manage your dynamic URLs

If you’re noticing lots of dynamic URLs (those using parameters) in your log files, then you need to decide if these URLs are necessary or not. This is most often caused by internal search pages (as covered above) or through filtering systems that narrow or sort page content. Whilst you can handle any possible duplicate content issues for Google using the canonical tag, or ensure they don’t reach the index using the meta robots tag, Google still has to crawl each URL to see those tags. That’s not always the best trade-off if your crawl budget is still compromised as a result.
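To decide which parameters deserve attention, it can help to count how many crawled URLs each parameter appears in. This is a minimal sketch using Python's standard URL parsing; the sample URLs and parameter names (`colour`, `sort`, `q`) are hypothetical stand-ins for whatever shows up in your own logs.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def parameter_hits(urls):
    """Count how many URLs each query parameter name appears in."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # A set, so a parameter repeated within one URL is counted once.
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts


# Hypothetical crawled URLs, for illustration only.
CRAWLED = [
    "/shoes?colour=red&sort=price",
    "/shoes?sort=price",
    "/search?q=boots",
    "/shoes",
]

if __name__ == "__main__":
    for name, count in parameter_hits(CRAWLED).most_common():
        print(count, name)
```

Parameters that account for a large share of crawl hits but add no unique content are the first candidates for blocking or removal.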

5. Set your sitemap priorities

XML sitemaps allow you to set different crawl priorities and update frequencies on a URL-by-URL basis. Use this to signal which URLs you need crawled most often. Normally, this is the homepage and top-level product / news categories. De-prioritise pages within sections that don’t update often. Treat these values as hints rather than directives: Google’s sitemap documentation says it ignores priority and changefreq and relies on an accurate lastmod date instead.
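For reference, here is a sketch of how per-URL priority and change frequency fit into a sitemap, built with Python's standard XML library. The `priority` and `changefreq` elements come from the sitemaps.org protocol; the URLs and the values assigned to them are made up for illustration, and how much weight a search engine gives them is up to that search engine.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (loc, changefreq, priority) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, changefreq, priority in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "changefreq").text = changefreq
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages: homepage and a top-level category rank highest.
PAGES = [
    ("https://www.example.com/", "daily", 1.0),
    ("https://www.example.com/news/", "daily", 0.8),
    ("https://www.example.com/about/history/", "yearly", 0.2),
]

if __name__ == "__main__":
    print(build_sitemap(PAGES))
```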

6. Improve your caching

As we’ve said, every request Google makes to your server eats up crawl budget. This means that pages that rely on lots of additional resources to load can take up more of your budget than is necessary. Identify which resources are shared across multiple URLs, then make sure that Google can cache these and only request them once.
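The mechanics of making a shared resource cacheable can be sketched as follows: the server labels each response with an `ETag` and a `Cache-Control` lifetime, and answers a matching `If-None-Match` request with an empty `304 Not Modified` instead of the full file. These are standard HTTP headers, but the `respond` function, the one-day `max-age` and the sample stylesheet are assumptions for illustration, and how a given crawler uses the headers is up to that crawler.

```python
import hashlib


def etag_for(body: bytes) -> str:
    """Derive a quoted ETag from a hash of the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, request_headers: dict) -> tuple:
    """Return (status, headers, body), using 304 when the client copy is current."""
    etag = etag_for(body)
    headers = {
        "ETag": etag,
        # Assumed lifetime of one day; tune per resource type.
        "Cache-Control": "public, max-age=86400",
    }
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""
    return 200, headers, body


if __name__ == "__main__":
    stylesheet = b"body { margin: 0; }"  # stand-in for a shared CSS file
    status, headers, _ = respond(stylesheet, {})
    print(status, headers)
    status, _, _ = respond(stylesheet, {"If-None-Match": headers["ETag"]})
    print(status)
```

In practice you would set these headers in your web server or CDN configuration rather than in application code; the sketch only shows what the exchange looks like.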

7. Review thin content

Sometimes the number of pages we publish gets out of hand, especially with large enterprise sites. Going through your content pages and identifying those where content might be combined into a single page helps reduce the number of crawl requests required for search engines to get to that content. A bonus in doing so: you’re also reducing the amount of competition in the index for the terms those pages target.
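A simple first pass at finding merge candidates is to group short pages by the term they target and flag any term served by more than one of them. The sketch below assumes you already have a content inventory with a URL, a target term and a word count per page; the 300-word threshold and the sample rows are arbitrary placeholders, not a recommended standard.

```python
from collections import defaultdict


def merge_candidates(pages, min_words=300):
    """Group pages under min_words by target term; keep terms with 2+ pages."""
    groups = defaultdict(list)
    for url, target_term, word_count in pages:
        if word_count < min_words:
            groups[target_term].append(url)
    return {term: urls for term, urls in groups.items() if len(urls) > 1}


# Hypothetical inventory rows: (url, target term, word count).
INVENTORY = [
    ("/faq/returns", "returns policy", 120),
    ("/help/returns", "returns policy", 90),
    ("/guide/boots", "boots guide", 1500),
    ("/faq/sizes", "size guide", 80),
]

if __name__ == "__main__":
    for term, urls in merge_candidates(INVENTORY).items():
        print(term, "->", ", ".join(urls))
```

Word count alone is a blunt measure, so treat the output as a shortlist for manual review rather than a list of pages to merge automatically.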

Need some guidance?

If you’re not sure where to start in optimising crawl budget, get in touch with us today. Skittle Digital offers a limited number of free Acquisitions Workshops, which will uncover the strengths, weaknesses and opportunities for your website – including recommendations for crawl budget handling and actionable insights.

AUTHOR

James Newhouse

Agency Lead

James has worked in digital marketing since 2009. He has led successful technical, link building, digital PR and content teams, and shared SEO advice in national outlets like The Telegraph. There’s not a lot he doesn’t know about SEO strategy, having worked across most enterprise verticals including household name ecommerce giants and international law firms. In his spare time, you’ll find him fixing rusty old Land Rovers or playing tabletop board games with friends.


