
9 essential XML sitemap optimisations for better SEO


Beyond getting your site structure right, how you present your sitemap to search engines is an underappreciated dimension of SEO.

Search engines like Google find it much easier to discover and prioritise pages for crawling when they’re presented in a structured way. This is why all the major search engines support the submission of a sitemap in XML format.

XML sitemap optimisation has become a staple of modern SEO, especially for larger, enterprise-sized websites, where URL discovery via internal links and navigation alone is sometimes not enough to get all your key content crawled, indexed and ranking quickly.

Whilst a clean site structure is still paramount, optimising and submitting an XML sitemap can speed things up considerably.

What is an XML file?

An XML (eXtensible Markup Language) file organises data within a hierarchy of elements. When this format is used to create a sitemap, your file can contain more than just a list of pages, helping search engines understand your crawling priorities more easily. 

The file type, which ends with the extension .xml, was developed and championed by the W3C back in 1998. It is designed to be both machine (search engine) and human readable. 

This file organises data by marking each element with a tag labelling it as a specific data type, and individual elements can contain further child elements. The plain text within each element is the data itself. 
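Put together, a minimal single-entry sitemap looks like this (the URL and values are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Each <url> element is a child of <urlset>; the plain text inside each tag is the data -->
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2023-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
```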

In this blog post, we will present 9 essential ways to optimise your XML sitemap for faster, more efficient URL discovery, indexation and ranking:  

  1. Use multiple sitemaps 
  2. Use tools to generate your sitemaps 
  3. Use the priority tag 
  4. Use the changefreq tag 
  5. Use the lastmod tag 
  6. Include only canonical pages 
  7. Use international hreflang tags 
  8. Signpost your sitemap in robots.txt 
  9. Submit to Google Search Console

1. Use multiple sitemaps

One advantage of XML sitemaps is that you can break your URLs out into multiple files, grouped by logical content type. 

As an example, you might have separate XML sitemap files for: 

  • Static pages 
  • News posts 
  • Each blog category
  • Products
  • Images 

Each of these can then be submitted to search engines and monitored separately, allowing you to quickly identify indexation and ranking issues within specific sections of your site. 

It is also worth considering that each XML sitemap file can only contain up to 50,000 URLs, or 50MB of uncompressed data. This makes breaking your sitemaps up into smaller files essential if you have a large site.  

All sitemaps can then be tied together in one Sitemap Index which signposts all your smaller sitemaps to search engines. A Sitemap Index is another XML file which lists your other XML sitemaps and their attributes.
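A Sitemap Index follows the same XML conventions, with sitemap entries in place of url entries. A minimal sketch, with placeholder URLs:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Each <sitemap> entry points to one of your smaller sitemap files -->
  <sitemap>
    <loc>https://www.example.com/sitemap-pages.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-products.xml</loc>
  </sitemap>
</sitemapindex>
```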

2. Use tools to generate your sitemaps 

Many Content Management Systems (CMS) either automatically generate a sitemap or offer compatibility with third party plugins which can do this for you.  

However, one major disadvantage is that this type of automatically generated sitemap will often include ALL URLs that your CMS has generated, potentially including pages which just don’t need to be crawled. In WordPress, for instance, these are typically author, date and tag archives. If there are URLs that you don’t need crawled, putting them into a sitemap is only going to encourage more crawling. 

Screaming Frog and other crawlers offer the option of generating configurable sitemap files following a successful crawl. 

More than just a time-saving exercise, using a tool to automate sitemap production can help automatically populate some of the essential element tags across vast numbers of pages, which would otherwise take far too much time to extract and code manually.

3. Use the priority tag 

The priority tag weights the relative priority of your pages for crawling: 1.0 is the highest priority and 0.0 is the lowest. Decimal values between 0 and 1 scale the priority accordingly. As an example: 

  • 1.0 – homepage 
  • 0.9 – top-level blog category pages 
  • 0.5 – blog articles
  • 0.1 – static pages, like privacy policies and T&Cs
  • 0.0 – outdated content that generates residual traffic 
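In the sitemap itself, the tag sits inside each url entry. For example (placeholder URLs):

```xml
<url>
  <loc>https://www.example.com/</loc>
  <priority>1.0</priority>
</url>
<url>
  <loc>https://www.example.com/privacy-policy/</loc>
  <priority>0.1</priority>
</url>
```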

Google indicates that it tries not to use the priority tag in XML sitemaps to influence crawling. However, it hasn’t explicitly said it ignores the tag completely. Therefore, there doesn’t appear to be any reason not to use it – especially as Google isn’t the only search engine or bot crawling your site.

4. Use the changefreq tag 

The changefreq (or change frequency) tag is an indication of how often content is likely to change on a given URL. 

This can be as follows, with examples in brackets: 

  • always (updated in real time or near real time, for example live posts with a continuous stream of user-generated content) 
  • hourly (fast-moving news hubs where stories are updated as they develop) 
  • daily (main blog category pages where links to new posts appear)
  • weekly (product category pages where new products might appear) 
  • monthly (terms & conditions, individual blog posts) 
  • yearly (about us, other static pages) 
  • never (old news pieces) 
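As with priority, the tag goes inside each url entry, and the value should be one of the lowercase words above. For example (placeholder URL):

```xml
<url>
  <loc>https://www.example.com/blog/</loc>
  <changefreq>daily</changefreq>
</url>
```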

Again, whilst Google has in recent years downplayed its importance, there is no reason not to use the tag, and Google hasn’t outright stated that it ignores changefreq entirely.

5. Use the lastmod tag 

Another tag to include in your XML sitemap is the lastmod (or last modified) tag. This tag tells search engines when a page was last updated. Search engines like Google use it to work out which pages change most often, and therefore should be crawled more frequently.
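The value uses the W3C Datetime format – either a date on its own or a full timestamp with a timezone offset. For example (placeholder URL and date):

```xml
<url>
  <loc>https://www.example.com/blog/example-post/</loc>
  <lastmod>2023-06-01T09:30:00+00:00</lastmod>
</url>
```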

6. Include only canonical pages 

Your XML sitemap should only include canonical pages. By this, we mean pages that: 

  • Return a 200 status code 
  • Do not include a canonical tag referencing another page 
  • Do not redirect (3xx status code) 
  • Do not return a client error (4xx status code) 

Including anything else sends mixed signals to search engines and wastes valuable crawl budget. 

7. Use international hreflang tags 

Search engines can read your international hreflang tags through your XML sitemap. Implementing hreflang within your XML sitemap is often a simpler, more flexible approach than including the tags “on-page”, as it requires no real development resource or coding knowledge.  

It is also much easier to keep your hreflang tags up to date using this method, because you can replace your XML sitemap file easily via File Transfer Protocol (FTP) as often as you need without troubling your development team. 
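In the sitemap, hreflang annotations are expressed as xhtml:link child elements, which requires declaring the xhtml namespace on the urlset element. Each URL must list all of its language alternates, including itself. A sketch with placeholder URLs:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://www.example.com/en/page/</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://www.example.com/en/page/"/>
    <xhtml:link rel="alternate" hreflang="fr" href="https://www.example.com/fr/page/"/>
  </url>
  <url>
    <loc>https://www.example.com/fr/page/</loc>
    <xhtml:link rel="alternate" hreflang="fr" href="https://www.example.com/fr/page/"/>
    <xhtml:link rel="alternate" hreflang="en" href="https://www.example.com/en/page/"/>
  </url>
</urlset>
```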

8. Signpost your sitemap in robots.txt 

Search engines need to know where your sitemaps are, and rely on you signposting them in your robots.txt file.  

If you have multiple sitemaps referenced in a single sitemap index, you only need to include the signpost to your Sitemap Index, saving space in your robots.txt file and time for the search engines.  

To add your sitemap to your robots.txt, simply add a line reading “Sitemap: ” (note the space after the colon) followed by the complete URL of your XML sitemap file. 
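The Sitemap directive can sit anywhere in the file, independent of any User-agent group. A minimal robots.txt, with a placeholder sitemap URL:

```text
User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap_index.xml
```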

9. Submit to Google Search Console 

Once you’ve assembled your XML sitemap, hosted it at the root of your site, and signposted it in robots.txt, you should submit it within Google Search Console for processing. Submit each sitemap individually so that you can monitor the indexation of each sitemap’s URLs separately – because you’ve divided your URLs into different categories, you can now quickly determine which parts of your site need additional attention.

Need help with sitemap optimisation? 

If you need help organising your sitemaps, or don’t know where to start in creating an XML file, then Skittle Digital can help.  

Book a Free Acquisitions Workshop to receive invaluable advice and insights from our marketing experts, as well as a tailored action plan to get your sitemap into perfect shape. 

AUTHOR

James Newhouse

Agency Lead

James has worked in digital marketing since 2009. He has led successful technical, link building, digital PR and content teams, and shared SEO advice in national outlets like The Telegraph. There’s not a lot he doesn’t know about SEO strategy, having worked across most enterprise verticals including household name ecommerce giants and international law firms. In his spare time, you’ll find him fixing rusty old Land Rovers or playing tabletop board games with friends.
