An XML (Extensible Markup Language) Sitemap is a text file that details all of the URLs on a website. It can include extra information (metadata) about each URL, such as when it was last updated, how important it is and whether versions of the URL exist in other languages. All of this helps search engines crawl your website more efficiently and lets you feed changes to them directly, including when a new page is added or an old one removed.
There is no guarantee that an XML Sitemap will get your pages crawled and indexed by search engines, but having one certainly increases your chances, particularly if your navigation or general internal linking strategy doesn’t link to all of your pages.
<urlset> - The Sitemap opens and closes with this tag. It also references the current protocol standard.
<url> - This is the parent tag for each URL entry.
<loc> - This tag contains the absolute URL, or the locator of the page.
<lastmod> - This contains the date the file was last modified. It should use the W3C Datetime format, for example YYYY-MM-DD.
<changefreq> - This contains information about the frequency with which a file is changed.
<priority> - This indicates the file’s importance within the site. The value ranges from 0.0 to 1.0.
<xhtml:link> - This tag provides details of alternate versions of the URL offered in other languages.
The loc tag is compulsory, while the lastmod, changefreq and priority tags are optional.
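To see how these tags fit together, here is a minimal Python sketch that builds a Sitemap using only the standard library. The build_sitemap function, the dictionary keys and the example.com URLs are illustrative assumptions rather than part of any official tool, and the rel/hreflang/href attributes on xhtml:link follow the form commonly documented for language alternates.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialize the Sitemap namespace as the default one and xhtml with a prefix.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_sitemap(entries):
    """Return Sitemap XML for a list of entry dicts (hypothetical helper).

    Each entry needs "loc". The keys "lastmod", "changefreq", "priority"
    and "alternates" (a dict of language code -> URL) are optional.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = entry["loc"]
        # Optional tags are written only when a value was supplied.
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}").text = str(entry[tag])
        for hreflang, href in entry.get("alternates", {}).items():
            ET.SubElement(
                url,
                f"{{{XHTML_NS}}}link",
                {"rel": "alternate", "hreflang": hreflang, "href": href},
            )
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Example usage with placeholder URLs.
print(build_sitemap([
    {
        "loc": "https://www.example.com/",
        "lastmod": "2024-01-15",
        "changefreq": "monthly",
        "priority": 0.8,
        "alternates": {"de": "https://www.example.com/de/"},
    },
    {"loc": "https://www.example.com/contact"},
]))
```

Only loc is required for each entry, so the second entry in the usage example produces a url block with a single child tag.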
Ideally, an XML Sitemap should be added to the root directory of the website. All URLs in the Sitemap must come from the same host.
Only the canonical version of all page URLs should be included, so pages should not redirect or return an error status.
The maximum length of the URLs is 2,048 characters.
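Two of these rules, the same-host requirement and the length limit, are easy to check before you publish a Sitemap. The following is a minimal sketch under stated assumptions: check_sitemap_urls is a hypothetical helper, the URLs are placeholders, and it does not check for redirects or error statuses because that would require fetching each page.

```python
from urllib.parse import urlsplit

MAX_URL_LENGTH = 2048  # the limit stated in the text above


def check_sitemap_urls(sitemap_url, page_urls):
    """Return (url, problem) tuples for URLs that break the rules above."""
    sitemap_host = urlsplit(sitemap_url).netloc.lower()
    problems = []
    for url in page_urls:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append((url, "not an absolute URL"))
        elif parts.netloc.lower() != sitemap_host:
            problems.append((url, "different host from the Sitemap"))
        elif len(url) > MAX_URL_LENGTH:
            problems.append((url, "longer than 2,048 characters"))
    return problems


# Example usage with placeholder URLs.
for url, problem in check_sitemap_urls(
    "https://www.example.com/sitemap.xml",
    ["https://www.example.com/about", "https://blog.example.com/post"],
):
    print(f"{problem}: {url}")
```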
While it may seem possible to manipulate search engines into thinking the content on your page is frequently updated by setting the changefreq tag to daily, it is not advisable to do so. If the changefreq and priority tags do not reflect reality, chances are that search engine crawlers will ignore them.
If you need help building your sitemap, there are several sitemap generator tools to help.
When you use multiple Sitemap files for one website, every file must be listed in a separate file called the Sitemap index file.
You might need multiple XML Sitemaps if you have more than 50,000 URLs within a site or if one Sitemap exceeds 10MB. In either case, you will need to create another XML Sitemap. You can reduce your bandwidth requirement by compressing the Sitemap file using gzip, but you need to make sure that the decompressed file still does not exceed 10MB.
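The splitting and compression steps can be sketched in a few lines of Python. This is an illustration only: split_urls and compress_sitemap are hypothetical helper names, and the limits are the ones quoted in the text above.

```python
import gzip

MAX_URLS_PER_SITEMAP = 50_000
MAX_SITEMAP_BYTES = 10 * 1024 * 1024  # uncompressed size limit quoted in the text


def split_urls(urls, chunk_size=MAX_URLS_PER_SITEMAP):
    """Yield successive lists of at most chunk_size URLs from a list."""
    for start in range(0, len(urls), chunk_size):
        yield urls[start:start + chunk_size]


def compress_sitemap(xml_text):
    """Gzip Sitemap XML after checking its uncompressed size."""
    raw = xml_text.encode("utf-8")
    if len(raw) > MAX_SITEMAP_BYTES:
        raise ValueError("Sitemap exceeds the uncompressed size limit; split it further")
    return gzip.compress(raw)


# Example usage: 120,000 placeholder URLs end up in three Sitemap files.
all_urls = [f"https://www.example.com/page-{i}" for i in range(120_000)]
for number, chunk in enumerate(split_urls(all_urls), start=1):
    print(f"sitemap-{number}.xml.gz would hold {len(chunk)} URLs")
```

The size check runs on the uncompressed bytes because the limit applies to the file after decompression, not to the gzip archive.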
A Sitemap index file uses the following tags:
<sitemapindex> - The Sitemap index file opens and closes with this tag. It encloses all the XML Sitemaps in the file.
<sitemap> - This tag encloses information about individual Sitemaps.
NOTE: A Sitemap index file can link up to 50,000 XML Sitemap files.
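As a rough illustration of the tags just described, the Python sketch below writes a Sitemap index file. The build_sitemap_index name and the example.com file names are assumptions made for this example. Each sitemap entry carries a loc child with the Sitemap's URL and may carry an optional lastmod, as defined by the Sitemaps protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)

MAX_SITEMAPS_PER_INDEX = 50_000  # limit from the note above


def build_sitemap_index(sitemap_urls, lastmod=None):
    """Return Sitemap index XML listing each Sitemap URL (hypothetical helper)."""
    if len(sitemap_urls) > MAX_SITEMAPS_PER_INDEX:
        raise ValueError("A Sitemap index file can list at most 50,000 Sitemaps")
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = sitemap_url
        if lastmod is not None:
            # For simplicity, the same date is applied to every listed Sitemap.
            ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Example usage with placeholder file names.
print(build_sitemap_index(
    [
        "https://www.example.com/sitemap-1.xml.gz",
        "https://www.example.com/sitemap-2.xml.gz",
    ],
    lastmod="2024-01-15",
))
```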
Google can detect the various file types of Sitemaps. These are specific to the type of content on a site and help search engines identify the files much more easily.
List of Sitemap Filetypes:
The XML schemas (XSD) for Sitemaps 0.9 and the supported Sitemap extensions define the elements and attributes that need to be included in your XML Sitemaps. The schemas (depending on Sitemaps, Sitemap index files and different Sitemap supported file types) are as follows:
After creating your Sitemaps with all of the right elements and attributes in place, validate them using one of the following tools:
To test your Sitemap before you submit it in Google Search Console, click on the red Add/Test Sitemap button on the right, then enter the URL of the sitemap that you would like to test as shown in the screenshots below.
NOTE: This feature is also used to submit your Sitemap to Google. Another method you should use to tell search engines about your Sitemap is to add the following to your robots.txt file:
You can add this anywhere in the robots.txt file because the directive is independent of the user-agent line. You can also specify more than one Sitemap file per robots.txt file. Whenever you update your Sitemap, you can resubmit it to Google using the same Add/Test Sitemap option.
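The robots.txt directive takes the form of the word Sitemap, a colon and the absolute URL of the Sitemap file. Here is a small Python sketch that appends such lines to existing robots.txt content. The add_sitemap_directives helper and the example.com URLs are assumptions for illustration only.

```python
def add_sitemap_directives(robots_txt, sitemap_urls):
    """Append a "Sitemap: <url>" line for each URL not already declared."""
    lines = robots_txt.splitlines()
    # Collect Sitemap URLs that the file already declares.
    declared = {
        line.split(":", 1)[1].strip()
        for line in lines
        if line.lower().startswith("sitemap:")
    }
    for url in sitemap_urls:
        if url not in declared:
            lines.append(f"Sitemap: {url}")
    return "\n".join(lines) + "\n"


# Example usage: two Sitemap files declared in one robots.txt file.
existing = "User-agent: *\nDisallow: /admin/\n"
print(add_sitemap_directives(
    existing,
    [
        "https://www.example.com/sitemap.xml",
        "https://www.example.com/sitemap-images.xml",
    ],
))
```

The lines are appended at the end simply for convenience. Because the directive is independent of the user-agent line, any position in the file would work.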
ADVANCED NOTE: You can also submit your Sitemap as an HTTP request. To do this you need to issue your request to the following URL:
Take a look at an example below:
URL encode the part after ping?sitemap=
Issue the HTTP request using wget, curl or any other method your web developer suggests.
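The URL-encoding step can be sketched in Python as follows. The build_ping_url helper is hypothetical, and the ping endpoint shown is a placeholder, not a real search engine address, so substitute the endpoint your search engine documents.

```python
from urllib.parse import quote


def build_ping_url(ping_endpoint, sitemap_url):
    """Return the ping URL with the Sitemap address URL-encoded."""
    # safe="" makes quote() encode ":" and "/" as well.
    return f"{ping_endpoint}?sitemap={quote(sitemap_url, safe='')}"


# Example usage with a placeholder endpoint (an assumption, not a real service).
ping_url = build_ping_url(
    "https://searchengine.example/ping",
    "https://www.example.com/sitemap.xml",
)
print(ping_url)

# To issue the request from Python instead of wget or curl, you could call
# urllib.request.urlopen(ping_url) once the endpoint is a real one.
```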
If you have a site that uses a lot of images, it makes absolute sense to guide search engines to your image URLs by means of an image Sitemap.
Below is a sample of an image Sitemap.
<image:image> - This tag encloses each image URL. You can list up to 1,000 such tags for each page URL.
<image:loc> - This contains the URL of the image.
<image:caption> - This tag contains a caption for the image. It is optional.
<image:geo_location> - You can specify a geographic location of the image in this tag. It is optional.
<image:title> - This contains a title for the image. It is optional.
<image:license> - This tag contains the URL pointing to the license of the image. It is optional.
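The image tags above can be combined with the standard Sitemap tags as in the following Python sketch. The build_image_sitemap helper, the dictionary keys and the example.com URLs are illustrative assumptions. The image namespace URI is the one Google publishes for its image Sitemap extension.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)

MAX_IMAGES_PER_PAGE = 1000  # limit mentioned in the tag list above


def build_image_sitemap(pages):
    """Return image Sitemap XML (hypothetical helper).

    pages maps each page URL to a list of image dicts. Each image dict
    needs "loc"; "caption", "geo_location", "title" and "license" are optional.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, images in pages.items():
        if len(images) > MAX_IMAGES_PER_PAGE:
            raise ValueError(f"Too many images listed for {page_url}")
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image in images:
            node = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(node, f"{{{IMAGE_NS}}}loc").text = image["loc"]
            # Optional tags are written only when a value was supplied.
            for tag in ("caption", "geo_location", "title", "license"):
                if tag in image:
                    ET.SubElement(node, f"{{{IMAGE_NS}}}{tag}").text = image[tag]
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Example usage with placeholder URLs.
print(build_image_sitemap({
    "https://www.example.com/gallery": [
        {"loc": "https://www.example.com/img/harbour.jpg", "title": "Harbour at dusk"},
        {"loc": "https://www.example.com/img/market.jpg"},
    ],
}))
```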
Even with the perfect Sitemap, it may not be possible to get your entire site indexed. A flawless Sitemap does, however, make it possible to discover your site’s indexation issues. To do this, analyze any Sitemap errors in Google Search Console and Bing Webmaster Tools, and compare the pages that are indexed with the URLs you have submitted. If there is a vast difference in this ratio, or a sudden increase or decrease in these numbers, be sure to check your Sitemaps. They may reveal other problems on your site, such as problems with your robots.txt file, duplicate content and so on. Many tools (such as Screaming Frog) can import and crawl all of the pages referenced in your Sitemaps, allowing you to easily spot any issues or unnecessary redirects.
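If you can export a list of indexed URLs from your reporting tool, the comparison of submitted versus indexed pages can be sketched in Python as below. The helper names are hypothetical, and the set of indexed URLs is assumed to come from your own export, since this sketch does not query any search engine.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def submitted_urls(sitemap_xml):
    """Return the set of loc values found in a Sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.iter(f"{{{SITEMAP_NS}}}loc")
        if loc.text
    }


def indexation_report(sitemap_xml, indexed):
    """Return (ratio of submitted URLs that are indexed, sorted missing URLs)."""
    submitted = submitted_urls(sitemap_xml)
    missing = sorted(submitted - set(indexed))
    if not submitted:
        return 0.0, missing
    return (len(submitted) - len(missing)) / len(submitted), missing


# Example usage with placeholder data.
sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://www.example.com/</loc></url>"
    "<url><loc>https://www.example.com/pricing</loc></url>"
    "</urlset>"
)
ratio, missing = indexation_report(sample, {"https://www.example.com/"})
print(f"{ratio:.0%} of submitted URLs are indexed; missing: {missing}")
```

A low ratio, or a list of missing URLs that changes suddenly between runs, is the signal to look more closely at the Sitemap and the pages it references.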
The Sitemaps section allows you to monitor all of your Sitemaps from one place, providing a summary of the Sitemaps that have been submitted via the Google Search Console account. This includes a snapshot of data, including the Sitemap type, the dates they were most recently processed, any issues that have been identified and the number of pages submitted/indexed per Sitemap and overall.
It’s also possible to test or resubmit Sitemaps by clicking on the Sitemap that you wish to submit, then selecting the ‘Resubmit Sitemap’ or ‘Test Sitemap’ button.