What is an XML Sitemap?
An XML (Extensible Markup Language) Sitemap is a text file used to detail all URLs on a website. It can include extra information (metadata) on each URL, with details of when they were last updated, how important they are and whether there are any other versions of the URL created in other languages. All of this is done to help the search engines crawl your website more efficiently, allowing any changes to be fed to them directly, including when a new page is added or an old one removed.
There is no guarantee that an XML Sitemap will get your pages crawled and indexed by search engines, but having one certainly increases your chances, particularly if your navigation or general internal linking strategy doesn’t link to all of your pages.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://www.example.com</loc>
    <lastmod>2017-10-06</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.9</priority>
    <xhtml:link rel="alternate" hreflang="en" href="https://www.example.com"/>
    <xhtml:link rel="alternate" hreflang="fr" href="https://www.example.com/fr"/>
  </url>
</urlset>
Glossary of Tags Used in a Sitemap
<urlset> - The Sitemap opens and closes with this tag. Its xmlns attribute references the current protocol standard.
<url> - This is the parent tag for each URL entry.
<loc> - This tag contains the absolute URL, or the locator of the page.
<lastmod> - This contains the date the file was last modified. It should be in W3C Datetime format, such as YYYY-MM-DD.
<changefreq> - This indicates how frequently the file is likely to change. Valid values are always, hourly, daily, weekly, monthly, yearly and never.
<priority> - This indicates the file’s importance within the site. The value ranges from 0.0 to 1.0.
<xhtml:link> - In this case, this tag is used to provide details of alternate URLs offered in other languages.
The loc tag is compulsory, while the lastmod, changefreq and priority tags are optional.
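As a rough illustration of the tags above, here is a minimal Python sketch that writes a Sitemap using only the standard library. The function name build_sitemap and the dictionary keys are assumptions made for this example; only loc is treated as required, and the optional tags are written when they are supplied.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML (as a string) for a list of URL entry dicts.

    Each dict must contain "loc"; "lastmod", "changefreq" and "priority"
    are written only when present, because they are optional.
    """
    # Serialize the sitemap namespace as the default (unprefixed) namespace.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = entry["loc"]
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}").text = str(entry[tag])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap_xml = build_sitemap([
    {"loc": "https://www.example.com", "lastmod": "2017-10-06",
     "changefreq": "weekly", "priority": 0.9},
    {"loc": "https://www.example.com/about"},  # only the compulsory tag
])
print(sitemap_xml)
```

Building the file with an XML library rather than string concatenation means special characters in URLs, such as ampersands, are escaped for you.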
Ideally, an XML Sitemap should be added to the root directory of the website. All URLs in the Sitemap must come from the same host.
Only the canonical version of all page URLs should be included, so pages should not redirect or return an error status.
The maximum length of the URLs is 2,048 characters.
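The host and length rules above are easy to check before you publish a Sitemap. The sketch below is one possible way to do it in Python; find_invalid_urls is a hypothetical helper, and it compares hosts by their exact host name, which is an assumption about how strictly you want to apply the rule.

```python
from urllib.parse import urlsplit

MAX_URL_LENGTH = 2048  # limit quoted in the text above


def find_invalid_urls(sitemap_url, page_urls):
    """Return (url, reason) pairs for URLs that break the rules above."""
    expected_host = urlsplit(sitemap_url).netloc.lower()
    problems = []
    for url in page_urls:
        if len(url) > MAX_URL_LENGTH:
            problems.append((url, "too long"))
        elif urlsplit(url).netloc.lower() != expected_host:
            problems.append((url, "different host"))
    return problems


problems = find_invalid_urls(
    "https://www.example.com/sitemap.xml",
    [
        "https://www.example.com/pricing",
        "https://blog.example.com/post",          # different host
        "https://www.example.com/" + "a" * 2048,  # exceeds the length limit
    ],
)
for url, reason in problems:
    print(reason, url[:60])
```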
While it may seem possible to manipulate search engines into thinking the content on your page is frequently updated by declaring the changefreq tag daily, it is not advisable to do so. If the frequency and priority tags do not reflect reality, chances are that search engine crawlers will ignore them.
If you need help building your sitemap, there are several sitemap generator tools to help.
Sitemap Example Index File
When you use multiple Sitemap files for one website, you must list all of them in a separate file called the Sitemap index file.
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap1.gz</loc>
    <lastmod>2017-12-31</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap2.gz</loc>
    <lastmod>2017-10-01</lastmod>
  </sitemap>
</sitemapindex>
You might need multiple XML Sitemaps if you have more than 50,000 URLs within a site or if one Sitemap exceeds 50MB uncompressed. If this is the case, you will need to split your URLs across several XML Sitemaps. You can reduce your bandwidth requirement by compressing each Sitemap file with gzip, but the file must still not exceed 50MB once it is decompressed.
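Because the size limit applies to the decompressed file, it makes sense to check the size before compressing. The following Python sketch shows one way to do that; gzip_sitemap is a hypothetical helper, and the limit is passed in by the caller rather than hard-coded, so the value used in the demo is arbitrary.

```python
import gzip


def gzip_sitemap(xml_text, max_uncompressed_bytes):
    """Gzip sitemap XML after checking its uncompressed size.

    max_uncompressed_bytes is supplied by the caller, so this sketch does
    not hard-code a particular limit.
    """
    raw = xml_text.encode("utf-8")
    if len(raw) > max_uncompressed_bytes:
        raise ValueError(
            f"sitemap is {len(raw)} bytes uncompressed; split it into several files"
        )
    return gzip.compress(raw)


sample = "<urlset>" + "<url><loc>https://www.example.com/page</loc></url>" * 1000 + "</urlset>"
# 1_000_000 is an arbitrary demo value, not the real protocol limit.
compressed = gzip_sitemap(sample, max_uncompressed_bytes=1_000_000)
print(len(sample.encode("utf-8")), "bytes uncompressed,", len(compressed), "bytes gzipped")
```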
An example of a Sitemap index file is shown above.
Glossary of Tags Used in a Sitemap Index File
<sitemapindex> - The Sitemap index file opens and closes with this tag. It encloses all the XML Sitemaps in the file.
<sitemap> - This tag encloses information about individual Sitemaps.
NOTE: A Sitemap index file can link up to 50,000 XML Sitemap files.
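To show how the pieces fit together, here is a minimal Python sketch that splits a large list of URLs into groups of at most 50,000 and then writes an index that points at one Sitemap file per group. The helper names and the sitemap1.xml, sitemap2.xml file naming are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50000  # per-file URL limit quoted in the text


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into lists of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return a Sitemap index document listing the given Sitemap file URLs."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = sitemap_url
    return ET.tostring(index, encoding="unicode")


all_urls = [f"https://www.example.com/page-{n}" for n in range(120000)]
chunks = chunk_urls(all_urls)
# Hypothetical file naming: sitemap1.xml, sitemap2.xml, ...
index_xml = build_sitemap_index(
    [f"https://www.example.com/sitemap{n}.xml" for n in range(1, len(chunks) + 1)]
)
print(len(chunks), "sitemap files needed")
```

Each chunk would then be written out as its own Sitemap file, with the index file being the one you submit.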
What File Types are Supported by XML Sitemaps?
Google can detect the various file types of Sitemaps. These are specific to the type of content on a site and help search engines identify the files much more easily.
List of Sitemap Filetypes:
Do I need an XML Sitemap?
Is a sitemap strictly necessary? No, not technically. Your website will still work without one, and it can even be crawled and indexed by search engines. Plus, sitemaps aren’t used as a ranking signal, so submitting one won’t make you rank higher.
So why do it? The biggest reason you should create and submit your XML sitemap is indexing. Even though search engines can still technically find your pages without one, adding a sitemap makes it much easier for them. You might have orphaned pages (pages that got left out of your internal linking) or pages that are simply harder to find. Your sitemap is especially important when you’ve recently added pages or created a whole new site that has few or no links pointing to it yet.
Sitemaps also help search engines crawl your pages more intelligently. They take ‘
If you’ve got a geo-targeted international site, or a site that has the same page translated into multiple languages, you can use your XML sitemap to your advantage. As we showed in our example above, putting hreflang tags in your sitemap tells crawlers that you’ve got multiple versions of your page. Search engines can use this information to make sure they’re serving the right version to users based on language and/or location.
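Here is a minimal Python sketch of how hreflang entries like these could be generated. It assumes, following Google's guidance, that every language version gets its own url entry and that each entry lists all of the alternates, including itself. The function name and the shape of the input dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_localized_sitemap(pages):
    """pages maps each page URL to a {hreflang: alternate URL} dict."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, alternates in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        for lang, href in alternates.items():
            ET.SubElement(url, f"{{{XHTML_NS}}}link",
                          {"rel": "alternate", "hreflang": lang, "href": href})
    return ET.tostring(urlset, encoding="unicode")


versions = {"en": "https://www.example.com", "fr": "https://www.example.com/fr"}
# Every language version gets its own <url> entry listing all alternates,
# including itself.
localized_xml = build_localized_sitemap({url: versions for url in versions.values()})
print(localized_xml)
```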
Tools to Generate XML Sitemaps
- Screaming Frog SEO Spider and Sitemap generator
- Enarion phpSitemapsNG
- Perl Sitemap Generator
- Simple Sitemaps
- Free Sitemap Generator
CMS plugins for generating XML Sitemaps
- XML Sitemap – Drupal
- XML Sitemap – OS Commerce
- XML Sitemap – WordPress
- XML Sitemap – Joomla
Sitemap Schemas and Sitemap Validating Tools
XML schema (XSD) for Sitemaps 0.9 and supported Sitemap extensions give you the elements and attributes that need to be included in your XML Sitemaps. The schemas (depending on Sitemaps, Sitemap index files and different Sitemap supported file types) are as follows:
After creating your Sitemaps with all of the right elements and attributes in place, validate them using one of the following tools:
To test your Sitemap before you submit it in Google Search Console, click on the red Add/Test Sitemap button on the right, then enter the URL of the sitemap that you would like to test as shown in the screenshots below.
Test the validity of your sitemap, and then submit it to Google for crawling:
As of now, you can submit your sitemap using the new Google Search Console, but it's currently not possible to test it.
NOTE: This feature is also used to submit your Sitemap to Google. Another method you should use to tell search engines about your Sitemap is to add the following to your robots.txt file:
You can add this anywhere in the robots.txt file because the directive is independent of the user-agent line. You can also specify more than one Sitemap file per robots.txt file. Whenever you update your Sitemap, you can resubmit it to Google using the same Add/Test Sitemap option.
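The robots.txt directive in question is a line starting with Sitemap: followed by the full URL of the Sitemap file. As a quick sanity check, this Python sketch parses a hypothetical robots.txt with the standard library and lists the Sitemaps it declares. The file content here is made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the Sitemap lines are the relevant part.
robots_txt = """\
User-agent: *
Disallow: /admin/

Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/sitemap-images.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() (Python 3.8+) returns the listed Sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```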
ADVANCED NOTE: You can also submit your Sitemap as an HTTP request. To do this you need to issue your request to the following URL:
Take a look at an example below:
URL encode the part after ping?sitemap=
Issue the HTTP request using wget, curl or any other method your web developer suggests.
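The URL-encoding step can be sketched in Python as follows. The endpoint below is a placeholder, not a real service: substitute the ping URL documented by the search engine you are submitting to. The function name build_ping_url is also an assumption for this example.

```python
from urllib.parse import quote

# Placeholder endpoint: substitute the ping URL your search engine documents.
PING_ENDPOINT = "https://search-engine.example/ping?sitemap="


def build_ping_url(sitemap_url, endpoint=PING_ENDPOINT):
    """Percent-encode the Sitemap URL and append it to the ping endpoint."""
    return endpoint + quote(sitemap_url, safe="")


ping_url = build_ping_url("https://www.example.com/sitemap.xml")
print(ping_url)
```

Passing safe="" makes sure the colon and slashes in the Sitemap URL are encoded too, so the whole address travels as a single query parameter.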
Sitemaps for Images
If you have a site that uses a lot of images, it makes absolute sense to guide search engines to your image URLs by means of an image Sitemap.
Below is a sample of an image Sitemap.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>http://www.example.com/sample-page</loc>
    <image:image>
      <image:loc>http://www.example.com/image.jpg</image:loc>
    </image:image>
    <image:image>
      <image:loc>http://www.example.com/image2.jpg</image:loc>
    </image:image>
  </url>
</urlset>
A Glossary of Tags Used in an Image Sitemap
<image:image> - This tag encloses each image URL. You can list up to 1,000 such tags per <url> entry.
<image:loc> - This contains the URL of the image.
<image:caption> - This tag contains a caption for the image. It is optional.
<image:geo_location> - You can specify a geographic location of the image in this tag. It is optional.
<image:title> - This contains a title for the image. It is optional.
<image:license> - This tag contains the URL pointing to the license of the image. It is optional.
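The image tags can be generated in much the same way as a regular Sitemap. This is a minimal Python sketch that writes only the image:image and image:loc tags and enforces the limit of 1,000 images per page; build_image_sitemap and its input format are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"
MAX_IMAGES_PER_URL = 1000  # limit quoted in the glossary above


def build_image_sitemap(pages):
    """pages maps a page URL to the list of image URLs on that page."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        if len(image_urls) > MAX_IMAGES_PER_URL:
            raise ValueError(f"{page_url} lists more than {MAX_IMAGES_PER_URL} images")
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")


image_xml = build_image_sitemap({
    "http://www.example.com/sample-page": [
        "http://www.example.com/image.jpg",
        "http://www.example.com/image2.jpg",
    ],
})
print(image_xml)
```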
It may not be possible to get your entire site indexed, even with the perfect Sitemap. A flawless Sitemap does, however, help you discover your site’s indexation issues. To do this, analyze any Sitemap errors in Google Search Console and Bing Webmaster Tools, and compare the pages that are indexed with the URLs you have submitted. If there is a vast difference in this ratio, or a sudden increase or decrease in these numbers, be sure to check your Sitemaps. They may reveal other problems on your site, such as problems with your robots.txt file, duplicate content and so on. Many tools (such as Screaming Frog) can import and crawl all of the pages referenced in your Sitemaps, allowing you to easily spot any issues or unnecessary redirects.
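The comparison between submitted and indexed URLs can be sketched in a few lines of Python. This example reads the loc values from a Sitemap and compares them with a list of indexed URLs; the indexed set is a hypothetical export, since how you obtain it depends on the tool you use.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> values of all <url> entries in a sitemap."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)}


sitemap_xml = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/pricing</loc></url>
  <url><loc>https://www.example.com/old-page</loc></url>
</urlset>"""

# Hypothetical export of indexed URLs, e.g. copied from a webmaster tool report.
indexed = {"https://www.example.com/", "https://www.example.com/pricing"}

submitted = sitemap_urls(sitemap_xml)
not_indexed = sorted(submitted - indexed)
print(f"{len(indexed & submitted)}/{len(submitted)} submitted URLs indexed")
print("Check these:", not_indexed)
```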
There’s a lot you can tell search engines about your page’s video resources in sitemaps:
<video:player_loc> - The URL pointing to the player for the video. If your video is embedded on your page, like from YouTube or Vimeo, you can use this tag instead of <video:content_loc>. You can normally find this URL in the video’s embed code.
<video:duration> - The video’s length in seconds, between 0 and 28800 (8 hours). This isn’t technically required, but Google recommends it.
<video:expiration_date> - Only include this information if your video will not be available after a certain date. If you do use it, put dates in YYYY-MM-DD format, and dates with times in YYYY-MM-DDThh:mm:ssTZD format.
<video:rating> - The video’s rating. Only values between 0.0 and 5.0 are valid.
<video:view_count> - The number of times the video has been watched.
<video:publication_date> - The date the video was first published, not the date you put it on your site.
<video:family_friendly> - If you set this to no, your video will only appear in search results when the user disables SafeSearch. Otherwise, make this yes.
<video:tag> - A very short description of key concepts related to your video. Create a separate <video:tag> element for each tag you use, up to 32 tags.
<video:category> - The broad subject your video covers, such as SEO, Digital Marketing or Advertising.
<video:restriction relationship=allow/deny> - A list of countries where the video cannot play, or a list of the only countries in which users can access the video, depending on whether you set relationship= to allow or deny. The list is space-delimited and uses the ISO 3166 country codes. If you don’t use this tag, it will be assumed that your video is available globally.
<video:gallery_loc> - The URL where you can find the collection in which your video appears, if there is one. Each video can have only one gallery_loc tag. If your gallery has a title you can add the
<video:price currency=""> - The price to download the video. The currency= attribute is required and uses the ISO 4217 currency code. Add the optional type= attribute to specify if the download is to own or rent, and resolution= to specify if the video is in HD or SD. You can use this multiple times for each currency you accept.
<video:requires_subscription> - Allowed values are yes or no, to indicate whether or not a subscription is required to watch the video.
<video:uploader> - If your video is embedded from another video site, put the name of the host here. This URL must be the same domain as the
<video:platform relationship=allow/deny> - The platforms (web, mobile and tv) where the video can or cannot be accessed. The relationship= attribute defines whether the list is inclusive or exclusive. You can have only one platform tag per video.
<video:live> - Whether or not the video is a live stream. Only
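Several of the video tags above have numeric limits that can be checked before a Sitemap is written. The Python sketch below assumes the video values are held in a plain dictionary with hypothetical keys (duration, rating, tags). Duration is treated as a number of seconds, since 28800 seconds equals the 8 hours mentioned above.

```python
def check_video_entry(video):
    """Return a list of problems found in a dict of video tag values."""
    problems = []
    duration = video.get("duration")  # assumed to be in seconds
    if duration is not None and not 0 <= duration <= 28800:
        problems.append("duration must be between 0 and 28800")
    rating = video.get("rating")
    if rating is not None and not 0.0 <= rating <= 5.0:
        problems.append("rating must be between 0.0 and 5.0")
    if len(video.get("tags", [])) > 32:
        problems.append("no more than 32 tags are allowed")
    return problems


video_problems = check_video_entry(
    {"duration": 30000, "rating": 4.5, "tags": ["seo", "sitemaps"]}
)
print(video_problems)
```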
XML Sitemap Size Limits
XML sitemaps are limited in size, both in the number of URLs you can include and in file size. A sitemap can have at most 50,000 entries, with up to 1,000 images per page, and a maximum uncompressed size of 50MB. If you’ve got a really big site that has lots of pages, images and/or videos, you’ll need to create multiple sitemaps, along with a sitemap of sitemaps, known as a Sitemap Index File.
Sitemaps in Google Search Console
The Sitemaps section allows you to monitor all of your Sitemaps from one place, providing a summary of the Sitemaps that have been submitted via the Google Search Console account. This includes a snapshot of data, including the Sitemap type, the dates they were most recently processed, any issues that have been identified and the number of pages submitted/indexed per Sitemap and overall.
It’s also possible to test or resubmit Sitemaps by clicking on the Sitemap that you wish to submit, then selecting the ‘Resubmit Sitemap’ or ‘Test Sitemap’ button.
When done right, XML sitemaps help search engines quickly find, crawl and index websites. Make sure you’ve properly formatted, compressed and submitted your XML sitemap to search engines to get the most out of its advantages:
You no longer need to rely on linking to get your pages crawled.
Search engines will see new or updated sites and pages more quickly.
Bots can crawl pages more intelligently thanks to the meta information available in sitemaps.
You can make sure that search engines are finding important information about images and videos, which crawlers can otherwise struggle to interpret.
Have you created and submitted an XML sitemap for your website? What benefits have you noticed? Did you encounter any challenges?