What is an XML Sitemap Index?
If your XML sitemap exceeds the limits defined by the sitemap protocol and enforced by search engines (50,000 URLs or 50 MB uncompressed per file), you must split it into multiple smaller XML sitemaps. These individual sitemaps can then be referenced in an XML sitemap index, a separate XML file that serves as a directory for them. Search engines follow the references in the sitemap index to reach each individual sitemap, whether it covers pages, videos, images, categories, news, taxonomies, or other content.
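The splitting step can be sketched in a few lines of Python. This is a minimal illustration, not a complete generator: the helper name chunk_urls and the example URLs are hypothetical, and it only enforces the URL count, not the file size limit.

```python
from typing import Iterator, List

# Per-file URL limit from the sitemaps.org protocol.
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls: List[str], size: int = MAX_URLS_PER_SITEMAP) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


# Hypothetical site with 120,000 pages (URLs invented for illustration).
urls = [f"http://www.example.com/page{i}.html" for i in range(120_000)]
chunks = list(chunk_urls(urls))
print([len(chunk) for chunk in chunks])  # [50000, 50000, 20000]
```

Each chunk would then be written to its own sitemap file and listed in the sitemap index.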
Here is an example of an XML sitemap index:
Sitemap Index XML File Example
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://www.example.com/sitemap1.xml.gz</loc>
<lastmod>2004-10-01T18:23:17+00:00</lastmod>
</sitemap>
<sitemap>
<loc>http://www.example.com/sitemap2.xml.gz</loc>
<lastmod>2005-01-01</lastmod>
</sitemap>
</sitemapindex>
In this example, the sitemap index uses the “http://www.sitemaps.org/schemas/sitemap/0.9” namespace, which is the standard namespace for sitemaps. Each <sitemap> element within the sitemap index specifies the location and last modification date of one individual sitemap (sitemap1.xml.gz and sitemap2.xml.gz) using the <loc> and <lastmod> elements, respectively. The sitemap index serves as a central reference point from which search engines discover and access the individual sitemaps, which may have been split into smaller files to comply with search engine limits.
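A file like this can also be generated programmatically. The following is a rough sketch using Python's standard xml.etree.ElementTree module; the function name build_sitemap_index and the (loc, lastmod) tuple layout are assumptions made for illustration, not part of the sitemap protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(entries):
    """Return sitemap index XML as bytes from (loc, lastmod) pairs.

    `lastmod` may be None, in which case the optional tag is omitted.
    """
    # Serialize the sitemap namespace as the default (unprefixed) namespace.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc, lastmod in entries:
        sitemap = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod is not None:
            ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


index_xml = build_sitemap_index([
    ("http://www.example.com/sitemap1.xml.gz", "2004-10-01T18:23:17+00:00"),
    ("http://www.example.com/sitemap2.xml.gz", "2005-01-01"),
])
print(index_xml.decode("utf-8"))
```

The xml_declaration argument requires Python 3.8 or later. The output is written on a single line rather than indented, which makes no difference to parsers.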
Let’s analyze this file further!
XML Header
<?xml version="1.0" encoding="UTF-8"?>
Just like in the XML sitemap file, this XML header defines the version of XML (in this case, version 1.0) and the character encoding (UTF-8). This is a standard practice in XML files to specify the format and encoding of the file.
Sitemap Index – Definition
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
In contrast to the previous example of an XML sitemap file, here we see a sitemapindex definition instead of a urlset definition. The sitemapindex definition encapsulates all the individual sitemaps contained in the sitemap index and specifies the version of the XML Sitemap standard being used. Similar to the urlset definition, the sitemapindex definition is closed at the end of the document:
</sitemapindex>
Definition of the individual sitemaps
<sitemap>
<loc>http://www.example.com/sitemap1.xml.gz</loc>
<lastmod>2004-10-01T18:23:17+00:00</lastmod>
</sitemap>
Next, we have the actual definition of the individual sitemaps within the sitemap index. Similar to the URLs in a sitemap, each sitemap definition must contain at least a <loc> tag, which specifies the full URL of the individual XML sitemap.
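Additionally, the sitemap definition may optionally include a <lastmod> tag, which indicates when the referenced XML sitemap was last updated. The value uses the W3C Datetime format, so both a full timestamp (2004-10-01T18:23:17+00:00) and a plain date (2005-01-01) are valid, as the example shows. This tells search engines how fresh the sitemap is.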
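To make the required and optional parts concrete, here is a small sketch that reads a sitemap index and returns each <loc> with its <lastmod>, if present. It uses Python's standard xml.etree.ElementTree module; the function name read_sitemap_index is hypothetical, and the sample data is the example file from above.

```python
import xml.etree.ElementTree as ET

# Prefix "sm" is an arbitrary local alias for the sitemap namespace.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def read_sitemap_index(xml_bytes):
    """Return a list of (loc, lastmod) tuples; lastmod is None when absent."""
    root = ET.fromstring(xml_bytes)
    entries = []
    for sitemap in root.findall("sm:sitemap", NS):
        loc = sitemap.findtext("sm:loc", default=None, namespaces=NS)
        if not loc or not loc.strip():
            continue  # <loc> is required, so skip entries without it
        lastmod = sitemap.findtext("sm:lastmod", default=None, namespaces=NS)
        entries.append((loc.strip(), lastmod.strip() if lastmod else None))
    return entries


sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://www.example.com/sitemap1.xml.gz</loc>
<lastmod>2004-10-01T18:23:17+00:00</lastmod>
</sitemap>
<sitemap>
<loc>http://www.example.com/sitemap2.xml.gz</loc>
</sitemap>
</sitemapindex>"""

for loc, lastmod in read_sitemap_index(sample):
    print(loc, lastmod)
```

The second entry deliberately omits <lastmod> to show that the tag is optional.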
XML Sitemap Location
An XML sitemap is a file that provides search engines with information about your website’s pages and content. It helps search engines crawl and index your website more effectively. The XML sitemap should be placed in the root directory of your website, which is the main folder that contains all the files and folders related to your website.
The root directory is the top-level directory of your website’s file structure. Depending on your web hosting provider and server configuration, it is usually named public_html, htdocs, or www. Placing the XML sitemap in the root directory ensures that search engines can easily find and access it when they crawl your website.
Once you have created your XML sitemap, you can upload it to the root directory of your website using FTP (File Transfer Protocol) or through the file manager provided by your web hosting control panel. After uploading the XML sitemap, you should also submit it to search engines through their webmaster tools, such as Google Search Console and Bing Webmaster Tools, to inform them of its presence and to help them better understand your website’s structure and content.
Like the pages on your website, the XML sitemap has its own URL. The standard convention for the URL of an XML sitemap is /sitemap.xml, which is recommended because it makes the file easy for search engines to discover. However, if you cannot use this default location for any reason, you can choose a different location or filename. In such cases, it’s important to reference the XML sitemap in your website’s robots.txt file using the Sitemap directive. For example, you can specify the alternative location and filename like this:
Sitemap: http://www.xyz.com/alternativelocation/alternativefilename.xml
XML Sitemap Index File Location
As with XML sitemaps, there is a convention for the location and filename of the XML sitemap index, usually /sitemap_index.xml. However, you have the flexibility to choose a different location or filename as long as you reference it in your robots.txt file using the Sitemap directive:
Sitemap: http://www.example.com/alternativelocation/alternativefilename.xml
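To check which sitemaps or sitemap indexes a robots.txt file actually declares, the Sitemap lines can be extracted with a short script. The sketch below is a simplified illustration: the function name find_sitemaps and the sample robots.txt content are hypothetical, and the parsing treats everything after a # as a comment.

```python
from typing import List


def find_sitemaps(robots_txt: str) -> List[str]:
    """Return the URLs declared by Sitemap directives in robots.txt content."""
    sitemaps = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        # Split on the first colon only, so the URL's own "://" stays intact.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            sitemaps.append(value.strip())
    return sitemaps


# Hypothetical robots.txt content for illustration.
robots_txt = """User-agent: *
Disallow: /private/
Sitemap: http://www.example.com/alternativelocation/alternativefilename.xml
"""

print(find_sitemaps(robots_txt))
# ['http://www.example.com/alternativelocation/alternativefilename.xml']
```

The directive name is matched case-insensitively here, and a file may contain more than one Sitemap line. In Python 3.8 and later, the standard library's urllib.robotparser.RobotFileParser also exposes a site_maps() method for the same purpose.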