A sitemap.xml file is a tool that allows webmasters to inform search engines about the site’s pages available for indexing. Also, in the XML map, you can specify additional page parameters: the date of the last update, the frequency of updates, and the priority relative to other pages.
The information in sitemap.xml can affect the behavior of the search crawler and, in general, the process of indexing new documents.
Sitemap contains directives for including pages in the crawl queue and complements robots.txt, which contains directives for excluding pages.
In this guide, you will find answers to all questions regarding using sitemap.xml.
Do I need sitemap.xml?
Search engines use sitemaps to find new documents on the site (these can be html documents or media content) that are not accessible through navigation but need to be crawled.
Having a link to a document in a sitemap.xml does not guarantee that it will be crawled or indexed, but more often than not, the file helps larger sites get indexed better.
In addition, data from the XML sitemap is used to determine canonical pages when the canonical URL is not explicitly specified with a rel=canonical tag.
Sitemap.xml is important for sites where:
- Some sections are not available through the navigation menu.
- There are many isolated pages or poorly connected pages.
- Technologies poorly supported by search engines (for example, Ajax, Flash or Silverlight) are used.
- There are a lot of pages present, and there is a chance that the search crawler will skip new content.
If none of these cases applies to you, you most likely do not need sitemap.xml. The file is not needed for sites where every page that matters for indexing is reachable within 2 clicks, where JavaScript or Flash is not used to display content, where canonical and regional tags are used if necessary, and where fresh content appears no more often than a robot visits the site.
For small projects, if the only problem is deep nesting of documents, it is easy to solve with an HTML sitemap, without resorting to an XML sitemap. But if you decide that you still need sitemap.xml, then read this guide in full.
XML Sitemap Technical Information
- Sitemap.xml is an XML text file. However, search engines also support text format (see the next section).
- Each Sitemap can contain a maximum of 50,000 addresses and weigh no more than 50 MB (10 MB for Yandex).
- You can use gzip compression to reduce the size of the sitemap.xml file and increase its transfer speed. In this case, use the gz extension (sitemap.xml.gz). At the same time, weight restrictions remain for uncompressed sitemaps.
- The Sitemap’s location determines the set of URLs that can be included in it. A map containing the addresses of the entire site’s pages should be located at the root. If the Sitemap is located in a folder, then all URLs in it must be located in this folder or deeper.
- The addresses in sitemap.xml must be absolute.
- The maximum URL length is 2,048 characters (1,024 characters for Yandex).
- Special characters in the URL (such as the ampersand “&” or quotes) must be escaped as XML entities (for example, &amp; for the ampersand).
- The pages specified in the map should return a 200 http status code.
- The addresses listed in the map should not be closed in the robots.txt file or meta-robots.
- The Sitemap must not be closed in robots.txt. Otherwise, the search engine will not crawl it. The file itself may be in the index.
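The numeric limits from the list above can be checked automatically before a map is published. The following is a minimal sketch, not a full validator: the function name check_limits is hypothetical, and the constants simply restate the general limits given above (not the Yandex-specific ones).

```python
from urllib.parse import urlsplit

MAX_URLS = 50_000                 # per-sitemap URL limit from the list above
MAX_BYTES = 50 * 1024 * 1024      # uncompressed size limit (50 MB)
MAX_URL_LENGTH = 2048             # maximum URL length in characters


def check_limits(urls, xml_bytes):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    if len(urls) > MAX_URLS:
        problems.append(f"too many URLs: {len(urls)}")
    if len(xml_bytes) > MAX_BYTES:
        problems.append(f"file too large: {len(xml_bytes)} bytes")
    for url in urls:
        parts = urlsplit(url)
        # An absolute URL needs both a scheme and a host.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"not an absolute URL: {url}")
        if len(url) > MAX_URL_LENGTH:
            problems.append(f"URL too long: {url[:60]}...")
    return problems
```

The check covers only what can be verified offline; status codes and robots.txt rules need separate checks.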
Sitemap.xml Formats
Search engines support a plain text sitemap format that lists page URLs without additional parameters. In this case, the file must be UTF-8 encoded and have a .txt extension.
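As an illustration of the text format, here is a small sketch that produces the body of such a file. The function name build_text_sitemap is an assumption made for this example; the result would be saved under a name such as sitemap.txt.

```python
def build_text_sitemap(urls):
    """Return the body of a plain-text sitemap: one absolute URL per line."""
    lines = [u.strip() for u in urls if u.strip()]   # drop blanks and stray spaces
    return ("\n".join(lines) + "\n").encode("utf-8")  # UTF-8, as the format requires
```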
Search engines also support the standard XML protocol. Google additionally supports sitemaps for images, videos, and news.
Sitemap.xml Example
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://example.info/</loc>
<lastmod>2018-06-14</lastmod>
<changefreq>daily</changefreq>
<priority>0.9</priority>
</url>
</urlset>
urlset tags (required) - Specifies the standard of the current protocol.
url (required) - The parent tag for each URL.
loc (required) - Document URL, must be absolute.
lastmod - the date the document was last modified, in W3C Datetime format (for example, 2018-06-14).
changefreq - page change frequency (always, hourly, daily, weekly, monthly, yearly, never). The value of this tag is a recommendation to search engines, not a command.
priority - URL priority relative to other addresses (from 0 to 1) for the scanning order. If not specified, the default is 0.5.
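The tags described above can be generated programmatically. Below is a minimal sketch using Python's standard xml.etree.ElementTree module; the function name build_sitemap and the dictionary layout of the input are assumptions made for this example. The serializer escapes special characters such as "&" in URLs automatically.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def _q(tag):
    """Qualify a tag name with the sitemap namespace."""
    return f"{{{SITEMAP_NS}}}{tag}"


def build_sitemap(pages):
    """pages: iterable of dicts with 'loc' and optional 'lastmod', 'changefreq', 'priority'."""
    ET.register_namespace("", SITEMAP_NS)      # serialize as the default namespace
    urlset = ET.Element(_q("urlset"))
    for page in pages:
        url = ET.SubElement(urlset, _q("url"))
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:                    # optional tags are simply skipped
                ET.SubElement(url, _q(tag)).text = str(page[tag])
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
```

Calling build_sitemap([{"loc": "https://example.info/", "lastmod": "2018-06-14", "changefreq": "daily", "priority": 0.9}]) should produce a document equivalent to the example above.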
XML Map for Images
Some optimizers insert links to images into sitemap.xml in the same way as links to HTML documents. This works, but for Google it is better to use the image extension of the standard protocol and send additional information about the images along with the URLs.
Creating XML image maps is useful if images need to be crawled and indexed, and at the same time, they are not directly accessible to the bot (for example, JavaScript is used).
An example of a sitemap containing one page and images belonging to it
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url>
<loc>http://example.com/primer.html</loc>
<image:image>
<image:loc>http://example.com/kartinka.jpg</image:loc>
</image:image>
<image:image>
<image:loc>http://example.com/photo.jpg</image:loc>
<image:title>SEO Company</image:title>
<image:geo_location>New York,USA</image:geo_location>
<image:license>http://creativecommons.org/licenses/by-nd/3.0/legalcode</image:license>
</image:image>
</url>
</urlset>
image:image (required) - information about a single image. A maximum of 1,000 images can be listed per page (per url tag).
image:loc (required) - the path to the image file. If a CDN is used, it is acceptable to refer to another domain if it is confirmed in the webmaster panel.
image:caption - image caption (may contain long text).
image:title - the image's title (usually a short text).
image:geo_location - location of the photo.
image:license - Image license URL. Used for advanced image search.
Sitemap.xml for Video
As with images, Google has a video extension of the sitemap protocol, in which you can specify detailed information about video content that affects display in video search.
A video sitemap is needed when the site uses videos hosted locally and when indexing these videos is difficult due to the technology used. If you embed a YouTube video on your site, then video-sitemap is unnecessary.
Learn more about video sitemaps:
https://developers.google.com/webmasters/videosearch/sitemaps
News Sitemap
If you have news content on your site and participate in Google News, it’s useful to use a News Sitemap so that Google can find your latest content faster and index all news articles. In this case, the Sitemap must contain only the addresses of pages published in the last 2 days and no more than 1000 URLs.
Learn more about News Sitemaps:
https://support.google.com/news/publisher-center/answer/74288
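The two News Sitemap constraints mentioned above (only pages from the last 2 days, no more than 1000 URLs) can be applied when selecting articles. This is a sketch under assumptions: the function name select_news_urls and the (url, published_at) input format are hypothetical, and publication times are expected to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone

MAX_NEWS_URLS = 1000
MAX_AGE = timedelta(days=2)


def select_news_urls(articles, now=None):
    """articles: iterable of (url, published_at) pairs with timezone-aware datetimes."""
    now = now or datetime.now(timezone.utc)
    # Keep only articles published within the last two days (and not in the future).
    fresh = [(url, ts) for url, ts in articles if timedelta(0) <= now - ts <= MAX_AGE]
    fresh.sort(key=lambda item: item[1], reverse=True)   # newest first
    return [url for url, _ in fresh[:MAX_NEWS_URLS]]
```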
Using Multiple Sitemaps
If necessary, you can combine several sitemaps into one index sitemap. Multiple sitemap.xml files are used when:
- The site uses several engines (CMS).
- The site has over 50,000 pages.
- It is necessary to set up convenient error tracking by section.
In the latter case, each large section of the site has its own sitemap.xml, and all of them are added to the webmaster panel, where it is convenient to see which section has the most errors (see the section on finding errors in the Sitemap).
If you have 2 or more sitemaps, they should be combined into an index sitemap, which looks the same as a regular one (except that it uses sitemapindex and sitemap tags instead of urlset and url), has similar restrictions, and can only refer to regular XML sitemaps (not to other index files).
Example Sitemap Index:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://www.example.com/sitemap-blog.xml.gz</loc>
<lastmod>2004-10-01T18:23:17+00:00</lastmod>
</sitemap>
<sitemap>
<loc>http://www.example.com/sitemap-webinars.xml.gz</loc>
<lastmod>2005-01-01</lastmod>
</sitemap>
</sitemapindex>
sitemapindex (required) - Specifies the current protocol standard.
sitemap (required) - contains information about a particular sitemap.
loc (required) - location of the Sitemap (in xml, txt or rss format for Google).
lastmod - sitemap modification time. Allows search engines to discover new URLs on large sites quickly.
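For a site with more than 50,000 pages, the split into several maps plus an index can be automated. The sketch below uses the standard xml.etree.ElementTree module; the function names split_urls and build_index are hypothetical, and the sitemap file URLs passed to build_index are whatever locations you publish the individual maps at.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000


def split_urls(urls, size=MAX_URLS):
    """Split a URL list into chunks that each fit into one sitemap."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(sitemap_urls, lastmod=None):
    """Build a sitemap index that points at the given sitemap files."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc in sitemap_urls:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:                              # optional modification date
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)
```

Each chunk returned by split_urls would be written to its own file, and the list of those file URLs is then passed to build_index.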
How to create sitemap.xml?
XML Sitemap Creation Methods:
Internal CMS tools. Many CMS already support sitemap creation. To find out, read the documentation for your CMS, look at the menu items in the admin panel, or contact the engine’s technical support. Try opening https://yoursite.com/sitemap.xml on your site; the file may already exist and be dynamically generated.
External plugins. If the CMS does not have sitemap generation functionality but supports plugins, search for a plugin that handles sitemap.xml for your engine and install it. Sometimes you need to ask programmers to write such a plugin.
A separate script on the site. Knowing the XML map protocol and technical limitations, you can create sitemap.xml by adding the generation script to CRON. If you are not a programmer, use the other items on this list.
Sitemap generators. Many sitemap.xml generators will crawl your site and let you download the finished map. The disadvantage is that you must manually generate the Sitemap each time you update the site.
Parsers. Desktop programs designed for technical site analysis usually provide the ability to download sitemap.xml generated from crawled pages. It works similarly to sitemap generators but only runs locally on your machine.
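For the "separate script" method, a generation script that is run from CRON could look roughly like the sketch below. It is only an outline under assumptions: the function names are hypothetical, the list of URLs is expected to come from your own database or CMS, and the output path is an example. The script escapes special characters and writes a gzip-compressed file.

```python
import gzip
from xml.sax.saxutils import escape


def render_sitemap(urls):
    """Render a minimal sitemap; escape() turns &, < and > into XML entities."""
    entries = "".join(f"<url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}</urlset>\n"
    ).encode("utf-8")


def write_sitemap_gz(urls, path="sitemap.xml.gz"):
    """Write the compressed sitemap and return its uncompressed size in bytes."""
    data = render_sitemap(urls)
    with gzip.open(path, "wb") as fh:
        fh.write(data)
    return len(data)   # size limits apply to the uncompressed file
```

The returned size can be compared against the uncompressed size limit before the file is published.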
Creating an XML Sitemap in WordPress with a Plugin
Since version 5.5, WordPress generates a basic XML sitemap by default, without any plugin; it is available at example.com/wp-sitemap.xml. If you need more control, the following plugins can be used.
Yoast SEO
Among other useful features for SEO, it allows you to generate sitemap.xml.
Google XML Sitemaps
A simple WordPress sitemap generator plugin.
WP Sitemap Page
Another WordPress plugin if the previous ones didn’t fit.
Sitemap.xml should be updated as soon as new pages appear on the site. However, if pages are added often and in batches, it is advisable to regenerate the Sitemap about once an hour.
Ensure that the Sitemap does not include duplicates, non-existent pages, or redirects. For example, pagination and sorting pages need not be included in sitemap.xml.
An ideal sitemap consists of pages of the main sections and subsections of the site and final nodes (articles, product cards, etc.).
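Duplicates and pagination or sorting pages can be filtered out before the map is generated. The following is a sketch only: the function name clean_urls is hypothetical, and the query parameter names in SKIP_PARAMS are assumptions that must be adapted to the URL scheme of your own site.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical parameter names for pagination and sorting; adjust to your site.
SKIP_PARAMS = {"page", "sort", "order"}


def clean_urls(urls):
    """Drop duplicates and URLs that carry pagination or sorting parameters."""
    seen, result = set(), []
    for url in urls:
        params = parse_qs(urlsplit(url).query)
        if SKIP_PARAMS.intersection(params):   # pagination/sorting page, skip it
            continue
        if url not in seen:                    # keep the first occurrence only
            seen.add(url)
            result.append(url)
    return result
```

Redirects and non-existent pages cannot be detected this way; they require checking the HTTP status code of each URL.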
Cyrillic addresses in Sitemap
Even though the sitemap protocol allows only ASCII characters in URLs, Google and Yandex support both encoded and plain formats for Cyrillic addresses. The same applies to IDNs: you can use either the regular format or Punycode.
However, for your sitemap.xml to be compatible with various search engines and services, it is recommended to follow the protocol: encode Cyrillic domains in Punycode and percent-encode (URL-encode) Cyrillic page addresses.
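Both conversions can be done with the Python standard library. This is a minimal sketch, assuming the built-in "idna" codec is sufficient for your domains; the function name to_ascii_url is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, quote


def to_ascii_url(url):
    """Convert a URL with a Cyrillic host or path into an ASCII-only form."""
    parts = urlsplit(url)
    # Host: Punycode via the built-in IDNA codec.
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port:
        host = f"{host}:{parts.port}"
    # Path and query: percent-encode as UTF-8, keeping existing escapes intact.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, host, path, query, parts.fragment))
```

URLs that are already ASCII pass through unchanged, so the function can be applied to every address before it is written to the map.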
Sitemap of a multilingual and multiregional site
If your site is multilingual or multiregional, you can use the fact that Google supports hreflang markup directly in sitemap.xml. To do this, add xhtml:link tags with the hreflang attribute to the map.
Example. The site uses two languages: Russian and Ukrainian. In this case, the sitemap.xml for one of the pages will look like this.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>http://www.example.com/ru/</loc>
<xhtml:link
rel="alternate"
hreflang="ru"
href="http://www.example.com/ru/"
/>
<xhtml:link
rel="alternate"
hreflang="uk"
href="http://www.example.com/ua/"
/>
</url>
<url>
<loc>http://www.example.com/ua/</loc>
<xhtml:link
rel="alternate"
hreflang="ru"
href="http://www.example.com/ru/"
/>
<xhtml:link
rel="alternate"
hreflang="uk"
href="http://www.example.com/ua/"
/>
</url>
</urlset>
As you can see, each language/region URL must be presented in a separate url tag. The more languages on the site, the more this Sitemap will grow.
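Because every language version repeats the full set of alternates, such maps are easier to generate than to write by hand. The sketch below uses the standard xml.etree.ElementTree module; the function name build_hreflang_sitemap and the input format (one dictionary per page, mapping an hreflang code to the URL of that version) are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_hreflang_sitemap(groups):
    """groups: list of dicts mapping an hreflang code to the URL of that version."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for versions in groups:
        # One url entry per language version ...
        for own_url in versions.values():
            url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = own_url
            # ... each listing every version, including itself.
            for lang, href in versions.items():
                ET.SubElement(
                    url,
                    f"{{{XHTML_NS}}}link",
                    {"rel": "alternate", "hreflang": lang, "href": href},
                )
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
```

For the example above, the input would be [{"ru": "http://www.example.com/ru/", "uk": "http://www.example.com/ua/"}].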
If the site has many subdomains, then each subdomain, as a separate site, must contain its sitemap.xml. This is one of the disadvantages of the sub-domain structure of the site.
Finding errors in the Sitemap
When creating an XML sitemap, webmasters often make the following mistakes:
- The URL leads to a page whose http status code is not 200 (for example, the page does not exist or redirects to another page). It is necessary to leave only existing pages in sitemap.xml.
- The URL leads to a page blocked from indexing in the robots.txt file. Here you need to figure out if the error is in robots.txt or sitemap.xml.
- The URL leads to a page closed by the meta-robots noindex tag. Links in the Sitemap should lead only to pages that are available for indexing.
- Errors regarding restrictions or non-compliance with the standard protocol.
The easiest way to check the Sitemap is to use Screaming Frog in list mode (“Mode” – “List” menu). Upload the Sitemap, and the program will check all the URLs by itself; the reports will show which status codes are given and whether the addresses are closed from indexing.
You can also use the Yandex Sitemap Analyzer. This is a good place to test your maps before adding them to the webmaster panel. And after adding a map to the panel, search engines will report other errors after they crawl the URLs.
Recommended error-checking algorithm:
- Scan XML Sitemap with Screaming Frog, and get rid of all errors.
- Test the Sitemap through the Yandex tool or Google Search Console.
- Add a link to the map in robots.txt, the Yandex webmaster panel, and Google Search Console.
- Periodically monitor the section in the panel with the XML map.
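One of the mistakes listed above, URLs blocked in robots.txt, can also be checked with a short script. This sketch relies on the standard urllib.robotparser module; the function name blocked_urls is hypothetical. Note that this parser implements the basic robots.txt rules and may treat wildcard patterns differently from the search engines themselves, so the result is a first approximation.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the sitemap URLs that the given robots.txt text disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())   # parse rules from text, no network access
    return [u for u in urls if not parser.can_fetch(user_agent, u)]
```

Any URL returned by the function should either be removed from the map or unblocked in robots.txt, depending on where the error is.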
Sitemap.xml when switching site to HTTPS
When a site switches to HTTPS, the main mirror changes, and you need to check that you comply with the following rules:
- The new sitemap.xml contains URLs prefixed with HTTPS.
- All old sitemaps have been removed from the webmaster panels and the robots.txt file of the old version of the site.
- Sitemap.xml on the http version of the site redirects with a 301 status code to the new sitemap.xml on the https version.
- The site has a page-by-page 301 redirect to the new version.
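The first rule in the list above is easy to verify automatically. Here is a minimal sketch that parses a sitemap (or a sitemap index) and reports addresses that do not start with https://; the function name non_https_urls is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def non_https_urls(xml_bytes):
    """Return all loc values in the sitemap that are not HTTPS addresses."""
    root = ET.fromstring(xml_bytes)
    result = []
    for loc in root.iterfind(".//sm:loc", NS):
        url = (loc.text or "").strip()
        if not url.lower().startswith("https://"):
            result.append(url)
    return result
```

An empty result means that every address in the map already points to the HTTPS version.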
If this guide doesn’t answer your question, ask it in the comments.