I recently updated my sitemap generator, and decided to look into how the changefreq element should be used. My old generator had it included, and I was wondering whether it was still relevant to include it.
The reason I was thinking about this is that, logically, it does not make much sense to include it, because you cannot easily predict how often a page will change. Of course, you could calculate a change frequency, but implementing that is not worth the trouble.
Besides, if a lastmod element is included in your sitemap files, then logically it should simply "override" the changefreq specification. If the last modified date changes, then Google, and presumably other search engines, should simply re-crawl that specific page.
- changefreq is optional and can be safely left out.
- It is better to include a timestamp using the lastmod element, and the format should be: Y-m-d (that is, YYYY-MM-DD, for example 2015-05-08)
- priority has no influence on crawl frequency.
- Leaving out changefreq and priority will save bandwidth.
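The lastmod format from the list above can be sketched in PHP as follows. This is a minimal sketch that assumes the modification time is available as a Unix timestamp; the function names lastmodDate and lastmodDateTime are made up for illustration. The Sitemap protocol expects W3C Datetime, which allows either a date alone or a full date and time with a time zone offset.

```php
<?php
// Hypothetical helpers; the names are not part of any library.

// Date only (YYYY-MM-DD), the short form recommended above.
function lastmodDate(int $timestamp): string
{
    // gmdate() formats in UTC, so the result does not depend on the server time zone.
    return gmdate('Y-m-d', $timestamp);
}

// Full W3C Datetime, for pages that change more than once a day.
function lastmodDateTime(int $timestamp): string
{
    return gmdate(DATE_W3C, $timestamp);
}

echo lastmodDate(1431043200), "\n";     // 2015-05-08
echo lastmodDateTime(1431043200), "\n"; // 2015-05-08T00:00:00+00:00
```

For most sites the date-only form is probably enough; the longer form only matters if a page can change several times within a day.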
Google Webmasters Hangout
Back in May 2015, there was a Google Webmasters Hangout where this was briefly discussed. Someone asked the following question:
Does priority and frequency matter in a sitemap, if not, how can we tell Google bot to crawl specific pages on a daily and high priority?
First of all, having all of your pages crawled daily is not something you would want, as it would just cause unnecessary load on your server. Secondly, priority and change frequency are abstract concepts. It is not immediately obvious how these are used by search engines. According to sitemaps.org:
- priority can be given a value between 0.0 and 1.0, and is used to indicate the priority of a page relative to other pages on your site when search engines need to make a selection between pages. It is not used to indicate crawl-priority.
- changefreq can be given a value such as daily, weekly, monthly, or yearly (the full list also includes always, hourly, and never). It represents how often a page is updated in order to make crawling easier for search engines when other signals are not present. If you include a lastmod element in the Y-m-d format, then you can leave out the change frequency element.
John Mueller's response to this question was:
Priority and change frequency doesn’t really play that much of a role with Sitemaps anymore.
So, this is something where we try various things, but, essentially if you have a sitemap file, and you are using it to tell us about the pages that changed and that were updated, it's much better to just specify the timestamp directly so that we can look into our internal systems and say: "oh we haven't crawled since this date, therefore we should crawl again."
And, just crawling daily doesn't make much sense if the content doesn't change. So, that's something where we see a lot of sites, they give us this information in a sitemap file, they say it changes daily or weekly. We look in our database of what we see when we crawl; we say: "well, this hasn't changed for months or years; why are they saying it's daily?"
Clearly, there is kinda like a disconnect here, where we should probably ignore something there.
John goes on to explain what he recommends:
So what I'd really recommend doing there is just using the timestamp, and making sure that the timestamp is always up to date, so that we can work with that timestamp and say: "well, we crawled on this date, you are saying it changed on this other date, therefore we should crawl again."
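The comparison John describes can be sketched in a few lines of PHP. This is only an illustration of the idea, not how Google's systems actually work; the function name needsRecrawl and both of its parameters are made up for the example.

```php
<?php
// Illustration only: a crawler-side check. Hypothetical name and parameters.
function needsRecrawl(string $lastCrawled, string $lastmod): bool
{
    // Both arguments are dates such as '2015-05-08'; values without an
    // explicit offset are interpreted as UTC.
    $utc      = new DateTimeZone('UTC');
    $crawled  = new DateTimeImmutable($lastCrawled, $utc);
    $modified = new DateTimeImmutable($lastmod, $utc);

    // Crawl again only when the page claims a change after the last crawl.
    return $modified > $crawled;
}

var_dump(needsRecrawl('2015-04-01', '2015-05-08')); // bool(true)
var_dump(needsRecrawl('2015-05-08', '2015-04-01')); // bool(false)
```

The sketch also shows why an accurate lastmod matters: if the timestamp never moves, a check like this never asks for a new crawl.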
Including a timestamp from PHP
In PHP you can easily include a timestamp using the date function; the following assumes you are using unix timestamps in your database:
echo ' <lastmod>' . date('Y-m-d', $row['timestamp']) . '</lastmod>' . "\n";
Of course, if you are using a CMS like WordPress, then this should be done automatically.
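For a hand-rolled generator, the single line above can be extended into a complete sitemap. The following is a minimal sketch, assuming each database row has a url and a timestamp key; the function name buildSitemap and the sample rows are made up for illustration. It uses gmdate instead of date so the output does not depend on the server time zone, and it leaves out changefreq and priority.

```php
<?php
// Hypothetical helper: builds a sitemap from rows with 'url' and 'timestamp' keys.
function buildSitemap(array $rows): string
{
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

    foreach ($rows as $row) {
        // Escape characters such as & so the URL stays valid XML.
        $loc = htmlspecialchars($row['url'], ENT_XML1 | ENT_QUOTES, 'UTF-8');

        $xml .= " <url>\n";
        $xml .= '  <loc>' . $loc . "</loc>\n";
        // changefreq and priority are left out on purpose; lastmod is enough.
        $xml .= '  <lastmod>' . gmdate('Y-m-d', (int) $row['timestamp']) . "</lastmod>\n";
        $xml .= " </url>\n";
    }

    return $xml . "</urlset>\n";
}

// Made-up sample rows standing in for a database result.
echo buildSitemap([
    ['url' => 'https://example.com/', 'timestamp' => 1431043200],
    ['url' => 'https://example.com/page?a=1&b=2', 'timestamp' => 1420070400],
]);
```

Building the string by hand keeps the example close to the echo line above; for larger sites, PHP's XMLWriter extension may be a more robust choice.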
If you want to learn how you can generate a sitemap from PHP, you should read this tutorial:
- May 8, 2015: English Google Webmaster Central office-hours hangout - youtube.com
- The Sitemap Protocol - sitemaps.org