Hugo

Please add a limit to the sitemap size

Will you be considering working on this anytime soon?

There’s a sitemap templates page (https://gohugo.io/templates/sitemap-template/), but it doesn’t have any info on creating sitemap indexes, other than the one for languages.

Google refuses to process sitemaps with more than 50k links, and it’s a huge inconvenience.

The range construct accepts an index variable. So you can use that to prevent Hugo from including any links past 50k.

In the example given in the docs, just change the range construct from {{ range .Pages }} to {{ range $i, $e := .Pages }}. This gives you access to the $i variable in your loop, which counts up from 0 on each iteration. You can use that to make sure you never go past 50,000.

Something like this:

{{ range $i, $e := .Pages }}
    <!-- if $i is less than 50,000 -->
    {{ if lt $i 50000 }}
        <!-- stuff goes here -->
    {{ end }}
{{ end }}

This is how I’d implement a 50k limit. There may or may not be a better method. :man_shrugging:

How do you reach the limit? 50k seems like a very high number of links. I would attempt to create sitemaps per taxonomy or post type if it’s evenly distributed.

Other than that, it would make a lot of sense to have a limit on the number of links PER sitemap. I figure 50k lines will result in megabytes, which won’t make search engines happy.
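To put a rough number on the "megabytes" guess, here is a small Go sketch that estimates the size of a 50k-entry sitemap. The sample URL and lastmod value are made up for illustration, and the 50 MB figure is the uncompressed per-file cap from the sitemaps.org protocol; real sizes depend on how long your URLs are.

```go
package main

import "fmt"

// sizeLimit is the 50 MB uncompressed per-file cap from the sitemaps.org protocol.
const sizeLimit = 50 * 1024 * 1024

// Hypothetical sample values; real URLs and dates will vary in length.
const (
	sampleLoc     = "https://example.org/photography/abstract/some-post-title/"
	sampleLastmod = "2019-08-19T10:00:00+00:00"
)

const (
	header = `<?xml version="1.0" encoding="utf-8"?>` + "\n" +
		`<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">` + "\n"
	footer = "</urlset>\n"
)

// entry renders one <url> element the way a typical sitemap template would.
func entry(loc, lastmod string) string {
	return fmt.Sprintf("  <url>\n    <loc>%s</loc>\n    <lastmod>%s</lastmod>\n  </url>\n", loc, lastmod)
}

// estimateBytes assumes every entry is as long as the sample entry.
func estimateBytes(n int, loc, lastmod string) int {
	return len(header) + n*len(entry(loc, lastmod)) + len(footer)
}

func main() {
	size := estimateBytes(50000, sampleLoc, sampleLastmod)
	fmt.Printf("about %.1f MB of a %d MB limit\n",
		float64(size)/(1024*1024), sizeLimit/(1024*1024))
}
```

With entries of this length, a full 50k-link file comes out at a few megabytes, so the link count is likely to be hit well before the size cap unless the entries carry a lot of extra markup.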

@joshmossas’ solution is fine, but you will lose all links beyond that (or have to manually create another template).

Thinking about it, the best way without changing Hugo or adding features to it might be a custom post type that you can page through (the sitemap index), with all the items (the sitemaps) in subsequent files.

Pages past 50k would be lost so it’s as good as not having a sitemap I guess.

This wouldn’t work in most cases either.

Consider a post type such as Photography, tagged under Abstract, and so on. It’s ridiculously easy to reach the limit, and I’m shocked this hasn’t been encountered by anyone else before.

:+1: I think it would be great if Hugo managed sitemap and sitemap index files for sites with more than 50k pages. It would make things easier for users maintaining websites with a significant number of pages and might give them another reason to use Hugo.

Echoing sentiments on this issue, this really needs to be addressed.

Despite what is or is not considered normal, it is very easy to find yourself generating a site with more than 50K pages - I run over a dozen such sites.

Google’s hard limit of 50K entries in a single sitemap file is a problem. I’ve been paging through the Hugo code, trying to understand how an appropriate change could be made, but I just don’t have a grasp on the codebase yet.

I’ll contribute what I can if I can, but I would urge the developers to consider a change that produces a sitemap_index.xml by default and dynamically adds sitemap_XX.xml files as needed in batches of 49,999 entries.

This would be a much-needed addition and from searching this issue it would be very much appreciated by a lot of site developers.

Pain in the $$$ ? :wink:
break it down by sections

my templates from layouts/_default

home.sitemap.html

{{ print "<?xml version=\"1.0\" encoding=\"utf-8\" ?>"  | safeHTML }}
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">{{ range (where site.Sections "Section" "not in" site.Params.invisibleSections) }}
 <sitemap>
  <loc>{{ .Permalink }}sitemap.xml</loc>
 </sitemap>{{end}}
</sitemapindex>

section.sitemap.html

{{ print "<?xml version=\"1.0\" encoding=\"utf-8\" ?>"  | safeHTML }}
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">{{ range first site.Params.sitemapMax .CurrentSection.RegularPages }}
  <url>
    <loc>{{ .Permalink }}</loc>
    <lastmod>{{ safeHTML ( .Lastmod.Format site.Params.dateFormatFeed ) }}</lastmod>
  </url>{{ end }}
</urlset>

in the config

[Params]
    invisibleSections   = ["menu","intern","test","about"]
    sitemapMax          = 49999

[outputs]
    home                  = [ "HTML", "SITEMAP"]
    section               = [ "HTML", "SITEMAP" ]

[outputFormats.SITEMAP]
    MediaType             = "application/xml"
    BaseName              = "sitemap"
    suffix                = "xml"
    IsHTML                = false
    IsPlainText           = false
    noUgly                = true
    Rel                   = "alternate"


How does your code solve this if I can hit the limit within a single section?

We need some way to generate files based on code.
Chunk pages by month (or even week), for example, and output them into separate files: sitemap-2019W34.xml, sitemap-2019W35.xml, … The smaller each sitemap, the faster it gets into the Google index. I have tested this with a large WP site.
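The weekly chunking could be sketched like this in Go. It only illustrates the grouping logic, not anything Hugo does today; the page type, the function names, and the sample pages are hypothetical, and the file names follow the sitemap-2019W34.xml pattern from the post using ISO week numbers.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// page is a hypothetical stand-in for a site page with a publish date.
type page struct {
	URL  string
	Date time.Time
}

// weekKey returns a file name like sitemap-2019W34.xml for the ISO week of t.
func weekKey(t time.Time) string {
	year, week := t.ISOWeek()
	return fmt.Sprintf("sitemap-%dW%02d.xml", year, week)
}

// groupByWeek buckets page URLs by the sitemap file they would be written to.
func groupByWeek(pages []page) map[string][]string {
	groups := make(map[string][]string)
	for _, p := range pages {
		k := weekKey(p.Date)
		groups[k] = append(groups[k], p.URL)
	}
	return groups
}

// samplePages returns made-up pages spanning two ISO weeks.
func samplePages() []page {
	return []page{
		{"https://example.org/a/", time.Date(2019, 8, 19, 9, 0, 0, 0, time.UTC)},
		{"https://example.org/b/", time.Date(2019, 8, 25, 9, 0, 0, 0, time.UTC)},
		{"https://example.org/c/", time.Date(2019, 8, 26, 9, 0, 0, 0, time.UTC)},
	}
}

func main() {
	groups := groupByWeek(samplePages())
	names := make([]string, 0, len(groups))
	for name := range groups {
		names = append(names, name)
	}
	sort.Strings(names) // map order is random, so sort for stable output
	for _, name := range names {
		fmt.Printf("%s: %d URLs\n", name, len(groups[name]))
	}
}
```

Each key would become one sitemap file, listed from a sitemap index. A very busy week could still exceed 50k entries, so a per-file cap would be needed on top of this.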

I don’t know if Paging could help here …

No. It puts 10 (the configured number) URLs into sitemap.xml but generates only one sitemap.xml file. Tested a few seconds ago.

This works, will try to connect string to actual data:

(`test` | resources.FromString `sitemaps/2019W01.xml`).Permalink