Can shuffle be cached?

I read a topic in this forum saying that shuffle can be a performance bottleneck for large sites when used with first or last, due to the number of iterations. Can it be cached for a defined time period?

  • defined time period - no, except when your time period is defined by rebuilds of your website. You could, for instance, use a CI job that recreates and deploys your site every midnight. Then put whatever you want to show into a partial and call it with partialCached.
  • used with first or last - this sounds like you want some sort of per-page navigation. I would cache that in any case; you can call partialCached with multiple variant identifiers at the end. For instance, {{ partialCached "partial.html" $context page $type }} caches the call to partial.html once for each page the call appears on, and then adds the value of $type to create individual versions of that cache. Say you have 10 pages: that’s 10 cached runs of the partial if you have 1 $type, 20 if you have 2 $types, and so on (see the sketch below this list).
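
A minimal sketch of both ideas, assuming a hypothetical partial named random-list.html (the partial name and the $type value are illustrative, not from this site):

{{/* Cached once per build, no variant: every page reuses the same output,
     so the "time period" is effectively the interval between rebuilds. */}}
{{ partialCached "random-list.html" site }}

{{/* Cached once per page/type combination: each distinct combination of the
     variants after the context (.RelPermalink and $type) gets its own entry. */}}
{{ $type := "sidebar" }}
{{ partialCached "random-list.html" . .RelPermalink $type }}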

So the general answer is “yes, but it’s complicated”.

PS: I wonder what specific use case requires shuffling combined with first or last. It might be solvable on the frontend via JavaScript instead of doing it “manually” (statically) for each page.

I ran a quick test on a 10,000+ page site with this code in layouts/page.html:

{{ with site.RegularPages | shuffle | first 5 }}
  <p>Five random pages:</p>
  <ul>
    {{ range . }}
      <li><a href="{{ .RelPermalink }}">{{ .LinkTitle }}</a></li>
    {{ end }}
  </ul>
{{ end }}

Build time (average of 3 runs):

  • With the code above: 3.9 seconds
  • Without the code above: 3.2 seconds

That’s about a 22% increase in build time. But as soon as you add other things to the site (e.g., image processing) the difference as a percentage will decrease.

git clone --single-branch -b hugo-forum-topic-54636 https://github.com/jmooring/hugo-testing hugo-forum-topic-54636
cd hugo-forum-topic-54636
hugo 

This test site was originally created to test pagination with a pagerSize of 10.

@jmooring @davidsneighbour I was referring to this topic: Building a site with 50,000 pages taking a very long time. I have a 37k-page site I am building with content adapters, and I was hesitant to use shuffle after reading Joe’s comments. (I want to show the first 3-6 related pages on all 37k pages. The site already takes 1-2 minutes to build, so I did not want to push that up.)

That particular site had many… opportunities for improvement. I don’t remember the details (and I’m not going to review that thread), but there were several places where templates were ranging through every page on every page.

In your case I suggest running some timed experiments.
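
For example, you can wrap the section you want to measure with debug.Timer and build with hugo --logLevel info to see the timings (a minimal sketch; the timer name is arbitrary):

{{ $t := debug.Timer "random-pages" }}
{{ with site.RegularPages | shuffle | first 5 }}
  <ul>
    {{ range . }}
      <li><a href="{{ .RelPermalink }}">{{ .LinkTitle }}</a></li>
    {{ end }}
  </ul>
{{ end }}
{{ $t.Stop }}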

This particular site is RAM hungry, but I will do just that after setting a memory limit.

First run without shuffle

Built in 67910 ms

First and second runs with shuffle

Built in 164701 ms
Built in 174700 ms

{{ with site.RegularPages | shuffle | first 5 }}
  <p>Related content:</p>
  <ul>
    {{ range . }}
      <li><a href="{{ .RelPermalink }}">{{ .LinkTitle }}</a></li>
    {{ end }}
  </ul>
{{ end }}

Quite a jump (I set a 2 GB memory limit; is that too low on 8 GB of RAM?). The site has no pagination or image processing, just simple data tables generated from external data files.

So, I guess I will skip using shuffle on this site.

Setting HUGO_MEMORYLIMIT to 2 GB on an 8 GB machine is the same as not setting it at all… the default is 25%.

Having said that, I have no idea if increasing the memory limit would improve performance.

Right off the bat:

site.RegularPages can be cached with a partial (if that helps).

Create 5 distinct random numbers, with the last index as the maximum, in a slice.
Range over the slice and access each page using index.

Should be faster.

site.RegularPages is cached when you call it.

This site actually shoots all the way to 7-point-something GB of RAM without that setting and freezes my PC. Hence the limit, though I read from Bep that it is a “best effort” thing: HUGO_MEMORYLIMIT is not taking effect? - #2 by bep. With the browser, IDE, and terminal open, I saw that the setting improved RAM usage.

@irkode’s suggestion to generate five random numbers is significantly faster:

<p>Five random pages:</p>
<ul>
  {{ range seq 5 }}
    {{ with math.Rand | mul site.RegularPages.Len | math.Floor | int }}
      {{ with index site.RegularPages . }}
        <li><a href="{{ .RelPermalink }}">{{ .LinkTitle }}</a></li>
      {{ end }}
    {{ end }}
  {{ end }}
</ul>

Using this approach with the example site referenced above, the build time increased by 10% instead of 22%.

I saw this when searching for topics about shuffle: Excluding current page when using `shuffle` - #4 by jmooring. Where should the complement (slice .) appear in the code? I am assuming somewhere here: {{ with math.Rand | mul site.RegularPages.Len | math.Floor | int }}?

Untested, but that should do the trick. Keep in mind:

  • that adds another O(n)
  • the chance of hitting the current page is 1:n per pick
    if the number of pages is much larger than the number of pages you need, avoiding the current page may not be worth the extra runtime.

With 5 picks out of 10,000+ pages that is roughly 5/10,000 ≈ 0.05%.

<p>Five random pages:</p>
<ul>
  {{ $pages := site.RegularPages | complement (slice .) }}
  {{ $len := $pages.Len }}
  {{ range seq 5 }}
    {{ with math.Rand | mul $len | math.Floor | int }}
      {{ with index $pages . }}
        <li><a href="{{ .RelPermalink }}">{{ .LinkTitle }}</a></li>
      {{ end }}
    {{ end }}
  {{ end }}
</ul>

Storing $len in a variable is a micro-optimization: it assumes that reading $len is faster than evaluating $pages.Len, because it saves one method call per iteration.

Didn’t think of that. I was thinking of a site I have with fewer than 1k pages and the likelihood of the current page appearing in the selection.

It is! Impressive!

First Built in 76669 ms
Second Built in 62248 ms

I built a test with just a sequence of numbers, not a real .RegularPages collection, to check how the random selection performs.

I did 2000 runs with various array sizes, selecting 5 elements each time:

There are some duplicate indexes and some hits on the current index. I had duplicates even with 50000 pages :wink:

2000 runs for getting 5 random numbers out of 100 Pages
  • created 201 lists with duplicate indexes. That is 10.05 percent
  • hit 85 times the current page (2) within the indexes. That is 4.25 percent
2000 runs for getting 5 random numbers out of 500 Pages
  • created 44 lists with duplicate indexes. That is 2.20 percent
  • hit 17 times the current page (280) within the indexes. That is 0.85 percent
2000 runs for getting 5 random numbers out of 1000 Pages
  • created 24 lists with duplicate indexes. That is 1.20 percent
  • hit 10 times the current page (889) within the indexes. That is 0.50 percent
2000 runs for getting 5 random numbers out of 2000 Pages
  • created 8 lists with duplicate indexes. That is 0.40 percent
  • hit 5 times the current page (1934) within the indexes. That is 0.25 percent
2000 runs for getting 5 random numbers out of 5000 Pages
  • created 1 lists with duplicate indexes. That is 0.05 percent
  • hit 2 times the current page (3886) within the indexes. That is 0.10 percent
2000 runs for getting 5 random numbers out of 10000 Pages
  • created 2 lists with duplicate indexes. That is 0.10 percent
  • hit 1 times the current page (3123) within the indexes. That is 0.05 percent
2000 runs for getting 5 random numbers out of 50000 Pages
  • created 0 lists with duplicate indexes. That is 0.00 percent
  • hit 0 times the current page (20048) within the indexes. That is 0.00 percent

timings bare math.Rand:

{{- $seq := slice -}}
{{- range seq $num -}}
  {{ $seq = $seq | append (math.Rand | mul $numPages | math.Floor | int) -}}
{{- end -}}

INFO  timer:  name Pages_ 500 count 1 duration 59.3693ms average 59.3693ms median 59.3693ms
INFO  timer:  name Pages_10000 count 1 duration 61.489ms average 61.489ms median 61.489ms
INFO  timer:  name Pages_5000 count 1 duration 61.4975ms average 61.4975ms median 61.4975ms
INFO  timer:  name Pages_1000 count 1 duration 61.6449ms average 61.6449ms median 61.6449ms
INFO  timer:  name Pages_2000 count 1 duration 61.8377ms average 61.8377ms median 61.8377ms
INFO  timer:  name Pages_50000 count 1 duration 62.9562ms average 62.9562ms median 62.9562ms
INFO  timer:  name Pages_ 100 count 1 duration 75.2133ms average 75.2133ms median 75.2133ms

timings with partial

{{- $seq := slice -}}
{{- range seq $num -}}
  {{ $seq = $seq | append (partial "inline/getRandomNumber.html" (dict "numPages" $numPages "currentPage" $pageNum "seq" $seq)) }}
{{- end -}}

INFO  timer:  name Pages_ 500 count 1 duration 151.2201ms average 151.2201ms median 151.2201ms
INFO  timer:  name Pages_5000 count 1 duration 153.5982ms average 153.5982ms median 153.5982ms
INFO  timer:  name Pages_50000 count 1 duration 153.8246ms average 153.8246ms median 153.8246ms
INFO  timer:  name Pages_10000 count 1 duration 154.6755ms average 154.6755ms median 154.6755ms
INFO  timer:  name Pages_2000 count 1 duration 157.0459ms average 157.0459ms median 157.0459ms
INFO  timer:  name Pages_1000 count 1 duration 159.0901ms average 159.0901ms median 159.0901ms
INFO  timer:  name Pages_ 100 count 1 duration 171.7678ms average 171.7678ms median 171.7678ms

inline partial

{{- define "_partials/inline/getRandomNumber.html" -}}
   {{- $number := math.Rand | mul .numPages | math.Floor | int -}}
   {{- if or (in .seq $number) (eq $number .currentPage) -}}
      {{- $number = partial "inline/getRandomNumber.html" (dict "numPages" .numPages "currentPage" .currentPage "seq" .seq)}}
      {{- return $number -}}
   {{- end -}}
{{- end -}}

comments:

  • current page is simulated with a random index.
  • the check for current page has to be adjusted (untested); see the sketch after this list
    • pass the current Page to the partial (not an index)
    • inside the partial, something like eq .currentPage (index .RegularPages $number)
  • may take a long time when selecting 5 out of 10 :wink:
  • for real page sets (large ones) it should still be faster
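
A hedged sketch of that adjustment, passing the page collection as .pages and the current page object as .currentPage (the partial and parameter names are illustrative, and it is untested, as noted above):

{{- define "_partials/inline/getRandomPageNumber.html" -}}
   {{- $number := math.Rand | mul .pages.Len | math.Floor | int -}}
   {{- if or (in .seq $number) (eq .currentPage (index .pages $number)) -}}
      {{- /* redraw when the index is a duplicate or points at the current page */ -}}
      {{- $number = partial "inline/getRandomPageNumber.html" . -}}
   {{- end -}}
   {{- return $number -}}
{{- end -}}

It would be called per pick with something like {{ partial "inline/getRandomPageNumber.html" (dict "pages" site.RegularPages "currentPage" . "seq" $seq) }}.
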
Complete test code
   {{- $numberOfPages := slice 100 500 1000 2000 5000 10000 50000}}
   {{- $runs := 2000 -}}
   {{- $num := 5 -}}
   {{- range $numPages := $numberOfPages -}}

      <h3>{{- $runs }} runs for getting {{ $num }} random numbers out of {{ $numPages }} Pages</h3>
      <ul>
         {{- $dups := 0 -}}
         {{- $dupCurrentPage := 0 -}}
         {{- $t := debug.Timer (printf "Pages_%4d" $numPages) -}}
         {{- $pageNum := math.Rand | mul $numPages | math.Floor | int -}}
         {{- range seq $runs -}}
            {{- $seq := slice -}}
            {{- range seq $num -}}
               {{ $seq = $seq | append (partial "inline/getRandomNumber.html" (dict "numPages" $numPages "currentPage" $pageNum "seq" $seq)) }}
            {{- end -}}
               {{- $seq = $seq | append (math.Rand | mul $numPages | math.Floor | int) -}}
            {{- if ne (len $seq) (len ($seq | uniq)) -}}
               {{/* <h4>{{- printf "LIST: run: %04d = %v" . $seq -}}</h4> */}}
               {{- $dups = add $dups 1 -}}
            {{- end -}}
            {{- if in $seq $pageNum -}}
               {{- $dupCurrentPage = add $dupCurrentPage 1 -}}
               {{/* <h4>{{- printf "PAGE: %3d run: %04d = %v" $pageNum . $seq -}}</h4> */}}
            {{- end -}}
         {{- end -}}
         {{- $t.Stop -}}
         <li>
            {{- printf "created %d lists with duplicate indexes. That is %3.2f percent" $dups (mul (div (mul $dups 1.0) $runs) 100.0) -}}
         </li>
         <li>
            {{- printf "hit %d times the current page (%d) within the indexes. That is %3.2f percent" $dupCurrentPage $pageNum (mul (div (mul $dupCurrentPage 1.0) $runs) 100.0) -}}
         </li>
      </ul>
   {{- end -}}

I recognize that I am kicking a dead horse here, but…

With Hugo v0.123.0 and later you can range over an integer. That means you can range over a very big number: something less than or equal to 9223372036854775807. This allows us to simulate a loop that continues indefinitely until a break condition is met.

With my test site of approximately 10,000 pages, the code below is about 3x faster than using the shuffle function. The random pages are unique and exclude the current page.

layouts/page.html
{{ define "main" }}
  <h1>{{ .Title }}</h1>

  {{/*
  Create the page collection from which unique random pages will be selected
  for listing. Using the partialCached function as shown below improves
  performance with custom page collections, but is not required when using
  built-in page collections such as .Pages and .RegularPages. The built-in page
  collections are automatically cached upon their first use.
  */}}
  {{ $pc := partialCached "create-page-collection.html" . }}

  {{/* Set the number of unique random pages to be listed. */}}
  {{ $n := 5 }}

  {{/* Loop until we have listed the desired number of unique random pages. */}}
  <p>Random pages:</p>
  <ul>
    {{ $randomPages := slice }}
    {{ range 9223372036854775807 }}
      {{ $p := index $pc (math.Rand | mul $pc.Len | math.Floor | int) }}
      {{ if or (in $randomPages $p) (eq $p $) }}
        {{ continue }}
      {{ end }}
      <li><a href="{{ $p.RelPermalink }}">{{ $p.LinkTitle }}</a></li>
      {{ $randomPages = $randomPages | append $p }}
      {{ if eq (len $randomPages) $n }}
        {{ break }}
      {{ end }}
    {{ end }}
  </ul>

  {{ .Content }}
{{ end }}

{{ define "_partials/create-page-collection.html" }}
  {{ return site.RegularPages }}
{{ end }}

In the next release you will be able to use the math.MaxInt64 function instead of the literal value 9223372036854775807.

For those who are interested, in English the number above is nine quintillion, two hundred twenty-three quadrillion, three hundred seventy-two trillion, thirty-six billion, eight hundred fifty-four million, seven hundred seventy-five thousand, eight hundred and seven.

Real cool stuff - it allows looping beyond the 2000-iteration limit of seq without recursion.

Adding that to my bare-sequence test and measuring the loops (a sketch of the range-over-int variant follows the list):

  • NO CHECKS: INFO timer: name Pages_50000 count 1 duration 62.9562ms average 62.9562ms median 62.9562ms
  • RECURSION: INFO timer: name Pages_50000 count 1 duration 153.8246ms average 153.8246ms median 153.8246ms
  • RANGE INT: INFO timer: name Pages_50000 count 1 duration 115.8306ms average 115.8306ms median 115.8306ms
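
For reference, a sketch of such a range-over-int selection loop, reusing $numPages, $pageNum, and $num from the test code above (not the exact code behind the RANGE INT timing):

{{- $seq := slice -}}
{{- range 9223372036854775807 -}}
   {{- $n := math.Rand | mul $numPages | math.Floor | int -}}
   {{- /* skip duplicate indexes and the simulated current page */ -}}
   {{- if or (in $seq $n) (eq $n $pageNum) -}}
      {{- continue -}}
   {{- end -}}
   {{- $seq = $seq | append $n -}}
   {{- if eq (len $seq) $num -}}
      {{- break -}}
   {{- end -}}
{{- end -}}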

Additionally:

  • the maximum number of tries to get 5 out of 100+ was always below 10
  • for 5 out of 10, around 20 tries
  • and even these succeeded:
    • 90 out of 100 : with 340 max tries and 4 seconds runtime
    • 900 out of 1000 : with max tries and 2.2 minutes

I know life’s a b*tch and Murphy a devil :smiling_face_with_horns:

The paranoid should add a check after the loop, or verify that $num and $numPages are sufficiently far apart.
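
For instance, a minimal sketch of such a post-loop check, assuming the $randomPages and $n variables from the loop above:

{{/* fail the build if fewer unique pages were collected than requested */}}
{{ if lt (len $randomPages) $n }}
  {{ errorf "expected %d unique random pages, got only %d" $n (len $randomPages) }}
{{ end }}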


Another two days beating the dead horse, and it’s about 5% faster :wink:

Slightly faster with two loops and no length check:
{{ $randomPages := slice }}
{{ range $n }} {{/* loop over wanted pages */}}
  {{ range 9223372036854775807 }}
    {{ $p := index $pc (math.Rand | mul $pc.Len | math.Floor | int) }}
    {{ if in $randomPages $p }}
      {{ continue }}
    {{ end }}
    <li><a href="{{ $p.RelPermalink }}">{{ $p.LinkTitle }}</a></li>
    {{ $randomPages = $randomPages | append $p }}
    {{ break }} {{/* one found page handled, no condition */}}
  {{ end }} 
{{ end }}

INFO  timer:  name 03_10000_RANDOM_BREAK_WITH_COND count 10000 duration 463.4363ms average 46.343µs median 0s
INFO  timer:  name 04_10000_RANDOM_BREAK_WITH_LOOP count 10000 duration 430.5013ms average 43.05µs median 0s

With your code.

First run

$ hugo
Start building sites … 
hugo v0.147.0-7d0039b86ddd6397816cc3383cb0cfa481b15f32+extended linux/amd64 BuildDate=2025-04-25T15:26:28Z VendorInfo=gohugoio


                   |  EN    
-------------------+--------
  Pages            | 36938  
  Paginator pages  |     0  
  Non-page files   |     5  
  Static files     |     8  
  Processed images |    10  
  Aliases          |     0  
  Cleaned          |     0  

Total in 276297 ms

Second run

$ hugo
Start building sites … 
hugo v0.147.0-7d0039b86ddd6397816cc3383cb0cfa481b15f32+extended linux/amd64 BuildDate=2025-04-25T15:26:28Z VendorInfo=gohugoio


                   |  EN    
-------------------+--------
  Pages            | 36938  
  Paginator pages  |     0  
  Non-page files   |     5  
  Static files     |     8  
  Processed images |    10  
  Aliases          |     0  
  Cleaned          |     0  

Total in 251622 ms

Quite slow. Compare that with the code in Can shuffle be cached? - #12 by jmooring

First run

$ hugo
Start building sites … 
hugo v0.147.0-7d0039b86ddd6397816cc3383cb0cfa481b15f32+extended linux/amd64 BuildDate=2025-04-25T15:26:28Z VendorInfo=gohugoio


                   |  EN    
-------------------+--------
  Pages            | 36938  
  Paginator pages  |     0  
  Non-page files   |     5  
  Static files     |     8  
  Processed images |    10  
  Aliases          |     0  
  Cleaned          |     0  

Total in 49565 ms

Second run

$ hugo
Start building sites … 
hugo v0.147.0-7d0039b86ddd6397816cc3383cb0cfa481b15f32+extended linux/amd64 BuildDate=2025-04-25T15:26:28Z VendorInfo=gohugoio


                   |  EN    
-------------------+--------
  Pages            | 36938  
  Paginator pages  |     0  
  Non-page files   |     5  
  Static files     |     8  
  Processed images |    10  
  Aliases          |     0  
  Cleaned          |     0  

Total in 49360 ms

In my test, as you can see, shuffle was 5.6x faster on the first run and 5x faster on the second run.

Please do this and post the console log:

git clone --single-branch -b hugo-forum-topic-54636 https://github.com/jmooring/hugo-testing hugo-forum-topic-54636
cd hugo-forum-topic-54636
hugo --logLevel info