Building a site with 50,000 pages taking a very long time

Fortunately, since the site I built fetches remote content, I no longer need to build it locally. Once I finished developing it, I let Netlify handle the build (it takes about 10 minutes on Netlify) and set a cron job to rebuild the site weekly in case the remote content is updated. So I just persevered during the development phase. The repo itself is small, since the remote content is dumped into the public folder during the build process.

Thank you very much @jmooring and @Arif

I will continue to optimize and bisect etc. and come back if I should have more questions.

Cheers.

Is this different from caching using nginx? And Cloudflare? I think my nginx config sets the asset cache time to a few weeks, which is pretty decent.

Please advise. Thank you.

Yes. This is for Hugo’s cache. It is also recommended to change the cache folder from the default, otherwise the OS sometimes deletes files from it and you get weird errors.
Note that this is primarily for development builds on your own machine. If you are building on Netlify, you will want to use its ability to store the cache for re-use on future builds.

That’s part of why Hugo does caching. Also, with a large site, you might want to use --renderToDisk to reduce memory usage.

I’m sorry, I do not understand this. Doesn’t caching only matter if I’m processing images using Hugo? If all the site images are static/ assets anyway, what exactly am I caching?

What is Hugo’s cache? I don’t understand. The documentation is confusing to me. It just says you can use these config directives to configure this, but why? Why am I configuring this? What is this cache? Which images am I caching? What is hitting this cache? Other people? Or the Hugo binary itself when it builds pages? I’m very confused. I don’t know if it’s just me, but almost all of Hugo’s documentation is very basic and does not explain much, so I’m left confused.

All my images are in /static/img/ and I don’t need to process anything. What exactly does this option do for me?

All Hugo does is copy the static images to /public/img/ upon build. Does this help there? I’m very confused. I’m sorry for my confusion.

I got confused by this, and since I don’t even understand how it helps me, I’m not going to worry about it for now. Hopefully someone can explain this to me. Thanks.

After latest optimizations

hugo v0.110.0-e32a493b7826d02763c3b79623952e625402b168+extended linux/amd64 BuildDate=2023-01-17T12:16:09Z VendorInfo=snap:0.110.0

Template Metrics:

     cumulative       average       maximum      cache  percent  cached  total  
       duration      duration      duration  potential   cached   count  count  template
     ----------      --------      --------  ---------  -------  ------  -----  --------
  5m19.491902607s    6.602027ms  110.622927ms          0        0       0  48393  news/single.html
  1m59.163964534s    2.312695ms  399.957494ms          8        0      98  51526  partials/head.html
  1m47.285901688s   62.703624ms  182.422733ms          0        0       0   1711  _default/list.html
  1m19.283348092s   98.244545ms  243.048039ms          0        0       0    807  news/list.html
  54.057511148s    1.049334ms   21.980787ms        100        6    2967  51516  partials/tagcloud.html
  50.895329916s  334.837696ms  498.511623ms          0        0       0    152  drugs/single.html
   37.01040822s   210.28641ms  360.690111ms          0        0       0    176  shortcodes/related.html
  27.388632245s       563.9µs   88.738966ms          0        0       0  48570  shortcodes/signup.html
  24.965139491s     484.487µs   96.968558ms         10      100   51360  51529  partials/nav.html
  21.306507681s     413.501µs  209.436694ms        100      100   51360  51527  partials/footer.html
  16.938171652s     347.556µs  560.109622ms         99      100   48729  48735  partials/related.html
  12.727613426s     260.368µs   73.604055ms         99      100   48729  48883  partials/most_read.html
  10.636053175s     206.417µs  489.240841ms        100       96   49545  51527  partials/sidebar.html
   7.837280066s   23.256023ms  599.141524ms          0        0       0    337  _default/single.html
   5.149208671s  572.134296ms  617.439095ms          0        0       0      9  page/single.html
   4.883788163s    1.854133ms  112.351815ms         96        0       0   2634  partials/carousel_featured.html
   4.876897883s      94.669µs   74.496041ms        100      100   51359  51515  partials/author.html
   4.188902435s    2.103918ms  252.242835ms         99       99    1981   1991  partials/widgets/random.html
   3.715217067s      46.148µs    5.096749ms         92        0       8  80506  partials/share.html
   3.650144486s       70.84µs  134.542603ms         99      100   51360  51526  partials/scripts.html
   3.251109842s   16.587295ms  858.959145ms          0        0       0    196  _internal/_default/rss.xml
   2.886285564s    1.450394ms  212.866133ms         99      100    1981   1990  partials/widgets/faq.html
   2.319361532s   23.427894ms  173.642917ms          0        0       0     99  section/community.html
   1.652399224s  236.057032ms  395.492231ms          0        0       0      7  shortcodes/list_tags.html
   1.319044069s     663.169µs   100.84409ms         99      100    1981   1989  partials/widgets/featured.html
   1.214918322s      23.623µs     7.05178ms         23        0       0  51428  partials/seo/twitter.html
   623.309693ms  623.309693ms  623.309693ms          0        0       0      1  _default/index.json
    515.85314ms   515.85314ms   515.85314ms          0        0       0      1  _internal/_default/sitemap.xml
   506.798851ms  506.798851ms  506.798851ms          0        0       0      1  index.html
   487.134189ms       9.454µs    5.044789ms         34        3    1710  51524  partials/breadcrumbs.html
   383.424982ms     655.427µs  156.311534ms         71       66     384    585  partials/recent.base.html
   325.677579ms  108.559193ms  128.889439ms          0        0       0      3  section/drugs.html
   242.124595ms   60.531148ms  204.996643ms          0        0       0      4  section/blog.html
    234.20284ms   234.20284ms   234.20284ms        100        0       0      1  partials/sections_list.html
   204.271291ms     103.063µs  156.358013ms        100      100    1981   1982  partials/widgets/search.html
   140.291665ms      70.569µs   28.928898ms         99      100    1981   1988  partials/widgets/recent.html
   134.633083ms   14.959231ms   25.873151ms          0        0       0      9  section/psychedelics.html
   111.279157ms      42.392µs   63.379638ms        100      100    2621   2625  partials/drugs.html
    69.949385ms   69.949385ms   69.949385ms        100        0       0      1  partials/recent_blog.html
    68.199728ms   68.199728ms   68.199728ms        100        0       0      1  partials/recent_drugs.html
    63.199894ms   63.199894ms   63.199894ms        100        0       0      1  partials/recent_faq.html
    61.982444ms   61.982444ms   61.982444ms        100        0       0      1  partials/recent_psychedelics.html
    42.319858ms       93.01µs    1.386843ms          0        0       0    455  shortcodes/notice.html
    37.433741ms         770ns    7.627179ms        100        0       0  48570  partials/mailchimp.html
    37.259092ms      18.798µs   15.929361ms        100      100    1981   1982  partials/widgets/tags.html
    29.926026ms         618ns     284.524µs          0        0       0  48393  shortcodes/amazon_native.html
    21.993794ms      77.992µs     1.23385ms          0        0       0    282  shortcodes/imgcap.html
    14.702912ms       7.418µs    2.491065ms        100      100    1981   1982  partials/widgets/facebook.html
    13.195959ms      31.493µs      945.87µs          0        0       0    419  _internal/alias.html
    12.251493ms      73.804µs      407.67µs          9        0       0    166  partials/seo/schema.html
    11.272121ms       5.687µs      1.6885ms        100      100    1981   1982  partials/widgets/categories.html
     9.866837ms      59.438µs    3.862468ms        100        0       0    166  partials/seo/google_analytics.html
     4.077582ms     163.103µs     882.508µs          0        0       0     25  shortcodes/blockquote.html
     1.266844ms    1.266844ms    1.266844ms          0        0       0      1  404.html
       524.82µs       3.142µs     140.549µs        100        0       0    167  partials/site_social.html
      520.585µs     520.585µs     520.585µs        100        0       0      1  partials/testimonials.html
      475.419µs     475.419µs     475.419µs        100        0       0      1  partials/features.html
      375.648µs      93.912µs      183.14µs          0        0       0      4  shortcodes/radio.html
      375.228µs     375.228µs     375.228µs        100        0       0      1  partials/clients.html
      350.931µs     350.931µs     350.931µs        100        0       0      1  partials/team.html
      241.512µs     241.512µs     241.512µs          0        0       0      1  _internal/shortcodes/youtube.html
      207.987µs     207.987µs     207.987µs        100        0       0      1  partials/hero.html
      201.846µs     201.846µs     201.846µs        100        0       0      1  partials/signup.html
      199.412µs     199.412µs     199.412µs          0        0       0      1  shortcodes/aboutimg.html
      186.116µs     186.116µs     186.116µs        100        0       0      1  partials/welcome.html
      112.375µs     112.375µs     112.375µs          0        0       0      1  shortcodes/calc.html
        3.898µs       3.898µs       3.898µs          0        0       0      1  robots.txt


                   |  EN    
-------------------+--------
  Pages            | 49286  
  Paginator pages  |  2438  
  Non-page files   |    28  
  Static files     |  2451  
  Processed images |     0  
  Aliases          |   419  
  Sitemaps         |     1  
  Cleaned          |     0  

Total in 133091 ms
$  hugo -b="https://www.psychedelicsdaily.com/" --minify
Start building sites … 
hugo v0.110.0-e32a493b7826d02763c3b79623952e625402b168+extended linux/amd64 BuildDate=2023-01-17T12:16:09Z VendorInfo=snap:0.110.0

                   |  EN    
-------------------+--------
  Pages            | 49286  
  Paginator pages  |  2438  
  Non-page files   |    28  
  Static files     |  2451  
  Processed images |     0  
  Aliases          |   419  
  Sitemaps         |     1  
  Cleaned          |     0  

Total in 120429 ms
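
A note on reading the metrics table above: the cumulative column adds up every execution of a template, so its values can exceed the wall-clock total (5m19s for news/single.html against a build of about 133 seconds), presumably because pages are rendered in parallel and a template's time includes the partials it calls. The Python sketch below ranks rows by that column. `parse_go_duration` and `rank_templates` are hypothetical helper names, not part of Hugo, and the parser assumes the duration format seen in this output.

```python
import re

# Seconds per unit for the suffixes that appear in Go-style durations.
_UNIT_SECONDS = {
    "h": 3600.0, "m": 60.0, "s": 1.0,
    "ms": 1e-3, "\u00b5s": 1e-6, "\u03bcs": 1e-6, "us": 1e-6, "ns": 1e-9,
}

# Two-letter units come first so "ms" is not read as "m" followed by "s".
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|\u00b5s|\u03bcs|us|ns|h|m|s)")


def parse_go_duration(text):
    """Convert a duration such as '5m19.49s' or '770ns' to seconds."""
    matches = list(_TOKEN.finditer(text))
    if not matches or "".join(m.group(0) for m in matches) != text:
        raise ValueError(f"not a duration: {text!r}")
    return sum(float(m.group(1)) * _UNIT_SECONDS[m.group(2)] for m in matches)


def rank_templates(metrics_lines):
    """Return (template, cumulative_seconds) pairs, slowest first."""
    rows = []
    for line in metrics_lines:
        fields = line.split()
        # A data row has 8 columns: 3 durations, 4 integers, template name.
        if len(fields) != 8:
            continue
        try:
            rows.append((fields[7], parse_go_duration(fields[0])))
        except ValueError:
            continue  # header or separator line
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Feeding it the lines of a saved `--templateMetrics` run gives a list whose first entries are the templates most worth caching or simplifying.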

Are you using the built-in paginator, or did you roll your own?

Built in, using Paginator.Paginate.

I didn’t want to trouble you needlessly, hence no reply yet about the git repository. Should I need help and have to take up your time, I’ll add you and reach out then. Thanks for that! :)

{{ $paginator := .Paginate (where .Data.Pages "Type" "in" .Site.Params.mainSections) }}
{{ range $i, $e := where ($paginator.Pages.ByLastmod) ".Params.hidden" "!=" true }}
<ul class="pager">
    {{ if .Paginator.HasPrev }}
    <li class="previous"><a href="{{ .Site.BaseURL }}{{ .Paginator.Prev.URL }}">&larr; Newer</a></li>
    {{ else }}
    <li class="previous disabled"><a href="#">&larr; Newer</a></li>
    {{ end }}

    {{ if .Paginator.HasNext }}
    <li class="next"><a href="{{ .Site.BaseURL }}{{ .Paginator.Next.URL }}">Older &rarr;</a></li>
    {{ else }}
    <li class="next disabled"><a href="#">Older &rarr;</a></li>
    {{ end }}
</ul>

In case it matters, here’s how I generate my markdown content:

    $articles = R::findAll('article', ' WHERE summary !=""');

    foreach ($articles as $article){
        $summary = gzdecode(base64_decode($article->summary));
        echo "PROCESSING: {$article->link}" . PHP_EOL;

        $date_dir = convertDateFormat($article->published); // m-d-y, also for URL
        $date_hugo = $article->date;
        $year = convertDateFormatY($article->published);
        $month = convertDateFormatM($article->published);
        $day = convertDateFormatD($article->published);
        $title = (html_entity_decode(strip_tags($article->title)));
        $desc =(html_entity_decode(strip_tags($article->description)));

        $title = str_replace('"', "'", $title);
        $desc = str_replace('"', "'", $desc);

        $link = $article->link;
        $banner = "https://news.psychedelicsdaily.com/cache/thumb/".$article->image_file;
        $image = $article->image_link;

        $titleSlug = @slugify($title);
        $titleSlugT = @slugifyT($title);
        $descSlug = @slugifyT($desc);
        $revcount = rand(287,2923234);


        $frontmatter = <<<EOT
+++
author = "hash"
authors = "hash-borgir"
banner = "$banner"
date = "$date_hugo"
description = "$descSlug"
featured = false
images = ["$banner"]
reviewCount = "$revcount"
title = "$titleSlugT"
tags = ["{$article->category}"]
news = true
+++

# $title

$desc

![$title]($banner)

$summary


# Source

Read the rest of the article [HERE]($link)

Article Source: [$title]($link)

Article Curated: $date_hugo


{{< signup >}}

{{< amazon_native >}}


EOT;


        // Create the dated directory (recursively) only if it is missing, then write the file.
        $dir = "/home/stoned/websites/websites/psychedelicsdaily.com-hugo/content/news/{$article->category}/$year/$month/$day/";
        if (!is_dir($dir)) {
            mkdir($dir, 0755, true);
        }
        file_put_contents("{$dir}{$titleSlug}.md", $frontmatter);
        echo "CREATED:  /content/news/{$article->category}/$year/$month/$day/$titleSlug.md" . PHP_EOL;


        $article->converted = 0;
        R::store($article);
        unset($article);
        $i++;
    }

So they’re very simple small markdown files, no more than 2-4KB each.

Sorry for the confusion.

  [caches.assets]
    dir = ':resourceDir/_gen'
    maxAge = -1

This caches anything you process from Hugo’s assets directory (that is, any global resources) like CSS or JavaScript minification. It also includes files that you simply access as a resource and use .Permalink or .RelPermalink (for example in a render hook).

  [caches.getcsv]
    dir = ':cacheDir/:project'
    maxAge = -1
  [caches.getjson]
    dir = ':cacheDir/:project'
    maxAge = -1
  [caches.getresource]
    dir = ':cacheDir/:project'
    maxAge = -1

These are for caching remote resources.

  [caches.images]
    dir = ':resourceDir/_gen'
    maxAge = -1

This is for processed images, and does not apply to files from static (AFAIK).

  [caches.modules]
    dir = ':cacheDir/modules'
    maxAge = -1

And finally any Hugo modules you use are cached according to the above configuration.


--renderToDisk changes the behaviour of hugo serve from serving the processed web pages from memory (including images simply copied from static) to storing the files in public and serving the site from there. (However, I think Hugo only serves files it has processed, not just anything that happens to be in public, but I have not verified that).

Also, if public has content then the production build (i.e. hugo without ‘serve’) doesn’t replace already existing files, and likely --renderToDisk behaves the same way.


I also suspect (but have not verified) that partialCached may go into Hugo’s cache.


To see for yourself what caching does or does not achieve, I’d suggest running hugo serve --renderToDisk, quitting it (Ctrl-C), then running it again and noting the difference in how long it takes.

Hugo will report the same number of Pages etc as processed, but that just means there are that many items in the site, not that they have been regenerated from scratch with the new run.

HTH

Very useful to know. Thank you.

Hi @jmooring, what if I change the logic to this?

  {{ range first 24 .Site.RegularPages.ByDate | shuffle }}
  {{ if or (ne .Section "news") (ne .Section "community") (ne .Section "faq") (ne .Section "")}}

So if we just range over all the pages and don’t do a where comparison, then just display only the types we want, we should be able to avoid 50k times 50k comparisons, right?
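
One thing worth checking in the condition above: as written, an `or` over several `ne` tests holds for every page, because no section can equal all four values at once, so nothing is filtered out. The Python sketch below models the two variants with plain booleans. `or_of_ne` and `and_of_ne` are illustrative names, and the assumption that the intent is to skip those four sections is mine, not something stated in the post.

```python
EXCLUDED = ("news", "community", "faq", "")


def or_of_ne(section):
    # Mirrors: or (ne .Section "news") (ne .Section "community") ...
    return any(section != name for name in EXCLUDED)


def and_of_ne(section):
    # Mirrors: and (ne .Section "news") (ne .Section "community") ...
    return all(section != name for name in EXCLUDED)


# Compare both conditions for a few section names.
for section in ("news", "faq", "", "drugs", "blog"):
    print(f"{section!r:8} or={or_of_ne(section)!s:5} and={and_of_ne(section)}")
```

In the template, the second form would use `and` in place of `or`. Note also that the check runs inside a loop over only the 24 pages returned by `first`, so fewer than 24 items may end up displayed.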

50,000 pages * 24 iterations/page = 1.2 million iterations

I’m slowly starting to understand more. Thank you for your patience and help.

With a lot of caching, I was able to get it down to under 2 minutes.

This involved caching anything not dynamic using partialCached, reworking some of my logic, and other suggestions from the community.
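
As a rough mental model (not Hugo's actual implementation), partialCached behaves like memoization: the partial is rendered once per distinct set of variant arguments, and the stored output is reused afterwards. The Python sketch below illustrates the idea; `PartialCache`, `render_footer` and the page dictionaries are all hypothetical.

```python
class PartialCache:
    """Memoize rendered output per (partial name, variants) key."""

    def __init__(self):
        self._store = {}
        self.renders = 0  # how many times a partial was actually rendered

    def get(self, name, render, context, *variants):
        key = (name, variants)
        if key not in self._store:
            self.renders += 1
            self._store[key] = render(context)
        return self._store[key]


def render_footer(context):
    # Stand-in for an expensive template whose output does not depend on the page.
    return f"<footer>{context['site_title']}</footer>"


cache = PartialCache()
pages = [{"site_title": "Example", "section": "news"} for _ in range(50_000)]
for page in pages:
    cache.get("footer.html", render_footer, page)
print(cache.renders)  # 1: rendered once, reused for the other 49,999 pages
```

Passing a variant such as the section (in Hugo, `partialCached "sidebar.html" . .Section`) keeps one cached copy per section, which is how a partial that differs between sections can still be cached.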

Here are the results.

stoned@stoned-desktop - ~/websites/websites/psychedelicsdaily.com-hugo - [master !x?+]-
$  hugo -b="https://www.psychedelicsdaily.com/" --minify --templateMetrics --templateMetricsHints
Start building sites … 
hugo v0.110.0-e32a493b7826d02763c3b79623952e625402b168+extended linux/amd64 BuildDate=2023-01-17T12:16:09Z VendorInfo=snap:0.110.0

Template Metrics:

     cumulative       average       maximum      cache  percent  cached  total  
       duration      duration      duration  potential   cached   count  count  template
     ----------      --------      --------  ---------  -------  ------  -----  --------
  6m24.901532279s    7.953661ms  396.664238ms          0        0       0  48393  news/single.html
  1m56.041687258s   67.820974ms  248.368561ms          0        0       0   1711  _default/list.html
  1m44.682416389s    2.031603ms  456.089584ms          8        0      98  51527  partials/head.html
  1m1.661052594s   76.407747ms  356.404515ms          0        0       0    807  news/list.html
  50.418529265s    1.034332ms  539.864568ms          9       99   48392  48745  partials/related.html
   42.33862164s     821.646µs   57.772816ms         10      100   51512  51529  partials/nav.html
  40.184359771s  228.320225ms  442.425788ms          0        0       0    176  shortcodes/related.html
    39.2555148s  258.259965ms   457.03414ms          0        0       0    152  drugs/single.html
  32.475139232s     630.401µs   60.486965ms        100      100   51511  51515  partials/author.html
  23.414977767s     454.518µs   59.953555ms        100      100   51512  51516  partials/tagcloud.html
  18.254925775s     373.448µs   59.508986ms        100      100   48881  48882  partials/most_read.html
  15.552875254s     301.851µs  116.279414ms         99       96   49360  51525  partials/sidebar.html
  14.159860117s     274.788µs  163.947619ms        100      100   51512  51530  partials/footer.html
  13.983172055s      271.38µs   52.089038ms         10      100   51512  51526  partials/scripts.html
   12.01678235s   35.658107ms  642.121459ms          0        0       0    337  _default/single.html
   5.617931343s    2.132851ms    89.64602ms         96        0       0   2634  partials/carousel_featured.html
   3.723442463s       46.25µs    7.998183ms         92        0       8  80506  partials/share.html
    3.38668902s   17.279025ms  895.900859ms          0        0       0    196  _internal/_default/rss.xml
   2.190678859s  243.408762ms  307.257025ms          0        0       0      9  page/single.html
   1.928774388s  275.539198ms  399.127138ms          0        0       0      7  shortcodes/list_tags.html
   1.898169831s     873.525µs  111.186176ms        100      100    2164   2173  partials/widgets/featured.html
   1.307119242s   13.203224ms   35.659694ms          0        0       0     99  section/community.html
   1.197285014s       23.28µs    6.629506ms         18        0       0  51429  partials/seo/twitter.html
   796.568149ms        16.4µs   11.967042ms          0        0       0  48570  shortcodes/signup.html
   683.740297ms  683.740297ms  683.740297ms          0        0       0      1  index.html
   576.100572ms  576.100572ms  576.100572ms          0        0       0      1  _default/index.json
   544.220254ms  181.406751ms  251.412881ms          0        0       0      3  section/drugs.html
   531.282174ms  531.282174ms  531.282174ms          0        0       0      1  _internal/_default/sitemap.xml
   506.155703ms       9.823µs   17.361366ms         34        3    1710  51524  partials/breadcrumbs.html
   177.543467ms  177.543467ms  177.543467ms        100        0       0      1  partials/sections_list.html
   171.527494ms  171.527494ms  171.527494ms        100        0       0      1  partials/recent_blog.html
   163.866247ms     280.113µs    4.071835ms         71       66     384    585  partials/recent.base.html
   156.205127ms  156.205127ms  156.205127ms        100        0       0      1  partials/recent_drugs.html
   121.006815ms   30.251703ms   89.772031ms          0        0       0      4  section/blog.html
    95.564079ms   10.618231ms   15.336816ms          0        0       0      9  section/psychedelics.html
    89.108602ms   89.108602ms   89.108602ms        100        0       0      1  partials/recent_faq.html
    81.245188ms       30.95µs   18.352802ms        100      100    2621   2625  partials/drugs.html
    75.100218ms   75.100218ms   75.100218ms        100        0       0      1  partials/recent_psychedelics.html
    72.433562ms      33.364µs   24.151059ms         99      100    2164   2171  partials/widgets/recent.html
    61.108826ms      28.225µs   14.900721ms        100      100    2164   2165  partials/widgets/search.html
    57.314276ms      26.375µs    3.542685ms        100      100    2164   2173  partials/widgets/tags.html
    47.832594ms      22.012µs    8.011068ms        100      100    2164   2173  partials/widgets/categories.html
    42.986633ms      94.476µs     569.119µs          0        0       0    455  shortcodes/notice.html
    26.497385ms     1.89267ms   12.764928ms         18        0       0     14  partials/seo/schema.html
    26.043028ms    1.860216ms   12.668114ms        100        0       0     14  partials/seo/google_analytics.html
    25.091106ms      11.589µs    4.721458ms        100      100    2164   2165  partials/widgets/facebook.html
    21.221506ms         438ns     138.906µs          0        0       0  48393  shortcodes/amazon_native.html
    20.597361ms       73.04µs    1.029047ms          0        0       0    282  shortcodes/imgcap.html
    17.036583ms     681.463µs   12.917209ms          0        0       0     25  shortcodes/blockquote.html
    16.955402ms      40.466µs    6.246634ms          0        0       0    419  _internal/alias.html
     1.703648ms    1.703648ms    1.703648ms          0        0       0      1  404.html
      1.02474ms     1.02474ms     1.02474ms        100        0       0      1  partials/widgets/random.html
      504.685µs     504.685µs     504.685µs        100        0       0      1  partials/features.html
      487.553µs     487.553µs     487.553µs        100        0       0      1  partials/widgets/faq.html
      469.187µs     469.187µs     469.187µs        100        0       0      1  partials/testimonials.html
      421.316µs     105.329µs     356.953µs          0        0       0      4  shortcodes/radio.html
      387.882µs     387.882µs     387.882µs        100        0       0      1  partials/team.html
      370.528µs     370.528µs     370.528µs        100        0       0      1  partials/clients.html
      236.372µs     236.372µs     236.372µs        100        0       0      1  partials/signup.html
      234.148µs     234.148µs     234.148µs          0        0       0      1  _internal/shortcodes/youtube.html
      198.309µs     198.309µs     198.309µs          0        0       0      1  shortcodes/calc.html
      193.539µs      10.752µs      120.63µs        100        0       0     18  partials/site_social.html
      176.918µs     176.918µs     176.918µs        100        0       0      1  partials/hero.html
      174.153µs     174.153µs     174.153µs        100        0       0      1  partials/welcome.html
      117.194µs     117.194µs     117.194µs          0        0       0      1  shortcodes/aboutimg.html
        3.336µs       3.336µs       3.336µs          0        0       0      1  robots.txt


                   |  EN    
-------------------+--------
  Pages            | 49286  
  Paginator pages  |  2438  
  Non-page files   |    28  
  Static files     |  2451  
  Processed images |     0  
  Aliases          |   419  
  Sitemaps         |     1  
  Cleaned          |     0  

Total in 137144 ms
stoned@stoned-desktop - ~/websites/websites/psychedelicsdaily.com-hugo - [master !x?+]-
$  hugo -b="https://www.psychedelicsdaily.com/" --minify
Start building sites … 
hugo v0.110.0-e32a493b7826d02763c3b79623952e625402b168+extended linux/amd64 BuildDate=2023-01-17T12:16:09Z VendorInfo=snap:0.110.0

                   |  EN    
-------------------+--------
  Pages            | 49286  
  Paginator pages  |  2438  
  Non-page files   |    28  
  Static files     |  2451  
  Processed images |     0  
  Aliases          |   419  
  Sitemaps         |     1  
  Cleaned          |     0  

Total in 111252 ms

I’m still continuing to optimize, and I might build a separate Hugo news website on a news. subdomain instead, or make an nginx location for /news/ and serve the news site there, so that it doesn’t have to be processed by the main site’s code. The new news site can be simple, without the complicated layouts of the main site.

For now 2 min is acceptable.

Thank you to everyone for your support.

Could you kindly explain this further? Why 1.2 million ‘pages’ and not 50k? Does first or last increase the build time? I am wondering whether I should remove them from my site.

Not pages. 1.2 million iterations.
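
The arithmetic behind that figure, as a short sketch: with `first 24`, each of the roughly 50,000 pages loops over 24 items, whereas scanning the full page list on every page would touch about 50,000 items per page. The page count is the round number used in this thread, not a measurement.

```python
PAGES = 50_000        # approximate number of pages in the site
ITEMS_PER_PAGE = 24   # items kept by `first 24`

# Each rendered page loops over only 24 items.
limited = PAGES * ITEMS_PER_PAGE

# Each rendered page scans the whole page list.
full_scan = PAGES * PAGES

print(f"{limited:,}")        # 1,200,000
print(f"{full_scan:,}")      # 2,500,000,000
print(full_scan // limited)  # 2083
```

So `first` itself does not add pages or slow the build; it caps the work done per page.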