Debugging my (seemingly?) slow build

Folks, my launch of hugo server is taking a long time. Running:

hugo --templateMetrics --templateMetricsHints

Returns the chart view rather quickly, and the first five template diagnostic rows seem reasonable:

      cache     cumulative       average       maximum
  potential       duration      duration      duration  count  template
      -----     ----------      --------      --------  -----  --------
          0    2.0697858s   48.134553ms    169.7276ms     43  books/single.html
          0    362.7431ms   13.434929ms     54.9344ms     27  research/single.html
          0     149.932ms      37.483ms     59.0012ms      4  books/list.html
        100    148.1439ms    1.130869ms     38.1359ms    131  partials/site-style.html
        100    133.2269ms    1.016999ms     16.5167ms    131  partials/site-navigation.html

Interestingly, after the output is presented, the process hangs and does nothing for a while (i.e. I have no diagnostic data about what’s happening in this interval) before returning:

                   |  EN
-------------------+-------
  Pages            |  161
  Paginator pages  |    7
  Non-page files   |    0
  Static files     | 5969
  Processed images |    0
  Aliases          |   23
  Sitemaps         |    1
  Cleaned          |    0

Total in 70603 ms

So that’s a build time of roughly 1 minute 10 seconds, which seems unexpectedly long. I’ve tried experiments such as running rm content/posts/* to see whether the build gets faster, but I gained no insights.

I see this same “silent period” of hanging when I run hugo server.

Notable factors

  1. I recently added many small files to a top-level section
  2. I am running this via the Docker image klakegg/hugo:latest since I’m on an older laptop
  3. hugo server displays the same behavior on startup, but after serving begins on port 1313, edits to source files rapidly change the served site

Any tips on how to get more data?

What are those 6K static files?

I was unable to duplicate this delay, even using klakegg/hugo:alpine so that I could factor out the Docker startup time by shelling in and running time hugo --templateMetrics --templateMetricsHints:

...
                   |  EN   
-------------------+-------
  Pages            | 5635  
  Paginator pages  | 1137  
  Non-page files   |    8  
  Static files     |  100  
  Processed images |    0  
  Aliases          | 2848  
  Sitemaps         |    0  
  Cleaned          |    0  

Total in 161473 ms

real	2m41.634s
user	1m37.842s
sys	0m26.932s

There was no delay between the end of the build and the return to the container’s shell prompt. (12-inch MacBook running Catalina, Docker 3.1.0)

Note that this runtime is consistently 6 times longer than running Hugo locally on the same data. A lot of that may simply be the poor performance of Docker volumes on Macs; you didn’t include any information about your environment.

-j

Clever thing to note, @bep! And please let me know if my workflow is somehow creating the technical confusion that’s hurting performance. Here’s how I investigated to get an answer to your question.

I took a sample of what’s in public and did some scratch investigation with find public -type f |sed 's/\(.*\/\).*/\1/'|sort|uniq -c |sort -n|grep -v big-image-dir:

   3 public/posts/
   5 public/images/2021/03/14/
   6 public/
   6 public/icons/
   6 public/images/2021/02/08/
   7 public/images/2021/02/23/
   7 public/images/2021/02/26/
   8 public/images/2021/02/05/
  15 public/images/2021/03/12/
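
For anyone who wants to repeat this kind of per-directory count, here is a minimal sketch of the same idea wrapped in a function. The function name count_per_dir and the demo file names are hypothetical, and it assumes no file names contain newlines.

```shell
#!/bin/sh
# Count regular files per directory under a root, smallest counts first.
# Usage: count_per_dir ROOT
# Assumption: file names contain no newline characters.
count_per_dir() {
    # Strip the file name from each path, keeping the trailing slash,
    # then tally identical directory paths.
    find "$1" -type f | sed 's|/[^/]*$|/|' | sort | uniq -c | sort -n
}

# Demo on a throwaway tree (hypothetical file names).
demo=$(mktemp -d)
mkdir -p "$demo/public/icons" "$demo/public/posts"
touch "$demo/public/index.html" \
      "$demo/public/icons/a.png" "$demo/public/icons/b.png" \
      "$demo/public/posts/one.html"
( cd "$demo" && count_per_dir public )
```

Appending something like grep -v big-image-dir to the call, as in the command above, filters out directories that would otherwise dominate the listing.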

As you can see, it’s not that there are a lot of files in a few directories; there are lots of directories, each with two files. This site hosts my Twitter export archive, so each tweet contributes 2 files.

So the main content appears to be “results from my last hugo run” plus the images and media needed to make two years’ worth of posts load locally.

Why Are Things This Way?

In my “local” config, I set staticDir = "public". Doing so means that /images/year/month/day/some_image.ext paths resolve inside of my documents’ content. Great.

When I “publish,” I push my git repo to my host and let a post hook build static files and sync them to the web-server-served directory via rsync. I also rsync all my images etc. in ./public/images.

Critically, the same path for images etc. in my content works “in production” as well.

I’m starting to wonder:

  • Is my staticDir config forcing hugo to crawl those ~6K files after it generates the new payload?
  • If so, is there a better way to address keeping the asset linking working for both my production and local authoring contexts?
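
A rough way to probe the first question is to compare the total number of files under the configured staticDir with the “Static files” figure Hugo reports (5969 in the output above). This is only a sketch: the helper name count_files and the demo tree are hypothetical, and a matching count would be suggestive rather than proof of where the time goes.

```shell
#!/bin/sh
# Print the total number of regular files under a directory.
# Usage: count_files DIR
count_files() {
    # tr strips the padding some wc implementations add.
    find "$1" -type f | wc -l | tr -d ' '
}

# Demo on a throwaway tree (hypothetical layout); on the real site
# the call would be: count_files public
static_demo=$(mktemp -d)
mkdir -p "$static_demo/images/2021/03/14"
touch "$static_demo/index.html" \
      "$static_demo/images/2021/03/14/a.jpg" \
      "$static_demo/images/2021/03/14/b.jpg"
echo "files under staticDir: $(count_files "$static_demo")"
```

If the number lands close to the reported “Static files” count, that would be consistent with Hugo treating the whole of public, previous build output included, as static input.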

I’ll apologize for verbosity here and hope that thorough motivation and technical background help you diagnose more easily. As ever, thanks.