Hugo Chokes On 125 000 Pages


#1

This is a follow-up to my issue reported on GitHub.

I have 124554 pages in Hugo. With 100 pages everything works just fine, but when I try to generate the whole package it seems to choke: it shows a completed message without actually generating the pages (oddly, just the pagination and indexes for them). The result message also shows only 1 author and no tags or series, both of which do exist and work fine with the 100 or so pages.

Traviss-MacBook-Pro:Demo teamcoltra$ hugo
Started building sites ...
Built site for language en:
0 draft content
0 future content
0 expired content
124554 regular pages created
12 other pages created
0 non-page files copied
9967 paginator pages created
1 author created
0 tags created
0 series created
total in 4745107 ms

If I narrow the number down to 100 pages the site generates perfectly. 124554 is the correct number of files, but ~70% of those have unique authors and tags, and maybe 50% have series (all of which already work at small scale, so I know they work).

It’s not kicking back errors or anything like that, so it’s hard to know exactly what is wrong. I realize I am really pushing the boundaries of a normal use case for Hugo, but I feel like it should still fail gracefully (and there shouldn’t really be anything inherently holding it back; if it takes 3+ hours to generate, that’s fine by me).

I really need/want this to work and don’t even know where to start on the troubleshooting. I would really love some help guys.


#2

What do you get when you run hugo --verbose? Also, what version of hugo?

Is there maybe a missing single template in the theme? My large-scale tests topped out at around 25,000 content pages, but the total number of files generated in public exceeded half a million due to taxonomies and pagination, and it handled that without errors.

-j


#3

Running in -v and seeing what we get.

Traviss-MacBook-Pro:~ teamcoltra$ hugo version
Hugo Static Site Generator v0.25.1 darwin/amd64 BuildDate: 2017-08-02T17:40:53-07:00

#4

So I started this 17 minutes ago after I posted my last post…

Traviss-MacBook-Pro:Demo teamcoltra$ hugo --verbose
INFO 2017/08/10 15:27:05 Using config file: 
INFO 2017/08/10 15:27:05 using a UnionFS for static directory comprised of:
INFO 2017/08/10 15:27:05 Base: /Users/teamcoltra/Documents/Websites/Demo/themes/classic/static
INFO 2017/08/10 15:27:05 Overlay: /Users/teamcoltra/Documents/Websites/Demo/static/
INFO 2017/08/10 15:27:05 syncing static files to /Users/teamcoltra/Documents/Websites/Demo/public/
WARN 2017/08/10 15:27:05 No translation bundle found for default language "en"
WARN 2017/08/10 15:27:05 Translation func for language en not found, use default.
WARN 2017/08/10 15:27:05 i18n not initialized, check that you have language file (in i18n) that matches the site language or the default language.
Started building sites ...
INFO 2017/08/10 15:27:13 found taxonomies: map[string]string{"series":"series", "author":"author", "tags":"tags"}

It isn’t giving me any details about work it’s doing… I’m not even sure if it’s actually doing anything (though it does show 100% use on CPU so hopefully that means it is).

Don’t mean to spam, just giving incremental updates as information is available.


#5

I had similar issues building sites with a large number of files (~30000). I found a blog post describing it as a macOS-specific issue, which suggested updating the following values to fix it:

$ sudo sysctl -w kern.maxfiles=65536
$ sudo sysctl -w kern.maxfilesperproc=65536
$ ulimit -n 65536 65536

This worked for me, but you might want to increase the limit to 125000.
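
Before raising anything, it can help to see what limit the shell actually hands to child processes. The following is a minimal Python sketch, not anything Hugo ships: the helper names `open_file_limits` and `limit_covers` are made up for illustration, and the 65536 threshold is simply the value from the commands above, not a known Hugo requirement.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def limit_covers(needed, soft):
    """True if the soft limit is unlimited or at least `needed`."""
    return soft == resource.RLIM_INFINITY or soft >= needed


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"soft={soft} hard={hard}")
    # 65536 is the value suggested above; treat it as an assumption.
    print("covers 65536:", limit_covers(65536, soft))
```

Run it from the same terminal session you run hugo from, since the limit is inherited per process.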


#6

Verbose didn’t give me a lot of useful information :smirk:

Traviss-MacBook-Pro:Demo teamcoltra$ hugo --verbose
INFO 2017/08/10 15:27:05 Using config file: 
INFO 2017/08/10 15:27:05 using a UnionFS for static directory comprised of:
INFO 2017/08/10 15:27:05 Base: /Users/teamcoltra/Documents/Websites/Demo/themes/classic/static
INFO 2017/08/10 15:27:05 Overlay: /Users/teamcoltra/Documents/Websites/Demo/static/
INFO 2017/08/10 15:27:05 syncing static files to /Users/teamcoltra/Documents/Websites/Demo/public/
WARN 2017/08/10 15:27:05 No translation bundle found for default language "en"
WARN 2017/08/10 15:27:05 Translation func for language en not found, use default.
WARN 2017/08/10 15:27:05 i18n not initialized, check that you have language file (in i18n) that matches the site language or the default language.
Started building sites ...
INFO 2017/08/10 15:27:13 found taxonomies: map[string]string{"series":"series", "author":"author", "tags":"tags"}
INFO 2017/08/10 16:10:22 Alias "/author/travis-mccrea/page/1/index.html" translated to "author/travis-mccrea/page/1/index.html"
INFO 2017/08/10 16:10:23 Alias "/page/1/index.html" translated to "page/1/index.html"
INFO 2017/08/10 16:10:23 Alias "/books/page/1/index.html" translated to "books/page/1/index.html"
Built site for language en:
0 draft content
0 future content
0 expired content
124554 regular pages created
12 other pages created
0 non-page files copied
9967 paginator pages created
0 series created
1 author created
0 tags created
total in 4596564 ms

#7

Next time you run it, open up Activity Monitor and track the CPU, memory, and disk use. In my large-scale tests, it quickly grew to over 6GB of active memory, and yours has around five times the number of content pages.
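
If watching Activity Monitor for over an hour is impractical, peak memory can also be captured by a small wrapper. This is a rough Python sketch under a few assumptions: `run_and_measure` and `peak_child_rss_bytes` are hypothetical helper names, the default command is plain `hugo`, and the figure reported is the peak over all child processes the wrapper has waited for, not a live trace.

```python
import resource
import subprocess
import sys


def peak_child_rss_bytes():
    """Peak resident set size over all waited-for child processes, in bytes."""
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    return peak if sys.platform == "darwin" else peak * 1024


def run_and_measure(cmd):
    """Run cmd to completion; return (exit code, peak child RSS in bytes)."""
    result = subprocess.run(cmd)
    return result.returncode, peak_child_rss_bytes()


if __name__ == "__main__":
    # Pass the command as arguments, or default to a bare `hugo` build.
    code, peak = run_and_measure(sys.argv[1:] or ["hugo"])
    print(f"exit={code} peak_rss={peak / 2**30:.2f} GiB")
```

It does not cover CPU or disk use, so it complements Activity Monitor rather than replacing it.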

If I get a chance over the weekend, I’ll rebuild my test data to your size and see if I can reproduce the problem. It looks like you’re using the Hugo Classic theme, correct?

-j


#8

Yep, I tried it with the default classic theme and also with my modified theme (which works with a small sample). I have tested it all with smaller sets of articles and it’s great. I am running another test with cleaner front matter.

I am going to try running it again on a Linux box.


#9

Okay, preliminary testing at scale with a lean theme and simple content (Mac OS 10.11.6, Hugo v0.26, 2.8 GHz Core i5, 16 GB RAM, SSD):

As published, the site has about 18,000 content pages. I took the largest section (~13,000 recipes) and duplicated it 10 times, raising the count to over 150,000 content pages.

Normal:

% rm -rf public; time hugo
Started building sites ...
Built site for language en:
0 draft content
0 future content
0 expired content
17781 regular pages created
409 other pages created
0 non-page files copied
2773 paginator pages created
359 categories created
total in 24286 ms

real	0m24.808s
user	0m54.095s
sys	0m10.862s
% find public -type f | wc -l
   21011

Scaled up:

% rm -rf public; time hugo
Started building sites ...
Built site for language en:
0 draft content
0 future content
0 expired content
153071 regular pages created
579 other pages created
0 non-page files copied
23882 paginator pages created
359 categories created
total in 468720 ms

real	7m49.740s
user	22m33.384s
sys	2m2.473s
% find public -type f | wc -l
  177580

8.6 times as many content pages took 19 times as long to build, which isn’t really surprising, since it was using 6.53 GB of RAM, and copying the exact same content into multiple sections significantly increased the size of the taxonomies (and large taxonomies are slow). A spot check suggests all the pages were built correctly.
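
For anyone checking those ratios, the arithmetic from the two build summaries can be reproduced with a few lines of Python. The numbers are copied from the output above; the per-page figures are just total time divided by regular pages, which ignores taxonomy and paginator pages.

```python
# Figures copied from the "Normal" and "Scaled up" build summaries.
normal = {"pages": 17781, "ms": 24286}
scaled = {"pages": 153071, "ms": 468720}

page_ratio = scaled["pages"] / normal["pages"]
time_ratio = scaled["ms"] / normal["ms"]

# Rough cost per regular page; taxonomy and paginator pages are not counted.
ms_per_page_normal = normal["ms"] / normal["pages"]
ms_per_page_scaled = scaled["ms"] / scaled["pages"]

print(f"pages: x{page_ratio:.1f}, time: x{time_ratio:.1f}")
print(f"ms per page: {ms_per_page_normal:.2f} -> {ms_per_page_scaled:.2f}")
```

The per-page cost roughly doubling is what makes the slowdown worse than linear.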

-j


#10

Another thing: is Hugo supposed to empty the content files that it converts? After I run “hugo”, all the files in the content directory become 0 KB (totally emptied).


#11

Hugo doesn’t write to the content directory at all.

-j


#12

:\ This happens every time, now on two different systems: when it processes the files, it also wipes them.

I am thinking it might have something to do with my pages but the validator that I was linked to before said all the pages I have checked were valid.

https://drive.google.com/file/d/0Bw7iejn0IVbnanhwZEVFQmwycms/view?usp=sharing here is a small (2K or so) sample of my posts.

My guess is that it has something to do with the description field… I might just nix that and see how much it helps.


#13

Thanks. I created a new site with default parameters, cloned the hugo-classic theme, and unzipped your sample into section “post”. After deleting some invalid UTF8 in one file, I got it to build, and the output contained:

0 of 329 drafts rendered
0 future content
0 expired content
0 regular pages created
6 other pages created
0 non-page files copied
0 paginator pages created
0 tags created
0 categories created
total in 38 ms

Interesting notes:

  1. It counted 329 “drafts” despite the complete absence of draft: true in your front matter.
  2. It created 0 regular pages.

I started stripping down the front matter, and when I deleted the “published” field, the rendered pages showed up in public correctly:

0 draft content
0 future content
0 expired content
329 regular pages created
858 other pages created
0 non-page files copied
0 paginator pages created
0 categories created
425 tags created
total in 414 ms

So, published is being converted to a boolean with a false value, and this appears to be a no-longer-documented equivalent to draft=true. All the hugo output you’ve posted has “0 draft content”, so this may not be related to your original problem.

I suggest for the next step, you try a fresh sample of 1,000-10,000 content files that do not contain a “published” field in the front matter.
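
To find which files still carry the field, something like the following Python sketch could be used. It makes several assumptions that may not match your setup: content files end in .md, front matter is a leading block fenced by `---` or `+++`, and the key is written exactly as `published`. The function names are hypothetical.

```python
import re
from pathlib import Path

# Matches "published:" (YAML style) or "published =" (TOML style) at line start.
KEY = re.compile(r"^published\s*[:=]")


def front_matter_lines(text):
    """Yield the lines of a leading front matter block fenced by --- or +++."""
    lines = text.splitlines()
    if not lines or lines[0].strip() not in ("---", "+++"):
        return
    fence = lines[0].strip()
    for line in lines[1:]:
        if line.strip() == fence:
            return
        yield line


def files_with_published(content_dir):
    """Return the .md files under content_dir whose front matter sets 'published'."""
    hits = []
    for path in sorted(Path(content_dir).rglob("*.md")):
        # Undecodable bytes are replaced so one bad file cannot stop the scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        if any(KEY.match(line) for line in front_matter_lines(text)):
            hits.append(path)
    return hits


if __name__ == "__main__":
    for hit in files_with_published("content"):
        print(hit)
```

It only reports matches; renaming or removing the field is left to you.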

-j


#14

Oh of course! Published is already a variable! I can change mine to “published date”


#15

OH! I PROBABLY FIGURED OUT WHAT IS AFFECTING THIS: ALL MY FILES ARE BLANK!

I assumed that Hugo was responsible for this, but it’s not… the files are blank because my script to fix those UTF8 errors was screwed up and was just emptying the files. I didn’t notice it, and was running it on the pages before launching.
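
For anyone hitting the same thing, here is a minimal Python sketch of the two checks that would have caught it: listing zero-byte files, and repairing invalid UTF-8 without risking truncation. This is not the script from the thread (which was never posted); the function names are made up, and replacing bad bytes with U+FFFD is just one possible repair policy.

```python
import os
import tempfile
from pathlib import Path


def find_empty_files(root):
    """Return every zero-byte regular file under root, sorted by path."""
    return sorted(p for p in Path(root).rglob("*")
                  if p.is_file() and p.stat().st_size == 0)


def repair_utf8(path):
    """Replace invalid UTF-8 bytes in path; return True if the file changed.

    The cleaned text goes to a temporary file that is swapped in only after
    a successful write, so a failure part-way cannot leave path truncated.
    Note: the replacement file gets mkstemp's default permissions.
    """
    path = Path(path)
    raw = path.read_bytes()
    cleaned = raw.decode("utf-8", errors="replace").encode("utf-8")
    if cleaned == raw:
        return False
    fd, tmp = tempfile.mkstemp(dir=path.parent)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(cleaned)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
    return True


if __name__ == "__main__":
    for empty in find_empty_files("content"):
        print("empty:", empty)
```

Running `find_empty_files` on the content directory before each build is a cheap sanity check.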

I have a lot of regenerating to do, so I will report back in 4 hours. I think this is the issue, though.