171,456 docs, 22 taxonomies, 20 minutes


#9
  1. I had paginate=10 for my test site.

  2. The number of pages is huge because I added the random-taxonomy generator to my wikiblog script in a hurry: for a taxonomy with N terms, it added random(N-1) of them to each article, so many MD files had hundreds of terms in their front matter. And I only generated 1,000 distinct articles, which I then copied 17 times into sub-directories and added to my real blog entries. As a result, despite having far fewer total terms than his site, I had many more articles per term, which generates a lot of pages.

I suspect his relaterede and skribenter taxonomies have only a few terms per article, but the sheer size of them makes the results comparable to my hacked-up example.

-j


#10

OK, 550,000 pages rendered in 23 minutes isn’t bad. There is room for improvement, of course, but from my perspective, compared to other static site generators, it’s pretty darn good.


#11

I’ve done some more structured testing (with default pagination). I created 1,000 random articles with simple tags/categories, and copied them into 20 different sections, for a total of 20,000 content files. Then I wrote a standalone script that would generate X taxonomies each containing Y terms, and insert 1-X random taxonomies into each article, each with 1-5 random terms.
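The replication step was nothing fancy, just a copy loop along these lines (a sketch; the directory names are illustrative, not my actual paths):

for i in $(seq 1 20); do
  mkdir -p content/section$i
  cp articles/*.md content/section$i/
done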

Without the random taxonomies, build time was 268956 ms.

With one 1,000-term taxonomy, it was 290217 ms.

With ten 1,000-term taxonomies, it was 332273 ms.

With one 10,000-term taxonomy, it was 316232 ms.

With four 10,000-term taxonomies, it was 427258 ms.

With four 10,000-term taxonomies and 6-10 terms/taxonomy, it was still only 511538 ms:

0 draft content
0 future content
0 expired content
20000 regular pages created
80088 other pages created
20 non-page files copied
75566 paginator pages created
10000 drove created
9999 pneumonographic created
10 categories created
10 tags created
9999 pleasingness created
9999 smidgen created
total in 511538 ms

So I decided to go for broke and generated twenty 10,000-term taxonomies with 1-5 terms/taxonomy. The build has been running for 50 minutes so far, using only a single core and 4.2 GB of RAM, not spinning up the fans, and has only written out the static files.

If I’m reading the stack trace correctly, only one thread is active, and it’s spending all of its time in assemble().
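(If you want to peek at a running build the same way on a Mac, one option is the stock sample tool, which profiles a process for a few seconds without stopping it:

sample hugo 10 -file hugo-trace.txt

Any similar sampling profiler should show the same hot spot.)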

Update: Final results after 68 minutes:

0 draft content
0 future content
0 expired content
20000 regular pages created
371332 other pages created
0 non-page files copied
207794 paginator pages created
9253 formalesque created
9211 ankyloglossia created
9285 accidie created
9294 cholestanol created
9291 hala created
9280 undisgraced created
9273 brocho created
9270 subsist created
9252 featherless created
9275 turner created
9290 unawfully created
9280 overwalk created
9300 dicker created
9246 electoral created
9302 antalkali created
9296 overdaintily created
9284 tomeful created
9316 extrafloral created
9322 coruscation created
10 categories created
9283 scranny created
10 tags created
total in 4073392 ms

So, 5x the number of taxonomies/terms, 10x the runtime, and most of that was spent in a single-threaded routine that was neither reading from nor writing to the disk.

-j


#12

@jgreely, would you care to share your test generator script?


#13

Sure: taxonomies.pl.

Usage is simple: feed it a bunch of filenames on STDIN, and it will add random taxonomies to their TOML front matter. So, to create 3 taxonomies with 1000 terms each, and then add 1-3 of them with 1-5 randomly-selected terms to each article:

find content -name '*.md' | taxonomies.pl -T 3 -t 1000 -m 5
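By the same flags, the 20x10,000 torture test above would have been something like:

find content -name '*.md' | taxonomies.pl -T 20 -t 10000 -m 5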

The thousand content files that I generated with my wikiblog.sh script are here (5 MB tarball). I just copied them repeatedly into different sections to increase the article count, and then ran the taxonomy-adder on the results.

-j


#14

Holy shnikes. Hugo is blisteringly fast.

@jgreely I’d be curious to see something more real-world-ish; e.g., 10k pages, 10 sections, 5 taxonomies with maybe 50 terms each (this would be a formidable amount of metadata to manage). Also, what’s the templating like?

I seem to recall @budparr was saying he was working on a decent-sized site with some complex templating. Maybe he can add some insight into an example in the wild.


#15

Done.

Without random taxonomies:

0 draft content
0 future content
0 expired content
10000 regular pages created
66 other pages created
0 non-page files copied
11128 paginator pages created
10 categories created
10 tags created
total in 81768 ms

Adding 5 taxonomies of 50 terms, 1-5 tax/article with 1-5 terms/tax:

0 draft content
0 future content
0 expired content
10000 regular pages created
576 other pages created
0 non-page files copied
18669 paginator pages created
50 psychological created
50 loudish created
50 bullbaiting created
10 categories created
10 tags created
50 pseudomodest created
50 unerrable created
total in 91408 ms
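(With the flags from #13, that run is roughly find content -name '*.md' | taxonomies.pl -T 5 -t 50 -m 5.)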

config.toml:

languageCode = "en-us"
title = "5 random taxonomies"
baseURL = "https://example.com/"
theme = "mt2-theme"

[taxonomies]
category = "categories"
tag = "tags"
pseudomodest = "pseudomodest"
unerrable = "unerrable"
psychological = "psychological"
loudish = "loudish"
bullbaiting = "bullbaiting"

I used the (unpublished as yet) theme for my blog, because it paginates sections and taxonomies. If you have a specific theme you think would work well for testing, I can try it. It’s a bit painful to wade through the gallery looking for features like pagination (for instance, I tried hugo-octopress, but all it generated for taxonomies was RSS feeds, so it only created 100 paginator pages, and finished in 20 seconds).

-j


#16

For the record, on an Amazon r3.2xlarge (64 GB RAM, 8 CPUs):

Built site for language en: 
0 draft content 
0 future content 
0 expired content 
220341 regular pages created 
24 other pages created 
0 non-page files copied 
10 paginator pages created 
6 tags created 
0 categories created 
total in 209277 ms

And these are not lorem ipsum pages but real pages (with real content); some pages are built from an FML JSON file of about 300 KB.


#17

@jonathanulco that is really cool, and it would be really interesting if you could elaborate a little about what kind of project this is.


#18

I work for a little startup that creates a proximity-based social network; these are the external pages created by users of the service.

P.S.: To improve my service, I’m waiting for incremental builds in Hugo :wink:


#19

I received the bill from Amazon: $0.67.


#20

What’s the bill for?


#21

One Hugo build with 200K+ pages; see above.


#22

@JLKM @jgreely did you by any chance use TOML as page front matter?

If so, see https://github.com/spf13/hugo/issues/3464

And

https://github.com/spf13/hugo/issues/3541

If yes, it would also be interesting if you could repeat the test with YAML.


#23

Also see

https://github.com/spf13/hugo/pull/3545


#24

Yes, for all of my tests. I don’t have time to set up the exact same tests again at the moment, but I do have a work-in-progress site with 50,000+ recipes in 67 sections and 782 categories (0-8 categories per article, with most having 1-2). Using Hugo 0.21 on a Mac, here’s the TOML versus YAML comparison.

TOML:

0 draft content
0 future content
0 expired content
56842 regular pages created
851 other pages created
0 non-page files copied
9198 paginator pages created
782 categories created
total in 572439 ms

YAML:

Built site for language en:
0 draft content
0 future content
0 expired content
56842 regular pages created
851 other pages created
0 non-page files copied
9198 paginator pages created
782 categories created
total in 560110 ms

As soon as I have a chance, I’ll replicate the torture test, since this one doesn’t show a huge difference. Oh, and before I forget to mention it: in all of these tests, I’ve been working in a directory that’s excluded from Spotlight indexing, which can seriously interfere with timing.

Amusing side note: when I started building the recipe site (which bulk-converts MasterCook MX2 files from various archives into Hugo content files), I grabbed the Zen theme for a simple, clean look, and watched the first build eat my memory and disk, because the theme embeds links to every content page in the navbar and sidebar. Every file in public was over 5 MB in size, and top reported it using 40 GB of compressed pages when I killed it. :slightly_smiling:

-j


#25

@jgreely thanks, looking at your “monster test”, you have lots of different taxonomy terms (not very realistic, maybe?), which I have not tested well – I will add that variant to my benchmarks as well.


#26

I’d have called it completely unrealistic if it weren’t for JLKM’s original site, which has 22 taxonomies ranging from 6 to 8,163 terms (mean 3,763, median 345).

-j


#27

I replicated the big test and kicked it off just before going to bed last night. I took the same 1,000 randomly-generated articles, replicated them into a total of 20 sections, and then used my script to add 20 10,000-term taxonomies. I ended up with a total of four sites, generating the YAML versions with rsync and hugo convert --unsafe toYAML (sketched after the results):

  1. TOML, no additional taxonomies: 363945 ms
  2. YAML, no additional taxonomies: 325886 ms
  3. TOML, 20x big taxonomies: 3460447 ms
  4. YAML, 20x big taxonomies: 3548300 ms
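The conversion step was roughly the following (directory names invented for illustration; hugo convert rewrites the front matter of every content file in place, which is why each variant gets its own copy first):

rsync -a site-toml/ site-yaml/
cd site-yaml
hugo convert toYAML --unsafe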

Yes, the baseline test took 11% longer with TOML, while the really big test took 2.5% longer with YAML. I suspect that any parsing issues are small compared to the amount of time it spends in the single-threaded assemble() function, which is what the stack traces of my earlier big test showed.

Ideally, I’d run each test 10 times to make sure the timing differences are real, but at the very least, I can say that YAML isn’t obviously faster on a big site with lots of taxonomies.

-j


#28

Yes, I have now reproduced it in my benchmarks – the TOML issue does have a fair effect on “normal” sites, but not on the kind of site you’re dealing with.