I have about 10K posts on my site. I am using pagination at 50 items per page, and I am not using taxonomies. My theme is available here for review. The 10K pages of content have not been published yet; they're a mix of long-form blog posts and a complete Twitter export of all my tweets converted to Markdown.
When I generate my site on my MacBook, with 16GB of RAM, hugo takes about 6 seconds.
When I generate my site on my server, with 1GB of RAM, hugo takes between 15 and 20 seconds.
I fully understand that the low-memory server will require more time to generate the site. I'm mostly just curious whether there are any suggestions for ways to shave off some time. Am I doing anything obviously sub-optimal within my theme files that could be improved?
I’d eventually like to start publishing via a micropub endpoint directly to my server, and have that endpoint invoke Hugo to generate the new content. Waiting ~20 seconds isn’t going to ruin my life, but I am concerned about the scripts timing out eventually. I can raise the max execution time of the scripts; but I’d like to know if there are any other things I can do to improve the hugo experience.
Here are the relevant bits from hugo --verbose --stepAnalysis=true:
INFO 2018/04/03 06:25:53 Using config file: /home/skippy/hugo/config.yaml
Building sites … WARN 2018/04/03 06:25:53 No translation bundle found for default language "en"
WARN 2018/04/03 06:25:53 Translation func for language en not found, use default.
WARN 2018/04/03 06:25:53 i18n not initialized, check that you have language file (in i18n) that matches the site language or the default language.
WARN 2018/04/03 06:25:53 Unable to find Static Directory: /home/skippy/hugo/themes/skippy.net/static/
INFO 2018/04/03 06:25:53 syncing static files to /home/skippy/hugo/public/
initialize:
32.338486ms (67.345886ms) 5.24 MB 70844 Allocs
load data:
65.84µs (67.963353ms) 0.00 MB 29 Allocs
load i18n:
480ns (68.296762ms) 0.00 MB 0 Allocs
read and convert pages from source:
3.276557237s (3.345290446s) 346.69 MB 2380272 Allocs
INFO 2018/04/03 06:25:56 found taxonomies: map[string]string{"tag":"tags", "category":"categories"}
build Site meta:
31.669489ms (3.377596319s) 2.23 MB 59462 Allocs
prepare pages:
1.294691499s (4.716473319s) 73.92 MB 1110156 Allocs
render and write aliases:
424.344µs (4.71766152s) 0.00 MB 9 Allocs
INFO 2018/04/03 06:26:00 Alias "/note/2007/10/18/archive/1/index.html" translated to "note/2007/10/18/archive/1/index.html"
INFO 2018/04/03 06:26:00 Alias "/note/2007/10/21/archive/1/index.html" translated to "note/2007/10/21/archive/1/index.html"
INFO 2018/04/03 06:26:00 Alias "/note/2007/10/22/archive/1/index.html" translated to "note/2007/10/22/archive/1/index.html"
....
.... a total of 5782 of these INFO lines ....
....
render and write pages:
10.498777746s (16.212976468s) 236.28 MB 5811240 Allocs
render and write Sitemap:
570.789543ms (16.784701644s) 34.83 MB 766640 Allocs
render and write robots.txt:
15.165µs (16.785243846s) 0.00 MB 9 Allocs
WARN 2018/04/03 06:35:10 [en] Unable to locate layout for "404": [404.html theme/404.html]
render and write 404:
6.782589ms (16.792280159s) 0.00 MB 31 Allocs
render and write pages:
1.988066285s (18.780614105s) 83.22 MB 1929902 Allocs
                   | EN
+------------------+-------+
  Pages            | 15248
  Paginator pages  |   190
  Non-page files   |     0
  Static files     |    21
  Processed images |     0
  Aliases          |  2868
  Sitemaps         |     1
  Cleaned          |     0

Total in 18778 ms
It seems like the generation of all of the aliases is where most of the time is spent. On my site, I don't need or want aliases for content in the /note/ section. Is there a way I can suppress them entirely for that section?
So it looks like half the time goes into creating the 8K note pages. I believe those are the exported tweets? A suggestion (not an elegant one): maybe pack 10 tweets into one page instead of one tweet per page? That should bring the total time down by roughly half (11/20). It looks like the file I/O, i.e. the time to access that note single layout, is the bottleneck. So my suggestion is to just reduce the access to the note single layout, i.e. pack more tweets per page (using your Python script).
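A rough sketch of what that batching could look like, assuming each tweet currently lives in its own Markdown file under content/note/ with a simple "---"-delimited front matter block; all paths and field names here are made up:

```python
# Sketch only: merge single-tweet Markdown files into batches of 10 so Hugo
# renders far fewer single pages. Adjust paths and front matter to your export.
import glob

BATCH_SIZE = 10
tweet_files = sorted(glob.glob("content/note/*.md"))

for batch_num, start in enumerate(range(0, len(tweet_files), BATCH_SIZE)):
    bodies = []
    for path in tweet_files[start:start + BATCH_SIZE]:
        with open(path, encoding="utf-8") as f:
            text = f.read()
        # Drop the per-tweet front matter ("---" delimited) and keep only the body.
        parts = text.split("---", 2)
        bodies.append(parts[2].strip() if len(parts) == 3 else text.strip())
    out_path = f"content/note/batch-{batch_num:04d}.md"
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(f'---\ntitle: "Notes, batch {batch_num}"\n---\n\n')
        out.write("\n\n---\n\n".join(bodies) + "\n")
```

(The original per-tweet files would then be removed or moved out of content/ so Hugo stops rendering them individually.)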
Also, you might review whether you need that many RSS pages… You can control that finely, per page Kind.
It looks like there is some confusion over "server" here. The OP is talking about running hugo on a physical web server; you are talking about a hugo server flag.
@skippy hugo renders files to disk anyway; hugo server does not (by default), and thus that switch exists only for the latter. And hugo server is only for development.
“section” is one of the Page Kinds. By default RSS is generated for all section pages (like /posts/, /notes/). You might need to use Custom Output Formats and strategically put the templates in your theme to enable/disable RSS for specific sections. i.e. You’d disable the default RSS output format completely, and enable your custom RSS2 (or any name you pick) output format.
… then set the templates only for the layouts you want. See:
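If the goal is just to drop RSS for particular page Kinds (rather than defining a whole custom output format), one possible shape for that in config.yaml, assuming RSS should remain only on the home page, would be:

```yaml
# Sketch: which output formats each page Kind produces.
outputs:
  home: ["HTML", "RSS"]
  section: ["HTML"]        # no RSS for section listings like /note/ or /posts/
  page: ["HTML"]
  taxonomy: ["HTML"]
  taxonomyTerm: ["HTML"]
```

With something along these lines, Hugo would stop emitting an index.xml for every section and taxonomy listing.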
I could convert them all to a kind with front matter, I suppose…
No, Kinds cannot be overridden, but Layouts can be.
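For example (a sketch with made-up names), a single page can be pointed at a different layout via its front matter:

```yaml
---
title: "Some note"
type: "note"        # which layout directory Hugo looks in
layout: "compact"   # e.g. layouts/note/compact.html instead of single.html
---
```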
If I want to use Micropub to upload images (or other media) for inclusion in a post, those images should be made immediately available to the Micropub client. That is to say, the media upload process should put them into the live site, not the source of my site. This is actually relatively easy to accomplish; and indeed I’d likely put the uploaded media into both the live site and the source’s static directory.
I’ve revised my theme a bit, making use of this forum post to disable RSS for sections. This is what @kaushalmodi was suggesting above, but this forum post cleared up much of my confusion.
Interestingly, even though Hugo is now generating several thousand fewer items, it takes about the same amount of time: ~4 seconds on my laptop, and ~13 seconds on my server.
I’m happy enough with these changes, even though they don’t really improve the speed. 13 seconds isn’t terrible.
How do the CPUs compare between the laptop and server? It's hard to tell if you're CPU-bound or I/O-bound. How fast does it build if you use --renderToMemory (which removes disk write I/O from the equation)? I don't think memory is the issue. Hugo doesn't optimize for low-memory scenarios; it just crashes if you don't have enough to build the site.
In case it’s not obvious, your laptop builds 18,000+ pages at 200µs per page. The server is building it at just under 1ms per page. That’s not horrible.
Also, when using the --templateMetrics option, I recommend also adding --templateMetricsHints. Feel free to share the output with the hints.
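For reference, that would be something along the lines of:

```
hugo --templateMetrics --templateMetricsHints
```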
The server is a single CPU droplet at DigitalOcean:
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 1
On-line CPU(s) list: 0
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E5-2630L v2 @ 2.40GHz
Stepping: 4
CPU MHz: 2399.998
BogoMIPS: 4799.99
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 15360K
NUMA node0 CPU(s): 0
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl eagerfpu pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm retpoline kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust smep erms xsaveopt
My laptop is a quad core i7:
Model Name: MacBook Pro
Model Identifier: MacBookPro11,3
Processor Name: Intel Core i7
Processor Speed: 2.6 GHz
Number of Processors: 1
Total Number of Cores: 4
L2 Cache (per Core): 256 KB
L3 Cache: 6 MB
Memory: 16 GB
My entire site – raw content + generated results + all static files – is 167MB, so the entirety of the site can easily live in memory.
You're absolutely right: this is a CPU-bound process.
In the end, have you finally worked out your micropub integration?
How about using a completely separate media endpoint and then importing remote images from your build script? That's more or less my approach at the moment for integrating remote content into my hugo website, and it plays out nicely.
This way you can handle uploads outside your hugo repo, return HTTP 202 with a link to the future post URL, and then start your build script, which imports the image into your hugo repo and actually generates the article.
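A minimal sketch of that 202 flow, assuming a small Flask app; the endpoint path, directories, and URL scheme are all made up, and a real Micropub endpoint would also need IndieAuth token checking and error handling:

```python
# Sketch only: accept a form-encoded post, answer 202 with the future URL,
# and rebuild the site in the background so the request returns immediately.
import subprocess
from datetime import datetime, timezone
from pathlib import Path

from flask import Flask, request

app = Flask(__name__)
CONTENT_DIR = Path("/home/user/hugo/content/note")   # hypothetical source dir
SITE_SOURCE = "/home/user/hugo"

@app.route("/micropub", methods=["POST"])
def micropub():
    content = request.form.get("content", "")
    now = datetime.now(timezone.utc)
    slug = now.strftime("%Y%m%d%H%M%S")
    body = f"---\ndate: {now.isoformat()}\n---\n\n{content}\n"
    (CONTENT_DIR / f"{slug}.md").write_text(body, encoding="utf-8")
    # Kick off the rebuild asynchronously; the client only waits for the 202.
    subprocess.Popen(["hugo", "--source", SITE_SOURCE])
    return "", 202, {"Location": f"https://example.com/note/{slug}/"}
```

The point is just that the build runs in the background, so the client gets its 202 and Location header back right away instead of waiting out the ~13-20 second generation time.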
I ended up writing my own micropub server. It supports a media endpoint as well as direct file submissions. What I’ve learned is that not all Micropub clients support media endpoints (yet), so I need to deal with form-encoded media uploads alongside the content.
I don’t currently send an HTTP 202 response because micropub.rocks, the test suite I used, doesn’t support that yet. There is an open issue to add support for HTTP 202 to micropub.rocks, though, so hopefully this will get improved.