Generating a rather large website

I work for a news organization and we are considering Hugo. I wanted to test the theoretical limits of Hugo to make sure it can handle our current article count.

I created a quick script that generated 10,000 pages per month, spread over a number of years:

/2019/01/files.md
/2019/02/files.md
/2019/03/files.md
/2019/04/files.md
etc…

After generating 2 million pages in under 10 seconds with my script, I moved on to Hugo to see if it would build. Does anyone have experience generating a site this large with Hugo? I was unable to complete a build, as it froze my setup for hours.

My setup:
Dual Xeon E5-2697
128 GB of RAM

Typically, one would start small, then ramp up, gathering metrics as you go.

Does this help?

Building sites …
                   |  EN  |  PT
+------------------+------+------+
  Pages            | 3932 | 4459
  Paginator pages  |  326 |  419
  Non-page files   | 4028 | 1749
  Static files     | 2460 | 2460
  Processed images | 7653 | 5161
  Aliases          |  868 | 1134
  Sitemaps         |    2 |    1
  Cleaned          |    0 |    0

Total in 155920 ms

Build time is all about theme complexity, and many of the available themes make assumptions that scale poorly (like including every article in sidebars and menus on every page, and not using partialCached).

What theme did you use in your tests?

-j


Yes, that does. That's about 18 ms a page across your two languages, so 2M pages could really take something like 10 hours (18 ms × 2,000,000 = 36,000 s). I'm really not sure what happened with my setup. Was that a build from scratch, or did you rebuild an existing folder?
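The extrapolation from the build table can be checked with a few lines of Go. This is only a rough sketch: it assumes build time grows linearly with page count and counts only the "Pages" rows (3932 EN + 4459 PT), ignoring paginator pages, images, and static files. The function names are made up for illustration.

```go
package main

import "fmt"

// msPerPage returns the average build time per page, in milliseconds.
func msPerPage(totalMs float64, pages int) float64 {
	return totalMs / float64(pages)
}

// estimateHours scales a per-page time to targetPages and converts the
// result to hours. Assumption: build time is linear in page count.
func estimateHours(perPageMs float64, targetPages int) float64 {
	return perPageMs * float64(targetPages) / 1000 / 3600
}

func main() {
	per := msPerPage(155920, 3932+4459) // "Total in 155920 ms", EN + PT pages
	fmt.Printf("per page: %.1f ms\n", per)
	fmt.Printf("2M pages: about %.1f hours\n", estimateHours(per, 2000000))
}
```

At roughly 18 to 19 ms per page, two million pages come out at a bit over ten hours, and that is before any memory pressure makes things worse than linear.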

I used beautifulhugo, ported from Jekyll, as a base. It looked close to how we would style a site, with far fewer features of course.

Yes, but my news organization has 1.6 million articles in WordPress at the moment, so I went up to 2 million to future-proof the test.

I run nightly rebuilds on a DigitalOcean droplet (2 GB of RAM), and my theme contains a lot of customisations. To name a few:

  • Section specific templates
  • Shortcodes to render page links with associated thumbnails
  • Header photo for homepage, single pages and posts with auto resize and crop
  • Content organised by “Stories”
  • Instagram section for photos
  • Option to hide the header
  • Option to have content unlisted
  • Automatic image galleries using Hugo Resources and photoswipe.js
  • Support for Portuguese and English in i18n folder
  • Custom twitter-card tags
  • ld+json format for all pages

There is a slim version of it here: https://github.com/brunoamaral/Hugo-Now-UI

I can run a full rebuild to give you a better benchmark.

Aaron@Swartz:~/Digital-Insanity$ hugo --minify

                   |  EN  |  PT
+------------------+------+------+
  Pages            | 3932 | 4459
  Paginator pages  |  326 |  419
  Non-page files   | 4028 | 1749
  Static files     | 2460 | 2460
  Processed images | 7653 | 5161
  Aliases          |  868 | 1134
  Sitemaps         |    2 |    1
  Cleaned          |    0 |    0

Total in 225039 ms

So about 44% slower on a full rebuild (225,039 ms vs. 155,920 ms). I'm going to give your theme a go to test its limits; maybe something in my theme just sucks. The company would kill me if I wiped out their SEO backlink history. Otherwise I would have just done the last few years.

A small script to generate an over-simplified 10,000 page site. You can tweak it to simulate a more realistic site.

Script
#!/bin/bash

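# Create one page per day, going back 10,000 days from today.
# Note: `date -v` is BSD/macOS syntax; with GNU date use: date -d "-${i} days" +'%Y/%m/%d'
for i in {1..10000}; do
  date=$(date -v "-${i}d" +'%Y/%m/%d')
  hugo new "${date}.md"

# Append filler body text after the generated front matter.
cat << EOF >> "content/${date}.md"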
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aenean diam est, rutrum vel tincidunt eu, gravida vitae arcu. Donec quis dignissim libero. Mauris ultricies justo quis risus convallis, et malesuada elit mollis. Donec quis tortor nec massa eleifend gravida quis sed lectus. Praesent auctor leo aliquet, placerat enim a, eleifend nunc. Suspendisse congue massa eget tristique vestibulum. Nulla commodo lacus arcu, sit amet tempor tellus vulputate id. Maecenas non auctor eros, ut tincidunt sapien. Donec luctus, purus tincidunt pharetra viverra, lectus dolor euismod lorem, nec ullamcorper risus sem a tortor. Sed porta aliquet metus quis dapibus. Cras porta risus ut pharetra tincidunt. Nulla facilisi. Vivamus ut ante consequat, hendrerit purus nec, sagittis nisi. Quisque dignissim ullamcorper metus id tincidunt. Suspendisse condimentum nisl in commodo commodo.
EOF

done

hugo -D 
Output

                   |  EN    
+------------------+-------+
  Pages            | 10031  
  Paginator pages  |     0  
  Non-page files   |     0  
  Static files     |     0  
  Processed images |     0  
  Aliases          |     0  
  Sitemaps         |     1  
  Cleaned          |     0  

Total in 4108 ms

The beautifulhugo theme doesn’t use partialCached, so it will definitely have scaling issues; applying it to my 8,500-article blog, for instance, increased the average build time from 15 seconds to 37. It also increased memory usage from 316 MB to 440, so I suggest monitoring RAM usage next time you try to build, and see if you have enough to handle two million articles.
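To illustrate why caching a shared partial matters at this scale, here is a minimal Go sketch of the idea behind partialCached: render an expensive fragment once and reuse the output for every page. It is an illustration only, not Hugo's implementation; the `partialCache` type and `renderNav` function are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// partialCache memoizes rendered output by partial name.
// Hypothetical type for illustration; not Hugo's implementation.
type partialCache struct {
	rendered map[string]string
	renders  int // how many times an expensive render actually ran
}

func newPartialCache() *partialCache {
	return &partialCache{rendered: make(map[string]string)}
}

// get returns the cached output for name, rendering it only on first use.
func (c *partialCache) get(name string, render func() string) string {
	if out, ok := c.rendered[name]; ok {
		return out
	}
	c.renders++
	out := render()
	c.rendered[name] = out
	return out
}

// renderNav stands in for an expensive partial, such as a sidebar
// that lists every article on the site.
func renderNav(titles []string) string {
	return "<nav>" + strings.Join(titles, " | ") + "</nav>"
}

func main() {
	titles := []string{"First post", "Second post", "Third post"}
	cache := newPartialCache()

	// Simulate 8,500 pages that all include the same nav.
	for page := 0; page < 8500; page++ {
		_ = cache.get("nav.html", func() string { return renderNav(titles) })
	}
	fmt.Println("nav renders:", cache.renders)
}
```

The trade-off is that the cached output is identical on every page, so this only suits partials that do not depend on the current page.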

-j

Based on the build specs I posted, I don't think RAM will be an issue, although I will have to look into the partialCached thing. Any links?

I didn't need a script; I had already written one in Go. But thanks for posting it for others. Here's mine:

package main

import (
	"fmt"
	"log"
	"math/rand"
	"os"
	"path/filepath"
)

func main() {
	// Template article that is copied into every generated file.
	dat, err := os.ReadFile("2015-02-20-test-markdown.md")
	if err != nil {
		log.Fatal(err)
	}

	startYear := 2003
	startMonth := 1

	// Zero-padded month folders, e.g. 2003/01.
	folderPath := fmt.Sprintf("%d/%02d", startYear, startMonth)
	if err := os.MkdirAll(folderPath, 0755); err != nil {
		log.Fatal(err)
	}
	fmt.Println(folderPath)

	for i := 1; i < 2000000; i++ {
		// Every 10k files, move on to the next month folder.
		if i%10000 == 0 {
			if startMonth == 12 {
				startMonth = 1
				startYear++
			} else {
				startMonth++
			}

			// Plain assignment, so the outer folderPath is updated
			// (":=" here would shadow it and leave every file in the first folder).
			folderPath = fmt.Sprintf("%d/%02d", startYear, startMonth)
			if err := os.MkdirAll(folderPath, 0755); err != nil {
				log.Fatal(err)
			}

			fmt.Println(folderPath)
		}

		name := fmt.Sprintf("%d-%s.md", i, RandStringRunes(30))
		if err := os.WriteFile(filepath.Join(folderPath, name), dat, 0644); err != nil {
			log.Fatal(err)
		}
	}
}

var letterRunes = []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")

func RandStringRunes(n int) string {
	b := make([]rune, n)
	for i := range b {
		b[i] = letterRunes[rand.Intn(len(letterRunes))]
	}
	return string(b)
}

Check out the build performance doc.

-j


When I originally ported my blog to Hugo, I wrote a Bash script to generate random Hugo articles using Wikipedia’s Special:Random URL, adding random dates, tags, authors, etc. It gave me a very realistic site to test against.
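A rough Go sketch of the same idea, for anyone who wants more realistic test content than identical files: each article gets a random date, author, and tag set in its front matter. The author and tag lists and the `randomArticle` helper are made up for illustration, and the body is passed in by the caller rather than fetched from Wikipedia.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
	"time"
)

// Hypothetical sample values; a real test site would use its own.
var (
	authors = []string{"alice", "bob", "carol"}
	tags    = []string{"politics", "sports", "tech", "culture", "local"}
)

// randomArticle returns a Markdown page with YAML front matter carrying
// a random date, author, and one to three tags. The same seed always
// produces the same article.
func randomArticle(seed int64, n int, body string) string {
	r := rand.New(rand.NewSource(seed))

	// Random day within roughly 15 years after 2004-01-01.
	date := time.Date(2004, 1, 1, 0, 0, 0, 0, time.UTC).AddDate(0, 0, r.Intn(15*365))

	// Shuffle a copy of the tag list and keep the first few.
	picked := append([]string(nil), tags...)
	r.Shuffle(len(picked), func(i, j int) { picked[i], picked[j] = picked[j], picked[i] })
	picked = picked[:1+r.Intn(3)]

	var b strings.Builder
	b.WriteString("---\n")
	fmt.Fprintf(&b, "title: \"Article %d\"\n", n)
	fmt.Fprintf(&b, "date: %s\n", date.Format("2006-01-02"))
	fmt.Fprintf(&b, "author: %s\n", authors[r.Intn(len(authors))])
	fmt.Fprintf(&b, "tags: [%s]\n", strings.Join(picked, ", "))
	b.WriteString("---\n\n")
	b.WriteString(body)
	b.WriteString("\n")
	return b.String()
}

func main() {
	// Print one sample article; a generator would write each to content/.
	fmt.Print(randomArticle(1, 1, "Lorem ipsum dolor sit amet."))
}
```

Random tags and authors matter for benchmarking because they make Hugo generate taxonomy list pages, which a site of identical files never exercises.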

-j


Hugo is fast, but is currently memory constrained when we’re talking millions of things. I’ve been doing some prototype work behind the scenes to make it scale to those levels. It’s not happening next week, but …


Hi. Hugo build time is quite slow for my website at this moment, and I don’t have millions of articles.

hugo server -D -F --renderToDisk -d "/local/build/path"

Building sites …
                       |  FR
    +------------------+------+
      Pages            |  277
      Paginator pages  |    9
      Non-page files   |  759
      Static files     | 1734
      Processed images |  444
      Aliases          |    3
      Sitemaps         |    1
      Cleaned          |    0

    Total in 129974 ms

Like @brunoamaral, I have a custom theme with almost the same list of features (a few less, in fact). I use the partialCached function (website repo here). Image processing and static file copying seem to be responsible for most of the build time. The problem gets worse if you have taxonomies enabled and a lot of taxonomy list pages generated (e.g. one list of articles for every tag). So if you process images, have a lot of static files in your website, and generate taxonomy pages, you should include these in your benchmark.

It seems that on a Mac, static file copying and caching of processed images are better optimized, so the build time drops drastically the second time you run the build command. I run on a Windows Surface 3, which is no lightning bolt and not a Mac, but a Mac user built my site twice on his Mac and reported a drop from 21 seconds to 2 seconds. I have raised this issue on GitHub and on this forum.
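One way to compare a first build against a second one on any platform is to run the same command twice and time each run. The sketch below is a generic timer, not a Hugo feature; it assumes the command to run (for example `hugo --minify`) is passed on the command line, and `timeIt` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"time"
)

// timeIt runs fn once and reports how long it took.
func timeIt(fn func() error) (time.Duration, error) {
	start := time.Now()
	err := fn()
	return time.Since(start), err
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: go run . <command> [args...]  (for example: go run . hugo --minify)")
		return
	}

	// build runs the given command in the current directory.
	build := func() error {
		cmd := exec.Command(os.Args[1], os.Args[2:]...)
		cmd.Stdout = os.Stdout
		cmd.Stderr = os.Stderr
		return cmd.Run()
	}

	// The first run is the cold build; the second benefits from
	// whatever the first run left cached on disk.
	for _, label := range []string{"first build", "second build"} {
		d, err := timeIt(build)
		if err != nil {
			fmt.Fprintln(os.Stderr, label, "failed:", err)
			os.Exit(1)
		}
		fmt.Printf("%s: %v\n", label, d.Round(time.Millisecond))
	}
}
```

If the second run is not clearly faster than the first, that suggests the cached resources are not being reused on that machine.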

@lamyseba I took a look at your site out of curiosity. I feel it’s rendering way too slow for your use case and have a few suspicions about the reason why.

  1. You have a lot of template files that I feel could be trimmed by using baseof.html. It would take a closer look to be sure of this, and it depends on your requirements for the homepage, list and single views.
  2. You could cut a lot of file size from your images using imageOptim (see notes below).
  3. I also managed to get minor improvements by using partialCached in some places (again, see notes below).

Hope this helps !

Before optimization, after resources have been cached

                   |  FR
+------------------+------+
  Pages            |  513
  Paginator pages  |   17
  Non-page files   |  758
  Static files     | 1734
  Processed images |  450
  Aliases          |  186
  Sitemaps         |    1
  Cleaned          |    0

Total in 2071 ms

imageOptim

                   |  FR
+------------------+------+
  Pages            |  513
  Paginator pages  |   17
  Non-page files   |  758
  Static files     | 1734
  Processed images |  452
  Aliases          |  186
  Sitemaps         |    1
  Cleaned          |    0

Total in 1706 ms

(Measured after caching resources)

Diff

--- a/layouts/_default/list.html
+++ b/layouts/_default/list.html
@@ -1,5 +1,5 @@
 {{ define "main" }}
-       {{ partial "nav.html" . }}
+       {{ partialCached "nav.html" . }}
                <main class="list-{{ .Section }} pure-g">
                        <div class="list-head pure-u-1">
                                <h1>{{ .Title }}</h1>
@@ -20,5 +20,5 @@
                                {{ partial "pagination.html" . }}
                        </div>
                </main>
-       {{ partial "footer.html" . }}
+       {{ partialCached "footer.html" . }}
 {{ end }}
diff --git a/layouts/_default/single.html b/layouts/_default/single.html
index 6541e88..ef66748 100644
--- a/layouts/_default/single.html
+++ b/layouts/_default/single.html
@@ -1,5 +1,5 @@
 {{ define "main" -}}
-       {{ partial "nav.html" . }}
+       {{ partialCached "nav.html" . }}
   {{ $header := "nil" -}}
   {{ $header_src:= "nil" -}}
   {{ with .Scratch.Get "header-img" -}}
diff --git a/layouts/index.html b/layouts/index.html
index a047c70..16f7c7d 100644
--- a/layouts/index.html
+++ b/layouts/index.html
@@ -2,8 +2,8 @@
 <div class="parallax">
   <div id="top"></div>
   <div class="parallax__layer">
-   {{ partial "nav.html" .}}
-   {{ partial "home/header.html" .}}
+   {{ partialCached "nav.html" .}}
+   {{ partialCached "home/header.html" .}}
   </div>
   <div class="parallax__front">
        <main>

Thanks a lot for this reply @brunoamaral !
You wrote

after resources have been cached

This is the key problem. It seems that on Windows at least, resources never get cached (except in fast render mode, while the server is running after the first build). When you stop the server and start it again, all resources are rebuilt, or at least copied once again to the destination, and I never get a first build time of 2 seconds when launching a hugo server command.

Still, I’ll gladly implement your suggestions. The only one I didn’t understand is

You have a lot of template file that could be trimmed using the baseof.html

Do you mean I should cut the code from those templates and put it in baseof.html, even if this makes baseof.html a huge file?