Hugo local server out-of-memory issue with 50,000 pages

Hi, I have a site that generates 50 thousand pages. When I run ‘hugo server’, the process runs for 5 minutes and then fails with an out-of-memory error.

When I run the ‘hugo’ command to generate the public folder, it produces 5 GB of content, and Netlify just won’t process that much data.

The actual data is a CSV of US states and US cities. The CSV itself is less than 3 MB.

56 states × number of cities in each state ≈ 45,000 pages (roughly)

My client says the number of pages could go up to half a million once each city has 5 pages instead of one.

Is Hugo, or an SSG in general, the right tool for this? Of course, once all the pages are somehow generated, perhaps on a high-end machine (I guess; I have not tried it with such a large number of files), the site may run faster than any CMS / framework could dream of.

A CMS or any server-side framework could do this with roughly 3 page templates (plus the overhead of the CMS / framework files).

What would you suggest? Please help.

System configuration:
i7, 256 GB SSD, 16 GB RAM
Note: each generated HTML file in the public folder is 15 KB to 150+ KB.

People would need to see the actual site (repo), or a demo site that recreates the problem, to give better feedback, but here are some ideas:

  • by default, hugo server renders to memory. Try --renderToDisk?
  • regarding switches, did you try --verbose to see whether any relevant errors show up, or --memstats or --templateMetrics? (example command below)
  • regarding Hugo generating 5 GB, is there anything unneeded that you can switch off in the site config? That would help with that.
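
For example, something along these lines (just a sketch; flag availability varies by Hugo version, so check hugo server --help, and note that --memstats wants a file to write to):

    hugo server --renderToDisk --verbose --templateMetrics --memstats memstats.log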

Thanks for taking the time. Here is the GitHub link, but it’s not complete by any means.
I have deleted the contents of the public folder to save bandwidth and space.

The stack trace of the hugo server command is here.

The idea is to generate URLs of this format:
www.domain.com
www.domain.com/usstate/
www.domain.com/usstate/cityname/

My understanding is that, for this to be possible, the content folder should have
the US state names as folders,
and the US city names in each state either as a cityname.html file, or as a folder named after the city with an _index.html inside it.
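
For what it’s worth, this is roughly how I create those files today, with a small Python script that reads the CSV (a rough sketch only; the column names and output layout here are examples, not my exact script):

    import csv
    import os

    # Read the master CSV and write one content file per city under its
    # state folder, which gives the /usstate/cityname/ URLs described above.
    with open("uscities.csv", newline="") as f:
        for row in csv.DictReader(f):
            state = row["state"].strip().lower()
            city = row["city"].strip().lower().replace(" ", "-")
            if not state or not city:
                continue  # skip incomplete rows
            state_dir = os.path.join("content", state)
            os.makedirs(state_dir, exist_ok=True)
            with open(os.path.join(state_dir, city + ".md"), "w") as out:
                out.write('---\ntitle: "%s, %s"\n---\n' % (row["city"], row["state"].upper()))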

Is there a way I can avoid creating this vast number of folders and files and still have the URL structure I mentioned?

Even a tiny bit of light you can shed on this is greatly appreciated.

I don’t have much time today, but I cloned your repo and noticed:

  • your index page’s content file had front matter including the type “statecity”. The site was consequently trying to render the home page itself using the “list” template, with a massive HTML table of everything. I found that if I remove that “type” declaration from the index’s content file, it uses the expected layouts/index.html instead of that list template (see the front matter sketch after this list). I also switched the index content file from _index.html to _index.md, because that is what I am used to; I’m not sure what the implications are of having all the content files with an .html extension rather than .md. Once I make that change, the site renders to public with --renderToDisk specified, but I do get a “too many files” error at the very end, and it turns out hugo server won’t start. But at least it appears to create all the files this way.
  • not sure why, but the site is rendering some cities at the root level instead of under the state folders…
  • looking at /path/to/dev/statecitysite/public/ny/adams/index.html as an example, there is something wrong with the way the data is being pulled, because I am seeing AR-related data for a NY city.
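
For reference, the home page fix was roughly this (a reconstruction, assuming YAML front matter; the real file may look different):

    content/_index.md, before:

    ---
    title: "Home"
    type: "statecity"   # this is what pulled the statecity list template onto the home page
    ---

    After dropping "type", layouts/index.html is picked up instead:

    ---
    title: "Home"
    ---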

Thanks again for trying it out.

Regarding the AR data, that’s because I hard-coded the AR data .csv into the single page template (ideally that would loop over the corresponding statename.csv).

--renderToDisk does work; it took about 5 minutes of compiling.

Some cities ended up at the root level because of an issue in the data file where no state name was entered (I pull from the CSV using Python).

Rick, would you handle this in Hugo, given the overhead of compiling and generating the public folder (not to mention that every time we make a change to the site, it will recompile not on our machine but on a host like Netlify or Amazon)?

This is a proof of concept for choosing a technology like Hugo. I am sure Jekyll would be a lot slower, so I am willing to take your advice on this: Hugo, Python-Django, WordPress, or anything else from your experience that you feel would handle huge data better.

NOTE: I understand it’s your Sunday, and I am very pleased to see a community of people like this. Thanks again for your assistance.

I hope one of the devs will look at your problem. Hugo is performant, but it is also sensitive to how you go about things, so there might be a more efficient way to process those CSVs.

Hi all

Just trying to bring this problem to your attention again.

Can this, or should this, be done in Hugo?

Yes. :slight_smile:

Hi Maiki

That’s nice to hear :wink:

Generating 5 lakh (500,000) pages seems like overkill.

In Angular and other front-end frameworks, we have the option to change the URL (routes) without having a physical file / folder.

If Hugo could do that with the data folder, it would save the content folder and the public folder lots and lots of files.

Of course, I could do it by importing Angular / React into the Hugo project. I just felt that this could be an option provided by Hugo.

This would greatly decrease the site’s size and compile time, not to mention the upload and download bandwidth when the developer changes the site layout (forcing us to recompile everything).

If only the site compiled faster, I would be willing to go the Hugo route. But taking 5 minutes, or hitting a memory error, while compiling or running the local server may not be an optimal solution…

Please advise… your thoughts and suggestions are welcome.

I wonder how Smashing Magazine does it (I think they use Hugo).

Building 50,000 pages is easy. If I remove this part: {{ range $i, $r := getCSV $sep $url }}, I get the following:

                   |  EN    
+------------------+-------+
  Pages            | 43401  
  Paginator pages  |     0  
  Non-page files   |     0  
  Static files     |     0  
  Processed images |     0  
  Aliases          |     0  
  Sitemaps         |     1  
  Cleaned          |     0  

Total in 7999 ms

It is the getCSV part that uses so many resources.

I would suggest using one of the many JavaScript libraries that parse CSV instead.

I would also recommend moving your CSV files into /static instead of /data, which is supposed to contain TOML, JSON, and YAML files, not CSV: https://gohugo.io/templates/data-templates/#the-data-folder
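
If some of this does stay in your templates, getCSV can read from the new location too, e.g. (the file name here is just an example):

    {{ $rows := getCSV "," "static/ny.csv" }}
    {{ range $rows }}
      <!-- render one row -->
    {{ end }}

But for this number of pages, the client-side parse is the route I would take.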
