General guidance using Hugo to generate a large amount of content from database/api


I have a database/GraphQL API that is used to manage approximately 200k pieces of content. We need to build a website to display this content. I’m primarily a backend developer (very competent with Go); my HTML skills are circa the mid-2000s, and I’m not competent with React or any other frontend technologies. The data changes on a daily basis. Hugo seems like a perfect fit: re-generate the website every day and avoid having to develop a dynamic website.

I’m looking for some general guidance on how to get the content from the database/API into a form that Hugo can use. It looks like Hugo can make remote calls (including GraphQL queries) with resources.GetRemote, but it appears the fetched content can only be rendered within a single page, which is great for navigation bars and summaries.

What would be the recommended way to turn the 200k pieces of content into pages? Is there a Hugo method or module for this? (I haven’t come across an obvious way yet.) Or should I write a script that queries the database and generates Markdown files which Hugo can then process?

Any other tips or tricks would be greatly appreciated.

Thank you for your time.


Once you read through it a couple of times you’ll realize that it is a really simple approach. Basically you build once to get the data and create content files, then build again using the content files you just created. Something like:

rm -rf prebuild/public && hugo -s prebuild && hugo server

Thank you. Excellent resource

It’s definitely going to take a few read-throughs. Unfortunately the linked repo doesn’t match the tutorial and I wasn’t immediately able to get things working, but I’m very new to Hugo. We’ll figure it out eventually :slight_smile:

This is a fantastic start. Much appreciated!

Try this:

git clone --single-branch -b hugo-forum-topic-45433 hugo-forum-topic-45433
cd hugo-forum-topic-45433
rm -rf prebuild/public && hugo -s prebuild && hugo server

Wow. Thank you for putting this repo together!!! This is much clearer. Already see which files I was missing.

Thank you!

Yeah, we needed a simple example to demonstrate the concept. The only part that really requires any thought/fiddling is:

{{/* Map data fields to content and front matter. */}}
{{ $content := .content }}
{{ $frontMatter := dict
  "categories" .categories
  "date" .date
  "description" .description
  "foo" .foo
  "image" .cover
  "title" .title
}}

It all depends on the data source.
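For completeness, one possible way to emit that mapped data as a content file is to remarshal the dict as YAML front matter and append the content below it. This is a hedged sketch, not necessarily what the linked repo does; it assumes `$frontMatter` and `$content` are set as above and uses Hugo’s transform.Remarshal function.

```
{{/* Emit the page as Markdown with YAML front matter (one possible approach). */}}
{{ printf "---\n%s---\n\n%s" (transform.Remarshal "yaml" $frontMatter) $content }}
```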

@regis came up with this, and frankly, it’s surprising no one else thought of it. I would describe the approach as “dirt simple”.

Also, sooner rather than later, you should test a simple implementation against the actual data set to check performance, memory consumption, etc. That’s a big site.

Yes, I will try to get a prototype running this weekend.

The next big hurdle is actually making the paginated API calls. I’m curious if I need to write my own version of GetRemote. Fun fun fun. :slight_smile:
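If the fetching ends up in a Go script rather than in Hugo templates, the pagination loop itself is small. Below is a sketch of draining a cursor-paginated source in the style of GraphQL’s common “connection” pattern; the `page` shape and the in-memory fake standing in for the real GraphQL call are assumptions for illustration.

```go
package main

import "fmt"

// page is one page of results from a cursor-paginated API; the
// shape mirrors GraphQL's common "connection" pattern.
type page struct {
	Items     []string
	EndCursor string
	HasNext   bool
}

// fetchAll drains a cursor-paginated source. fetchPage is a stand-in
// for the real GraphQL call (e.g. one passing `after: $cursor`).
func fetchAll(fetchPage func(cursor string) (page, error)) ([]string, error) {
	var all []string
	cursor := ""
	for {
		p, err := fetchPage(cursor)
		if err != nil {
			return nil, err
		}
		all = append(all, p.Items...)
		if !p.HasNext {
			return all, nil
		}
		cursor = p.EndCursor
	}
}

func main() {
	// A fake two-page source standing in for the remote API.
	pages := map[string]page{
		"":   {Items: []string{"a", "b"}, EndCursor: "c1", HasNext: true},
		"c1": {Items: []string{"c"}, HasNext: false},
	}
	items, _ := fetchAll(func(cursor string) (page, error) {
		return pages[cursor], nil
	})
	fmt.Println(items) // [a b c]
}
```

The same loop works for offset-based pagination by replacing the cursor with a page number or offset.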

Your help has been much appreciated!

I forgot a couple of things…

1) I have updated the build instructions above to be:

rm -rf prebuild/public && hugo -s prebuild && hugo server

The rm -rf prebuild/public bit is necessary if items have been removed from the remote data source. Otherwise, the local files generated from the remote content will still be present from earlier runs. In a CI/CD environment (e.g., GitHub Pages, GitLab Pages, Netlify, etc.) that is not necessary, because you will be recreating the public directories (project and prebuild) every run.

2) Hugo caches the result of the remote data query. By default, the cache doesn’t expire. You’ll probably want to change that to something like 12 hours. I’ve updated the test repository referenced above.
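For reference, the cache backing resources.GetRemote is configured under `caches.getresource` in the site configuration. A minimal sketch of a 12-hour expiry (the value is the example from above, not a recommendation):

```toml
# hugo.toml — expire the remote-resource cache after 12 hours
[caches]
  [caches.getresource]
    maxAge = "12h"
```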

Thank you!