Transition 2M posts from WordPress to Hugo?

We are looking to rewrite a website, moving from WordPress to a static site generator, and Hugo looks nice. We have 2 million blog posts (1 TB of HTML data), which gives us high infrastructure costs. We also publish up to 500 new blog posts daily. Is Hugo a fit for this? Your thoughts are appreciated.

Hugo generates sites extremely quickly, so that would be an advantage for you.

What scenario requires 21 posts an hour?!

Thank you for your reply. We aggregate and disseminate lengthy financial disclosure documents, which are created daily in the thousands (excluding weekends and holidays). Is Hugo designed such that we would have to fire off a CI/CD process for every new post?

Interesting.

No, you can script it however you need to. Hugo simply builds the site from the Markdown files you put in /content in your project.

You can schedule running the build:

 /path/to/hugo -s /path/to/project -d /path/to/builddir

… which just generates the site files and puts them in /path/to/builddir.

Then you could schedule a git commit and git push to whatever branch you are triggering your CI/CD from, as often as you want to publish.
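For illustration, a minimal sketch of such a scheduled publish job (the paths, branch name, and schedule are assumptions, and it assumes the build directory lives inside the repository; run it from cron or a similar scheduler):

    #!/usr/bin/env bash
    # Hypothetical scheduled publish script: build the site, then commit and
    # push so the CI/CD pipeline watching that branch picks up the changes.
    set -euo pipefail

    cd /path/to/project

    # Build the site into the build directory (same command as above).
    /path/to/hugo -s /path/to/project -d /path/to/builddir

    # Stage everything, but only commit and push if something actually changed.
    git add -A
    git diff --cached --quiet && exit 0
    git commit -m "Scheduled publish $(date -u '+%Y-%m-%d %H:%M')"
    git push origin master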

2 million blog posts would be stretching it. We (I) have plans in this area (partial builds, partitioned builds), but that is still just on the drawing board. If you really want this now, I would say it might be possible to build the 2 million pages as a separate job, and let the 500 a day be the “online edition”.
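One way that split could look in practice is two builds writing into the same deploy directory. A rough sketch, assuming the archive and the daily posts are kept as two separate Hugo projects (all paths here are made up):

    # Full build of the 2M-post archive, run rarely (e.g. weekly):
    /path/to/hugo -s /path/to/archive-project -d /srv/www/site

    # Small "online edition" build of the recent posts, run on a schedule,
    # merged into the same deploy directory:
    /path/to/hugo -s /path/to/daily-project -d /tmp/daily-public
    rsync -a /tmp/daily-public/ /srv/www/site/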


Thanks for the feedback. We will need to explore these granular CI/CD options and test against our volume.

The other big part of us transitioning to a platform like Hugo is monetizing our content, which is a separate discussion here - Creating a members only site with tiered access?

@hbcondo does your WP site use hyperdb? Are the 2M posts the same post_type?

Part of our build process is to create a hash for all files in our public folder, compare it to a list of hashes of production files and then create an artifact of changed files to deploy.

Hugo still builds the whole site but we’re only sending changed files.
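Roughly, the idea looks like this (paths and file names are made up, and it assumes file names contain no spaces):

    # Hash every file in the freshly built public folder.
    cd public
    find . -type f -exec sha256sum {} + | sort > /tmp/build-hashes.txt

    # production-hashes.txt is the same listing taken from the production files.
    # Lines present only in the new build are new or changed files.
    comm -13 <(sort /tmp/production-hashes.txt) /tmp/build-hashes.txt \
      | awk '{print $2}' > /tmp/changed-files.txt

    # Package only the changed files as the deploy artifact.
    tar -czf /tmp/deploy-artifact.tar.gz -T /tmp/changed-files.txt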

You may be able to build your site using multiple instances of Hugo: chunk your daily data into separate build folders, run multiple instances of Hugo, then merge the public folders.
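A rough sketch of what that could look like (the chunk layout, number of instances, and paths are assumptions, and it only works if each chunk is a self-contained Hugo project):

    # Build four content chunks in parallel, each with its own Hugo instance.
    for i in 1 2 3 4; do
      /path/to/hugo -s /path/to/chunk-$i -d /tmp/public-$i &
    done
    wait

    # Merge the per-chunk outputs into a single public folder for deployment.
    mkdir -p /path/to/public
    for i in 1 2 3 4; do
      rsync -a /tmp/public-$i/ /path/to/public/
    done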

Bep, partial builds and partitioned builds sound really useful; we’d love to have those features.

Yes, and also very hard to get right.

We are not using hyperdb (yet), and the 2M posts consist of 3 post types.

Really interesting project!

I’m one of the founders of Netlify and would love to see how we can help with this. It sounds like a fascinating project.

We’ve helped companies like Smashing Magazine build out really large Hugo-based sites, and we have built-in support for identity and role-based rewrite rules at the CDN level (Smashing Magazine uses this for their members-only tiered pages).

Shoot me a mail at matt@netlify.com if you’re up for a chat.


@biilmann, thank you for your reply. I just sent you an email with details on our project.

@hbcondo What about blog comments? Are there any comments, and if so, any thoughts on migrating them somewhere?

Given the lengthy content we display, an annotation system like Hypothes.is is what we are looking to transition to.