Worries about developing news site with thousands of pages and heavy media usage


#1

Hello!

I recently learned that Smashing Magazine uses Hugo in production. I can’t find any information on their build time, though. I wonder if Hugo is suitable for a big, news-like website with many thousands of pages.

  1. I would like to build a news website for my client using Hugo + Netlify, with Forestry.io as the CMS. It’s going to use a lot of media, at least one image per post. Would the heavy media usage, plus the additional step of going through Forestry for every post, make the build time very long? I don’t think Smashing Magazine uses any CMS.

What is your experience? Would a large media folder be a problem for build time?

  2. What about the size of the images? GitHub allows a maximum of 1 GB. I could probably use GitLab, as the limit is 10 GB there, but I still don’t know how to keep the images at a sane size. What are your suggestions? Maybe there’s a maximum file size in Forestry? Or should I use webpack, or some other tool?

#2

They have since upgraded their Hugo, so should be faster …


#3

I wouldn’t rely on GitLab Pages for demanding client work. Their pipelines have issues like every other day. And they are known to make breaking changes to their DNS with insufficient warning.

You should consider some other kind of deployment. Preferably a paid reliable solution.

Other than that there are quite a few posts in the forum about heavy Hugo sites, benchmarks etc. Use the search button.

Hugo can handle big sites well.


#4

I think Smashing Magazine uses Cloudinary.com for their images?

Edit: Yes they do. See https://www.smashingmagazine.com/2017/03/a-little-surprise-is-waiting-for-you-here/


#5

A related tip for Hugo 0.32:

  • Supports page relative images and processing via bundles
  • Also introduced full support for symbolic links
  • So, it should be possible to store images outside GitHub

#6

Hey, @sebhewelt. It’s difficult to compare directly because much depends on the logic you have in your templates. I’ve got one site with ~5k pages that typically builds (on Netlify) in about 2-3 minutes*. That site has a lot of logic in the templates, but I also host the images with Cloudinary. I’ve got another site, around 3400 pages, that builds in 1 minute, and the images are in the repo. If you have very straightforward templates your build times will be faster, of course. You can use Hugo’s Template Metrics to help tweak them.
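If you want your own numbers rather than mine, here is a minimal Python sketch for timing a local build. The helper name `time_command` is made up for illustration, and it assumes `hugo` is on your PATH and that you run it from the site root. Local timings will not match what Netlify reports, so treat them as a relative measure only.

```python
import shutil
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return (elapsed_seconds, exit_code)."""
    start = time.perf_counter()
    result = subprocess.run(cmd)
    return time.perf_counter() - start, result.returncode


if __name__ == "__main__":
    # Assumption: hugo is installed and the current directory is a Hugo site.
    if shutil.which("hugo") is None:
        print("hugo not found on PATH; nothing to time")
    else:
        # --templateMetrics makes Hugo print per-template timing information.
        elapsed, code = time_command(["hugo", "--templateMetrics"])
        print(f"build exited with {code} after {elapsed:.1f}s")
```

Running it before and after a template change gives a rough idea of whether the change helped.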

Smashing Mag uses Netlify CMS.

Using Forestry will not add anything to your build time.

Using a separate service for hosting the images so they won’t be a factor in your build is a good idea, but that may require an extra step for your editors, which might not be a good trade-off for you. I’ve used Cloudinary and imgix and can recommend them both (in fact, I often use Imgix even if my images are part of my repo).

Hope that helps!

* Quoting the times that Netlify reports as deploy times on the deploy page.

#7

I talked with Forestry.io devs, and they say that the images need to be in the repo. (??)

Don’t know exactly why.


#8

Hi, we are currently working on a news site and its migration from WP to Hugo: 170k pages and about 100k images.

In WP we have the DB cache challenge, and a lot of problems that follow from it. We waited for Hugo 0.32 because of the bundles. Having images and content together is the most important thing for us. For small blogs it’s an individual decision, but not for big sites.

We are now testing three things:

  1. Editor for the employees
  2. Access rights, maybe with a headless CMS solution
  3. Migration process

Because of the massive number of articles, we are not using third-party Git hosting (GitLab, GitHub, etc.).
We also want full control, from saving articles to the build process in Hugo.
We don’t see problems with build time in Hugo at the moment.
A large media folder should not cause problems either, but we have to test that first.

But this is like an adventure. Nobody knows what’s coming next.
Hugo is fascinating.
Have a nice weekend,
Beny


#9

I think I’ll have to read up on what exactly those changes in v0.32 mean. I don’t even know what “symbolic links” are :frowning: . Any good sources for catching up?


#10

Those are some impressive numbers. I suspected we would have to make the build more stateful to attract sites like that.

Would love to hear more (even the “what site”), but understand if this is not open info. Send me an email if you can/want: bjorn.erik.pedersen@gmail.com


#11

Yeah, if you needed to keep them out of the build, you’d want to use a service like Cloudinary or something else where you’d upload them independently and then reference them in your site (that’s the extra step I mentioned).
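To make “reference them in your site” a bit more concrete, here is a rough Python sketch of building an image URL for a hosted service. It follows my understanding of Cloudinary’s delivery URL layout (cloud name, then `image/upload`, then comma-separated transformations, then the image’s public ID); the function name, the `demo` cloud name, and the image path are all made up for illustration, so check the service’s own docs before relying on it.

```python
from urllib.parse import quote


def cloudinary_url(cloud_name, public_id, width=None):
    """Build a delivery URL for an image that was already uploaded."""
    # f_auto / q_auto ask the service to choose the format and quality.
    transforms = ["f_auto", "q_auto"]
    if width is not None:
        transforms.append(f"w_{int(width)}")  # resize to this width in pixels
    return (
        f"https://res.cloudinary.com/{quote(cloud_name)}/image/upload/"
        f"{','.join(transforms)}/{quote(public_id)}"
    )


if __name__ == "__main__":
    # Hypothetical account and image, for illustration only.
    print(cloudinary_url("demo", "news/flood.jpg", width=800))
```

The editor only needs to store the public ID in the post; the resizing happens on the image host, not in the Hugo build.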

@Beny - I’m curious why it’s important to have images and content together in a large site?


#12

Thanks for your interest.
Once we have permission, we will contact you via email.
Beny


#13

@budparr Beny will chime in here as well, but I have some background in the newspaper business (with emphasis on paper), and while it certainly has moved towards “illustration stock images from a library”, it is still lots of news, i.e. one-time news-articles with current photos.

Having text + photos packaged together makes total sense to me. It is a really easy mental model that I suspect also scales pretty well.

And if you then want to reuse a photo later (in a follow-up article), just reference the original.


Best Practices: Assets with Content
#14

Ah, yeah, that makes sense. I was thinking in terms of content strategy, making everything reusable, but I see where the image is specific to that piece. Thanks.


#15

Just to add to that: If you look at the Norwegian news sites writing about the Prime Minister Erna Solberg, they very rarely publish an archived photo. It is almost always from something “she did that day”.


#16

We use archived photos to illustrate interesting or current police news. It’s not always possible to have a photo right after an accident or crime happens.


#17

Well @budparr, imagine you have 100k images in one directory. Nothing is easy to find and reference. In the news business everything is about speed, and every once in a while you have to search for an image for an article.
When the images are all together in one directory, it will drive you crazy,
even when you use subdirectories under the static folder.
An image can have a meaningful name or just a combination of characters and numbers, DCF37773_hhjd–goog.jpg for example.
It’s really confusing.
And in the end, we have to make backups of the whole system, and it’s best to have everything together under content.
With the new bundle feature it is possible to migrate from our custom WP and its DB to Hugo, and I am really happy about that.
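For anyone wondering what “everything together under content” could look like on disk, here is a small Python sketch that turns one flat post plus its images into a bundle folder with an `index.md`. This is not our actual migration script; the function name and the example paths are made up, and it assumes you already know which images belong to which post.

```python
from pathlib import Path
import shutil


def to_leaf_bundle(post_path, image_paths):
    """Move posts/foo.md and its images into posts/foo/ (index.md + images)."""
    post_path = Path(post_path)
    bundle_dir = post_path.with_suffix("")  # posts/foo.md -> posts/foo
    bundle_dir.mkdir(parents=True, exist_ok=True)
    # The post itself becomes index.md inside its own folder.
    shutil.move(str(post_path), str(bundle_dir / "index.md"))
    # Each image moves next to the text it illustrates, keeping its file name.
    for image in map(Path, image_paths):
        shutil.move(str(image), str(bundle_dir / image.name))
    return bundle_dir


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    post = Path("content/posts/flood.md")
    if post.exists():
        print(to_leaf_bundle(post, ["static/images/DCF37773.jpg"]))
```

After that, a backup of the one folder contains the article and its photos, and nobody has to search a 100k-file directory.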


#18

Thank you. I started a new thread and posted your response there and referenced Bjørn Erik’s: Best Practices: Assets with Content


#19

Symbolic links in UNIX are roughly comparable to folder and file shortcuts in Windows, or aliases in macOS: a small file system entry that points to another file or folder.

See the test repo for Hugo 0.32
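If it helps to see one in action, here is a minimal Python sketch that creates a symbolic link from inside a site folder to a media folder stored somewhere else. The function name and the paths are made up for illustration. Whether Hugo follows the link in a particular location is something to verify against the 0.32 release notes and the test repo; I’m only showing what the link itself is. On Windows, creating symlinks may need extra privileges.

```python
import os
from pathlib import Path


def link_media(external_dir, link_path):
    """Create link_path as a symbolic link that points at external_dir."""
    link_path = Path(link_path)
    link_path.parent.mkdir(parents=True, exist_ok=True)
    # The link stores only the target path; no image data is copied.
    os.symlink(Path(external_dir).resolve(), link_path, target_is_directory=True)
    return link_path


if __name__ == "__main__":
    # Hypothetical locations: images live outside the Git repo.
    media = Path("/srv/media")
    if media.is_dir():
        print(link_media(media, "content/images"))
```

Reading a file through the link gives you the original file, so the images can stay outside the repository while still appearing inside the site folder.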


#20

Hi @Beny, we are also looking at moving our news-type site from WordPress to Hugo, but are struggling to find a developer with experience with Netlify/Hugo. Do you take on clients or new projects?
Let me know.