How to handle very large content directories

Continuing the discussion from Timed out creating content:

First of all… whaaaat?! That is impressive.

Okay, presuming the bulk of that storage is binary assets, I personally don’t want to keep all that in git.

@brunoamaral, would you mind sharing your deployment method? What gets copied over when you update? How long does it take the site to build? What has been your experience tracking so many assets?

And the same for the rest of y'all.

I have about 400MB of images from an older site, and even that seems too much for me. How do you handle this?

It’s all in Git LFS: photos and a few small videos. Every night a cron job updates the repository and runs Hugo to produce the pages and crop/resize the images. In hindsight, I should have set up a separate way to handle content files.
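
Roughly, the nightly job is just a small script triggered from cron; a minimal sketch (the paths and schedule below are placeholders, not my exact setup):

    #!/usr/bin/env bash
    # nightly-build.sh: pull the latest changes and rebuild the site with Hugo.
    # Placeholder paths throughout; adjust to taste.
    set -euo pipefail

    cd /home/site/my-hugo-site    # repository root
    git pull --ff-only            # update content, layouts and config
    git lfs pull                  # make sure new LFS-tracked photos/videos are present
    hugo                          # build; Hugo also crops/resizes the processed images

    # crontab entry to run it every night at 03:00:
    #   0 3 * * * /home/site/bin/nightly-build.sh >> /home/site/build.log 2>&1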

This was last night’s build:

                   |  EN  |  PT
+------------------+------+------+
  Pages            | 2639 | 1445
  Paginator pages  |  414 |  297
  Non-page files   | 3015 | 1868
  Static files     |  682 |  682
  Processed images | 5604 | 2851
  Aliases          |  457 |   16
  Sitemaps         |    2 |    1
  Cleaned          |    0 |    0

Total in 61140 ms

In my experience, it isn’t a pain to handle this many assets. Hugo does the heavy lifting, and in the past I had a small bash script to optimize and resize images with ImageMagick.
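
That script amounted to something like this (the directory, size and quality values here are just placeholders):

    #!/usr/bin/env bash
    # Downscale and lightly compress JPEGs before committing them.
    # mogrify edits files in place, so keep originals elsewhere if you need them.
    set -euo pipefail

    CONTENT_DIR="content"   # placeholder; wherever the page bundles live

    find "$CONTENT_DIR" -type f -iname '*.jpg' -print0 |
      xargs -0 -r mogrify -resize '2000x2000>' -quality 85 -strip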

Every page is a page bundle, with one folder for a gallery and another for loose images. A shortcode automatically turns the gallery into a masonry-like view with the PhotoSwipe plugin. Like here, for example: Notebook
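
As a rough illustration, a bundle looks something like this (folder names are made up for the example):

    $ tree content/notebook
    content/notebook
    ├── index.md        # the page itself; calls the gallery shortcode
    ├── gallery/        # images the shortcode turns into the masonry view
    │   ├── 01.jpg
    │   └── 02.jpg
    └── images/         # loose images referenced directly in the text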

There is an old post about my process here: Under the Hood

That’s about it, but let me know if you want me to share more. 🙂

Thanks!

Please allow me to benefit from your hindsight: how would you have handled it, if you were starting today?

Also, where do you host LFS files? Bucket storage, or local to yer git remote host?

Everything is on GitHub’s servers, and cloning the repository is a pain. If I were starting from scratch, I would use Dropbox or a similar service for the whole content/ directory. Everything else could stay the same, with a cron job building the site every night.
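
One thing that eases the cloning pain, in case it helps anyone: Git LFS can skip downloading the binaries during the clone and fetch them later. A sketch, with a placeholder remote:

    # Clone without downloading the LFS binaries (only small pointer files come down).
    GIT_LFS_SKIP_SMUDGE=1 git clone git@github.com:user/site.git   # placeholder remote
    cd site

    # Fetch the actual files later:
    git lfs pull
    # ...or only part of the tree:
    # git lfs pull --include="content/photos/**"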

Dropbox? I’ve had issues in the past with Hugo projects inside the Dropbox directory: some files would throw errors and would not sync.

And that was just for backup, not production.

Could you elaborate? This would be a one-way sync of just the content/ folder, so the code history would still live in git.

I had backed up the entire directory of a Hugo project, and when I later accessed it from another device not everything was there. If I remember correctly, it was Git-related files that were missing, and the repo became corrupted.

But what you’re talking about seems different.

If I may ask, how would you go about connecting an external /content/ folder from Dropbox with a Hugo project that has everything else in a GitHub repository?

If I decide to take that route, it would probably be with something like this: https://www.digitalocean.com/community/tutorials/how-to-install-dropbox-client-as-a-service-on-ubuntu-14-04

TL;DR: a Dropbox command-line client. Hugo now supports symlinks, so it would be easy to link to the content/ folder wherever it ends up.
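
In practice that would boil down to something like this (paths are placeholders):

    # Let the headless Dropbox client sync the content, then point Hugo at it.
    # Assumes Dropbox syncs to ~/Dropbox/site-content (placeholder paths).
    cd /home/site/my-hugo-site
    mv content content.bak                  # keep the old copy until the sync is verified
    ln -s ~/Dropbox/site-content content    # Hugo follows the symlink when it builds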

Thanks for the write-up; it’s great to see how others are handling this.

I have a similar setup to yours, using Git LFS on GitHub. Did you need to take any steps to get image links to work in your generated site? After switching to LFS (I got the warning from GitHub once my repo grew to several GB), my image links are broken.

If anyone is considering Dropbox sync, it may be worth checking that their updated list of supported filesystems is compatible with your hosting environment. It appears they are planning to withdraw support for encrypted Linux filesystems in November.

https://www.theregister.co.uk/2018/08/14/dropbox_encrypted_linux_support/
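
If you are on Linux, a quick way to check which filesystem your sync directory sits on (assuming GNU coreutils):

    # Show which filesystem the Dropbox sync directory sits on.
    df -T ~/Dropbox    # compare the Type column against Dropbox's supported list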