How to handle very large content directories

Continuing the discussion from Timed out creating content:

First of all… whaaaat?! That is impressive.

Okay, presuming the bulk of that storage is binary assets, I personally don’t want to keep all that in git.

@brunoamaral, would you mind sharing your deployment method? What gets copied over when you update? How long does it take the site to build? What has been your experience tracking so many assets?

And the same for the rest of y'all.

I have about 400MB of images from an older site, and even that seems too much for me. How do you handle this?

It’s all in Git LFS: photos and a few small videos. Every night a cron job updates the repository and runs Hugo to produce the pages and crop/resize the images. In hindsight, I should have set up a separate way to handle content files.
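
Roughly, the nightly job is just a small script triggered from cron; a minimal sketch (the paths and schedule below are placeholders, not my exact setup):

    #!/usr/bin/env bash
    # nightly-build.sh: pull the latest changes and rebuild the site with Hugo.
    # Placeholder paths throughout; adjust to taste.
    set -euo pipefail

    cd /home/site/my-hugo-site    # repository root
    git pull --ff-only            # update content, layouts and config
    git lfs pull                  # make sure new LFS-tracked photos/videos are present
    hugo                          # build; Hugo also crops/resizes the processed images

    # crontab entry to run it every night at 03:00:
    #   0 3 * * * /home/site/bin/nightly-build.sh >> /home/site/build.log 2>&1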

This was last night’s build:

                   |  EN  |  PT
+------------------+------+------+
  Pages            | 2639 | 1445
  Paginator pages  |  414 |  297
  Non-page files   | 3015 | 1868
  Static files     |  682 |  682
  Processed images | 5604 | 2851
  Aliases          |  457 |   16
  Sitemaps         |    2 |    1
  Cleaned          |    0 |    0

Total in 61140 ms

In my experience, it isn’t a pain to handle this many assets. Hugo does the heavy lifting, and in the past I had a small bash script to optimize and resize images with ImageMagick.
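
That script amounted to something like this (the directory, size and quality values here are just placeholders):

    #!/usr/bin/env bash
    # Downscale and lightly compress JPEGs before committing them.
    # mogrify edits files in place, so keep originals elsewhere if you need them.
    set -euo pipefail

    CONTENT_DIR="content"   # placeholder; wherever the page bundles live

    find "$CONTENT_DIR" -type f -iname '*.jpg' -print0 |
      xargs -0 -r mogrify -resize '2000x2000>' -quality 85 -strip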

Every page is a page bundle, with one folder for a gallery and another for loose images. A shortcode automatically turns the gallery into a masonry-like view with the PhotoSwipe plugin. Like here, for example: Notebook
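
As a rough illustration, a bundle looks something like this (folder names are made up for the example):

    $ tree content/notebook
    content/notebook
    ├── index.md        # the page itself; calls the gallery shortcode
    ├── gallery/        # images the shortcode turns into the masonry view
    │   ├── 01.jpg
    │   └── 02.jpg
    └── images/         # loose images referenced directly in the text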

There is an old post about my process here: Under the Hood

That’s about it, but let me know if you want me to share more. 🙂

Thanks!

Please allow me to benefit from your hindsight: how would you have handled it, if you were starting today?

Also, where do you host LFS files? Bucket storage, or local to yer git remote host?

Everything is on GitHub’s servers, and cloning the repository is a pain. If I were starting from scratch, I would use Dropbox or a similar service for the whole content/ directory. Everything else could stay the same, with a cron job building the site every night.
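
One thing that eases the cloning pain, in case it helps anyone: Git LFS can skip downloading the binaries during the clone and fetch them later. A sketch, with a placeholder remote:

    # Clone without downloading the LFS binaries (only small pointer files come down).
    GIT_LFS_SKIP_SMUDGE=1 git clone git@github.com:user/site.git   # placeholder remote
    cd site

    # Fetch the actual files later:
    git lfs pull
    # ...or only part of the tree:
    # git lfs pull --include="content/photos/**"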

Dropbox? I’ve had issues in the past with Hugo projects inside the Dropbox directory: some files would throw errors and would not sync.

And that was just for backup, not production.

Could you elaborate? This would be a one-way sync of just the content/ folder, so the code history would still live in git.

I had backed up the entire directory of a Hugo project, and when I later accessed it from another device not everything was there. If I remember correctly, it was Git-related files that were missing, and the repo became corrupted.

But what you’re talking about seems different.

If I may ask, how would you go about connecting an external /content/ folder from Dropbox with a Hugo project that has everything else in a GitHub repository?

If I decide to take that route, it would probably be with something like this: https://www.digitalocean.com/community/tutorials/how-to-install-dropbox-client-as-a-service-on-ubuntu-14-04

TL;DR: a Dropbox command-line client. Hugo now supports symlinks, so it would be easy to link to the content/ folder wherever it ends up.
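
In practice that would boil down to something like this (paths are placeholders):

    # Let the headless Dropbox client sync the content, then point Hugo at it.
    # Assumes Dropbox syncs to ~/Dropbox/site-content (placeholder paths).
    cd /home/site/my-hugo-site
    mv content content.bak                  # keep the old copy until the sync is verified
    ln -s ~/Dropbox/site-content content    # Hugo follows the symlink when it builds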

Thanks for the write-up; it’s great to see how others are handling this.

I have a similar setup to yours, using Git LFS on GitHub. Did you need to take any steps to get image links to work in your generated site? After switching to LFS (I got the warning from GitHub once my repo grew to several GB), my image links are broken.

If anyone is considering Dropbox sync, it may be worth checking that their updated list of supported filesystems is compatible with your hosting environment. It appears they are planning to withdraw support for encrypted Linux filesystems in November.

https://www.theregister.co.uk/2018/08/14/dropbox_encrypted_linux_support/
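
If you are on Linux, a quick way to check which filesystem your sync directory sits on (assuming GNU coreutils):

    # Show which filesystem the Dropbox sync directory sits on.
    df -T ~/Dropbox    # compare the Type column against Dropbox's supported list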