Workflow for keeping images out of source control

My Hugo site has images, but I don’t want to put those images into source control since they are binary files. Instead I would like to put something more like a map of files to their SHA-1s into S3 (or Dropbox or whatever), and version-control this manifest in git.

Then, when I build the site, I can pull down any changes to the files from remote storage as part of my build steps. (The files themselves are git-ignored in this scheme.) In this way, I:

  • just have to upload files of interest indicated in the manifest
  • keep them out of my repository while still leaving them visible to Hugo
  • keep my repository small
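
A minimal sketch of the manifest idea described above, assuming the manifest is a JSON file mapping each image's relative path to its SHA-1. The function names, the JSON format, and the example paths are hypothetical and not part of Hugo or any existing tool:

```python
import hashlib
import json
from pathlib import Path
from typing import Dict


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-1 of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(image_dir: Path) -> Dict[str, str]:
    """Map every file under image_dir (relative POSIX path) to its SHA-1."""
    return {
        p.relative_to(image_dir).as_posix(): sha1_of(p)
        for p in sorted(image_dir.rglob("*"))
        if p.is_file()
    }


def write_manifest(manifest: Dict[str, str], manifest_path: Path) -> None:
    """Write the manifest as sorted, indented JSON so git diffs stay readable."""
    text = json.dumps(manifest, indent=2, sort_keys=True) + "\n"
    manifest_path.write_text(text, encoding="utf-8")
```

Something like `write_manifest(build_manifest(Path("static/images")), Path("images.manifest.json"))` would then produce the file to commit, while `static/images` itself stays git-ignored. Both paths are only examples.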

Is this sort of thing a workflow that others use? Is there something that will do some or all of this for me?

I haven’t ever dug into it, but I have often wondered how best to handle a similar need. I have had theNewDynamic/hugo-module-tnd-imgix (an imgix Hugo Module, on GitHub) bookmarked for ages.

If you “look past” the manifest part, you could certainly:

  • Keep your images in a directory outside of your Hugo project (and set it up to sync with S3 or Dropbox).
  • Mount that big images directory into your project’s /assets directory (Hugo has a concept of virtual file mounts).
  • Pull in the images you want to use by name/path (using resources.Get etc.).

Have you looked into the Cloudinary free tier? It solved my similar problem. Will obviously depend on how much traffic you get, how many images are involved (and how big their files are), etc.


I think the manifest part is important since if images change there’s no way to keep the repository state synchronized with the object store’s state. But I like this approach; maybe it’s a good starting point.

In my head you get synchronization but you miss out on versioning.

Yes, you stated that better – synchronization, just no versioning.
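
To illustrate the difference: with a manifest committed to git, a build step can check the local (git-ignored) images against the hashes recorded for that commit and work out which files need to be pulled. The sketch below assumes a manifest shaped as a mapping of relative path to SHA-1. The function name is hypothetical, and the actual download is left out because it depends on the storage backend:

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def files_to_fetch(manifest: Dict[str, str], image_dir: Path) -> List[str]:
    """Return manifest paths that are missing locally or whose SHA-1 differs."""
    stale = []
    for rel_path, expected in sorted(manifest.items()):
        local = image_dir / rel_path
        if not local.is_file():
            # Not downloaded yet.
            stale.append(rel_path)
            continue
        # Reads the whole file into memory; fine for a sketch.
        actual = hashlib.sha1(local.read_bytes()).hexdigest()
        if actual != expected:
            # The local copy does not match the version this commit expects.
            stale.append(rel_path)
    return stale
```

Because the manifest is versioned, checking out an older commit and re-running this check would list the files whose contents differed at that point, which plain directory syncing cannot tell you.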

I can add that I have toyed with this idea before; see the proposal “Add a "file proxy format"” (Issue #5752 in gohugoio/hugo on GitHub).

Which I still think is a really good idea that certainly could be expanded to support “folders”. From the Hugo side of it, it’s not hard to implement, but I have silently been hoping that some bigger corp would pick up this ball to make it more “standard”.


Yes, I think you’re right – it would be ideal if this were a standard-by-convention (e.g. having an .assets file the way one might have a README, a Makefile, or a .gitignore).

This is something I’ve really been meaning to look at with Hugo. My primary purpose is a photography/documentary storytelling/articles/photoblog-style feed that replaces Instagram, and that means a lot of image assets. It adds up fast. Ideally, hosting elsewhere would be a good out as things grow; I was kind of waiting for Cloudflare R2 to go into open beta (it did last week).

But the other reason I chose Hugo was its ability to resize images and its metadata support, along with WebP.

Yea, well, I’m a pretty active hobby photographer myself, so it’s pretty high on my list.


It is almost certain that I am missing something but isn’t this just as easy as adding entries for images to your .gitignore and then using rsync to sync only the images with your website? I did that for a non-Hugo website and it worked very well. But since a Hugo website is really just a plain website, it should work just as well.

Yea, well, that is a variation of something above. There are 2 issues with this:

  • It would not be a part of Hugo’s build (with its image processing powers)
  • It would not be versioned.

That sounds a lot like git-lfs-on-S3.
I have some sites on Netlify Large Media, and that offers on-the-fly resizing. But to be able to use S3 or just any other storage backend would be great.

I found this post detailing how a 7-year-old Node.js application still worked 2 years ago (although the writer says he had to patch the code here and there). I’m tempted to see if I can get that to run on demand on Heroku or on Cloud Run in the GCP free tier, and have git-lfs track the resources dir.

edit: this should be even easier to get to work. Or this.

I’ve tried a few of the git-lfs S3 backend implementations, and they’re all a pain to work with. But that might be because git-lfs itself isn’t a great pleasure to work with, I guess.
Also, trying to get a 7-year-old Node app to work while I don’t know much about Node proved to be a bit too much. I ended up with Giftless, written in Python. Relatively easy to set up and pretty stable. Handles sleep and wake-up on Heroku well.
I’ve put an almost completely configured, ready-to-clone fork here. It needs only the bucket name in a config file; the rest is env vars.

I’ll see about Hugo image handling next.