Workflow for keeping images out of source control

My Hugo site has images, but I don’t want to put those images into source control, since they are binary files. Instead, I would like to put the files themselves into S3 (or Dropbox, or whatever) and version-control something more like a map of files to their SHA-1s (a manifest) in git.

Then, when I build the site, I can pull down any changes to the files from remote storage as part of my build steps. (The files themselves are git-ignored in this scheme.) In this way, I:

  • just have to upload files of interest indicated in the manifest
  • keep them out of my repository while still leaving them visible to Hugo
  • keep my repository small
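A minimal sketch of the manifest-building step, to make the idea concrete (the directory name and manifest file name here are just placeholders, not a real tool):

```python
import hashlib
import json
from pathlib import Path

def build_manifest(root):
    """Map each file under `root` (relative path) to the SHA-1 of its contents."""
    root = Path(root)
    manifest = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha1(path.read_bytes()).hexdigest()
        manifest[path.relative_to(root).as_posix()] = digest
    return manifest

def write_manifest(root, out="images.manifest.json"):
    """Write the manifest; this file gets committed, the images do not."""
    Path(out).write_text(json.dumps(build_manifest(root), indent=2, sort_keys=True))
```

At build time you’d diff the committed manifest against the last synced state to decide which files to pull from (or push to) remote storage.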

Is this sort of thing a workflow that others use? Is there something that will do some or all of this for me?

I haven’t ever dug into it, but I have often wondered how to best handle a similar need. I have had theNewDynamic/hugo-module-tnd-imgix (an imgix Hugo module on GitHub) bookmarked from ages ago.

If you “look past” the manifest part, you could certainly:

  • Have your images live in a directory outside of your Hugo project (set up to sync with S3 or Dropbox).
  • Mount that big images directory into your project’s /assets directory (Hugo has a concept of virtual file mounts).
  • Pull in the images you want to use by name/path (using resources.Get etc.).
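The mount part can be expressed with Hugo’s module mounts; a sketch, where the external path is just an example (note that declaring any mount replaces the defaults, so the stock assets mount is re-declared):

```toml
# config.toml
[module]
  # Re-declare the default assets mount, since defining any mount replaces the defaults.
  [[module.mounts]]
    source = "assets"
    target = "assets"
  # Mount the big, git-ignored image directory (synced with S3/Dropbox) into assets/images.
  [[module.mounts]]
    source = "/home/user/big-images"
    target = "assets/images"
```

In a template you could then do something like `{{ $img := resources.Get "images/sunset.jpg" }}` (the file name being hypothetical).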

Have you looked into the Cloudinary free tier? It solved my similar problem. Whether it works for you will obviously depend on how much traffic you get, how many images are involved (and how big their files are), etc.


I think the manifest part is important: without it, if images change, there’s no way to keep the repository state synchronized with the object store’s state. But I like this approach; maybe it’s a good starting point.

In my head you get synchronization but you miss out on versioning.

Yes, you stated that better – synchronization, just no versioning.

I can add that I have toyed with this idea before; see the proposal “Add a file proxy format” in gohugoio/hugo issue #5752 on GitHub.

I still think that’s a really good idea, and one that could certainly be expanded to support “folders”. From the Hugo side it’s not hard to implement, but I have quietly been hoping that some bigger corp would pick up this ball and make it more “standard”.


Yes, I think you’re right – it would be ideal if this were a standard-by-convention (e.g. having an .assets file the way one might have a README, a Makefile, or a .gitignore).

This is something I’ve really been meaning to look at with Hugo. My primary use case is a photography/documentary storytelling/articles/photoblog-style feed that replaces Instagram, which means a lot of image assets, and it adds up fast. Ideally, hosting elsewhere would be a good way out as things grow; I was kind of waiting for Cloudflare R2 to go open beta (it did last week).

But the other reason I chose Hugo was its ability to resize images and its metadata support, along with WebP.

Yea, well, I’m a pretty active hobby photographer myself, so it’s pretty high on my list.


It is almost certain that I am missing something but isn’t this just as easy as adding entries for images to your .gitignore and then using rsync to sync only the images with your website? I did that for a non-Hugo website and it worked very well. But since a Hugo website is really just a plain website, it should work just as well.
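For the record, a sketch of that setup (the paths and host are illustrative, not what I actually used):

```shell
# .gitignore gets an entry such as:
#   static/images/

# Then sync only the images straight to the server, outside of git:
rsync -avz --delete static/images/ user@example.com:/var/www/site/static/images/
```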

Yea, well, that is a variation of something above. There are two issues with it:

  • It would not be part of Hugo’s build (with its image-processing powers).
  • It would not be versioned.

That sounds a lot like git-lfs-on-S3.
I have some sites on Netlify Large Media, and that offers on-the-fly resizing. But to be able to use S3 or just any other storage backend would be great.

I found this post detailing how a 7-year-old Node.js application still worked two years ago (although the writer says he had to patch the code here and there). I’m tempted to see if I can get that to run on-demand on Heroku or in the GCP free-tier Cloud Run, and have git-lfs track the resources dir.

edit: this should be even easier to get to work. Or this.
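For context, pointing git-lfs at a custom server and tracking Hugo’s resources dir looks roughly like this (the server URL is a placeholder):

```shell
# .lfsconfig in the repo root points LFS at the custom server:
#   [lfs]
#       url = "https://lfs.example.com/my-site"

git lfs install                 # set up the LFS filters once per machine
git lfs track "resources/**"    # writes the pattern to .gitattributes
git add .gitattributes .lfsconfig
```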

I’ve tried a few of the git-lfs S3 backend implementations, and they’re all a pain to work with. But that might be because git-lfs itself isn’t a great pleasure to work with, I guess.
Also, trying to get a 7-year-old Node app to work when I don’t know much about Node proved to be a bit too much. I ended up with Giftless, written in Python. It’s relatively easy to set up and pretty stable, and it handles sleep and wake-up on Heroku well.
I’ve put an almost completely configured, ready-to-clone fork here. It needs only the bucket name in a config file; the rest is env vars.

I’ll see about Hugo image handling next.