I’ve built a couple of websites with Hugo, and I wanted to learn Go, so I went looking for open issues. One of the ones that stood out to me was about not copying static files unnecessarily when building a site.
It seems like there are two independent problems here.
The first is that when you’re testing that your site is functionally correct (e.g. in CI), you don’t need to get as far as building a full public/ folder with all the static files in it (I think this is what snicolai-blog was talking about). Something like a --skip-static-files option might be enough here: similar to the --dry-run option you get with other tools, except that it would obviously still generate .html files etc.
The second is that if you’re going to build your site into a scratch folder and then e.g. rsync the resulting contents to a production server, you don’t want to copy gigabytes of files from one folder to another unnecessarily (I think this is Berny23’s position). That’s why we should try to hard-link the originals to where they end up in the public/ directory, or possibly symlink them if the platform we’re on doesn’t support hard links. So maybe you’d say --link-static-files, --link-static-copies, --link-published-static-files or something.
But beyond that, it feels like we have two different piles of stuff when we build Hugo sites. On the one hand, (1) we take a whole bunch of .md and e.g. .toml files and apply all sorts of funky logic and templates to generate .html files, and that’s what they’re for. They serve no purpose after that, and there’s no reason why they’d matter for the generated website. They feel like source code.
But (2) we also have a bunch of .gif, .png, .jpg, .mov etc. files hanging around that ultimately belong in the generated website. Sure, we can peek at their filenames as part of (1), even peek at their contents so we can e.g. generate thumbnails, or maybe minify other static stuff like JavaScript or CSS, but that’s just a consequence of Hugo being clever. There’s no requirement that you should have to do anything like this. The static files are other files that go into the website, and arguably that’s where they belong. We’re just borrowing them from the future.
So while the basic model is that you have a directory structure like this (ignoring assets because I think they follow the same model as content/: they’re supposed to be things that Hugo chews on):
hugo/
    content/
    static/
    public/
        .gitignore # These files are all derived and less important
I wonder whether it would sometimes make sense to invert this slightly and have something like this instead?
website/
    .gitignore # Only the .html files etc.
    images/
hugo/
    content/
    static/
        images -> ../../website/images
    public -> ../website
Unfortunately, the file-syncing code that Hugo currently uses isn’t aware of symbolic or hard links. I’ve got a patch for that, which I’ve kept in draft to avoid bothering people before I posted this message. But I think that once implemented it should speed up re-“generating” sites where most of the static content is already available and doesn’t need to be copied again.
Beyond that, I wonder if there’s a conversation worth having about potentially having four types of directory (which is obviously a lot more complicated than the current setup):
static-source/
    images/
dynamic-source/
    content/
    static/
        images -> ../../static-source/images/
    public -> ../generated-content
generated-content/
    index.html # and friends; from Hugo
    images/
        bigimage_thumbnail.jpg # generated by Hugo
website/
    index.html -> ../generated-content/index.html
    images/
        bigimage.jpg -> ../../static-source/images/bigimage.jpg
        bigimage_thumbnail.jpg -> ../../generated-content/images/bigimage_thumbnail.jpg
I’ve used symbolic links here because they make it clear where files are coming from, but you could use hard links on file systems that support them.
The benefit of having a two-stage (under the hood) process, where you (a) first generate files from Hugo and then (b) combine them with static source files, is that you can blow away the generated-content and website directories without worrying, because everything will be rebuilt when you run hugo build. This doesn’t work if your website/images directory is both the place where large images live and the place where Hugo writes thumbnails to.
And for each of the four logical component directories, you can decide whether to check them into version control or not, and/or do funky things with S3 or Cloudflare or what have you.
From preliminary reading it feels like this might be something like mounts, but for outputs rather than inputs?