Option to avoid updating mtime

My workflow is to develop a site using hugo server then stage it by running hugo and syncing public with an S3 bucket linked to CloudFront. When everyone agrees everything in staging is good, the staging bucket is synced directly to a production bucket.

aws s3 sync decides what to upload from file size and modification time only, so I would prefer that files left unchanged by a Hugo run do not have their mtime updated. At the moment I am using a Python script that keeps a dictionary mapping SHA-256 hashes to original mtimes and uses it to reset them. This works fine, but it seems this should be an option built into Hugo: check whether each file it is about to write is identical to the one it would replace, and leave the existing file alone if so.
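To make the request concrete, the behaviour I have in mind can be sketched in a few lines of Python. This only illustrates the idea and says nothing about how Hugo actually writes its output; the helper name `write_if_changed` is made up for this example.

```python
import os


def write_if_changed(path: str, data: bytes) -> bool:
    """Write data to path only if the existing content differs.

    Returns True if the file was written, False if it was left untouched
    (in which case its mtime is preserved).
    """
    try:
        with open(path, "rb") as f:
            if f.read() == data:
                return False  # identical content: leave the file and its mtime alone
    except FileNotFoundError:
        pass  # no existing file, so fall through and create it
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return True
```

The cost is one extra read per output file, in exchange for mtimes that only change when the content does.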

Before putting a feature request up on GitHub, I wanted to check that there isn’t already an option somewhere that does this and that I’ve missed.

Many thanks.

Can you use hugo deploy instead? It uses checksums instead of time and/or size.
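As a rough illustration of what checksum-based syncing means in general, here is a small Python sketch. It is not hugo deploy's actual implementation; the function names and the `remote` mapping (path to hex digest, standing in for whatever the remote store reports) are assumptions for the example.

```python
import hashlib
import os


def local_digests(root: str) -> dict[str, str]:
    """Map each file under root (relative path, '/' separators) to its SHA-256 hex digest."""
    digests = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as f:
                digests[rel] = hashlib.sha256(f.read()).hexdigest()
    return digests


def plan_sync(local: dict[str, str], remote: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (paths to upload, paths to delete) based on content digests only."""
    upload = sorted(p for p, d in local.items() if remote.get(p) != d)
    delete = sorted(p for p in remote if p not in local)
    return upload, delete
```

Because only the digests are compared, a rebuilt file with a fresh mtime but identical content never appears in the upload list.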

Thanks for the pointer. The documentation for hugo deploy is a bit sparse, so I’ll have a play and see whether it does everything I need.

I didn’t really get on with hugo deploy, so I’ll add a feature request for a “don’t update the mtime of an unmodified file” option. In case anyone arrives here looking for a temporary solution, here is the Python code for reverting the mtimes (and atimes) in the public folder after a hugo run:

#!/usr/bin/python3
"""Restore the original atime/mtime of unchanged files in public/ after a Hugo run."""

import glob
import hashlib
import os
import pickle
import sys

# Only files with these extensions are checked.
ext = (".html", ".xml")

if len(sys.argv) != 2 or not os.path.isdir(sys.argv[1]):
    print("Usage:", sys.argv[0], "PROJECT_ROOT")
    sys.exit(1)
root = sys.argv[1]
filenames = glob.glob(os.path.join(root, "public/**"), recursive=True)

# Map of SHA-256 digest -> (atime_ns, mtime_ns), persisted between runs.
# Keyed by content only, so files with identical content share timestamps.
hashes = {}
pickle_file = os.path.join(root, "mtimes")
if os.path.exists(pickle_file):
    with open(pickle_file, "rb") as p:
        hashes = pickle.load(p)

for target in filenames:
    suffix = os.path.splitext(target)[1]
    if suffix not in ext or not os.path.isfile(target):
        continue
    with open(target, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    if digest in hashes:
        # Content seen before: put back the timestamps recorded then.
        os.utime(target, ns=hashes[digest])
    else:
        # New or changed content: remember its current timestamps.
        stat = os.stat(target)
        hashes[digest] = (stat.st_atime_ns, stat.st_mtime_ns)

with open(pickle_file, "wb") as f:
    pickle.dump(hashes, f)

That’s unfortunate, and without understanding why, not terribly helpful.

The other approach to using checksums when syncing with S3 is rclone.

I didn’t manage to get it to work, which is almost certainly down to me misconfiguring something somewhere.

The solution I have in place – resetting mtimes of unchanged files after the hugo run – works fine. It would just be nicer if hugo had a flag to avoid overwriting files with identical content.

rclone with --checksum would definitely be another option.

I could be wrong, but I don’t see that happening when checksum options are (a) better, and (b) available with hugo deploy, rclone, and rsync (not applicable to S3 sync).

You’re probably right. Very much a nice-to-have rather than a need-to-have. It would, however, be pretty cheap in CPU time: the Python script above runs in around 200ms on my fairly ancient machine, and Python is way slower than Go for this sort of thing. Not modifying the modification time of an unmodified file also seems like the “right” thing to do, even if behind a flag for optimisation purposes. I’ve put a feature request on GitHub, anyway.

Many thanks for your help and suggestions.

I just hit the same problem: generated files have mtimes that don’t reflect .Lastmod, which is causing our HTTP Last-Modified headers to be wrong.

The hugo deploy option doesn’t work for us, because we use a GitHub action to check out the files and build the site, so the previous run’s .html files aren’t there to be checksum-compared. (Also, at this point we’d have to fix them all some other way for them to be carried forward correctly, and every trivial change to the site design templates would reset all the timestamps, yes?)

So a Hugo feature to set mtime based on .Lastmod would be very welcome.
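In the meantime, a post-build step could apply the dates itself. A minimal sketch, assuming you can produce a mapping from output path to last-modified timestamp by some means of your own; the `lastmods` mapping and the `apply_lastmod` name are hypothetical, and nothing here is a Hugo API:

```python
import os
from datetime import datetime


def apply_lastmod(public_dir: str, lastmods: dict[str, str]) -> int:
    """Set each file's atime and mtime to its ISO 8601 lastmod value.

    lastmods maps paths relative to public_dir to timestamps such as
    "2023-03-17T10:00:00+00:00". Timestamps without a UTC offset are
    interpreted as local time. Returns the number of files updated.
    """
    updated = 0
    for rel, stamp in lastmods.items():
        target = os.path.join(public_dir, rel)
        if not os.path.isfile(target):
            continue  # entry has no rendered file; skip it
        ts = datetime.fromisoformat(stamp).timestamp()
        os.utime(target, (ts, ts))
        updated += 1
    return updated
```

This works on a fresh checkout because it does not depend on any files from a previous build, only on the dates themselves.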

See https://github.com/gohugoio/hugo/issues/10842.