My workflow is to develop a site using hugo server, then stage it by running hugo and syncing the public directory to an S3 bucket linked to CloudFront. When everyone agrees that everything in staging is good, the staging bucket is synced directly to a production bucket.
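For concreteness, the staging step is just a build followed by a sync; a minimal sketch, with made-up bucket names, looks something like this:

# Sketch of the staging step: build the site, then mirror public/ into the
# staging bucket. The bucket names here are placeholders.
import subprocess

subprocess.run(["hugo"], check=True)
subprocess.run(
    ["aws", "s3", "sync", "public/", "s3://example-staging-bucket"],
    check=True,
)
# Promotion to production is then a bucket-to-bucket sync, e.g.
# aws s3 sync s3://example-staging-bucket s3://example-production-bucket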
aws s3 sync decides what to copy based only on file size and modification time, so I would prefer that files which are unchanged after a Hugo run do not have their mtime updated. At the moment I am using a Python script that maintains a dictionary of SHA-256 hashes against original mtimes and uses it to reset the mtimes. This works fine, but it seems like Hugo should have a built-in option to check whether each file it generates is identical to the one it is about to replace, and to leave the existing file untouched if so.
Before putting a feature request up on GitHub, I wanted to check that I haven't missed an existing option somewhere that already does this.
Didn’t really get on with hugo deploy, so I’ll add a feature request for a “don’t update the mtime of an unmodified file” option. In case anyone arrives here looking for a temporary solution, here is the Python code for reverting the mtimes (and atimes) in the public folder after a hugo run:
#!/usr/bin/python3
import os.path
import hashlib
import glob
import pickle
import os
import sys

# Only these generated file types get their timestamps reset.
ext = (".html", ".xml")

if len(sys.argv) != 2 or not os.path.isdir(sys.argv[1]):
    print("Usage:", sys.argv[0], "PROJECT_ROOT")
    sys.exit(-1)

root = sys.argv[1]
filenames = glob.glob(os.path.join(root, "public/**"), recursive=True)

# Load the saved mapping of content hash -> (atime_ns, mtime_ns), if any.
hashes = {}
pickle_file = os.path.join(root, "mtimes")
if os.path.exists(pickle_file):
    with open(pickle_file, "rb") as p:
        hashes = pickle.load(p)

for target in filenames:
    suffix = os.path.splitext(target)[1]
    if suffix not in ext:
        continue
    with open(target, "rb") as f:
        m = hashlib.sha256()
        m.update(f.read())
    digest = m.hexdigest()
    if digest in hashes:
        # Content unchanged since we last saw this hash: restore the
        # original access and modification times.
        os.utime(target, ns=hashes[digest])
    else:
        # New or modified content: remember its current timestamps.
        stat = os.stat(target)
        hashes[digest] = (stat.st_atime_ns, stat.st_mtime_ns)

with open(pickle_file, "wb") as f:
    pickle.dump(hashes, f)
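For context, I run it immediately after each hugo run, passing the project root (the directory containing public) as the only argument, and only sync to S3 once it has finished.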
I didn’t manage to get hugo deploy to work, which is almost certainly down to me misconfiguring something somewhere.
The solution I have in place – resetting mtimes of unchanged files after the hugo run – works fine. It would just be nicer if hugo had a flag to avoid overwriting files with identical content.
rclone with --checksum would definitely be another option.
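For anyone who goes that route, the point is that rclone with --checksum compares content hashes rather than size and mtime, so the mtime-resetting step becomes unnecessary; a rough sketch, with a made-up remote and bucket name, would be:

# Sketch only: mirror public/ to an S3 bucket via rclone, comparing
# checksums instead of size + mtime. "s3remote" and the bucket name are
# placeholders for whatever the rclone remote is actually called.
import subprocess

subprocess.run(
    ["rclone", "sync", "--checksum", "public/", "s3remote:example-bucket"],
    check=True,
)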
I could be wrong, but I don’t see that happening when checksum options are (a) better, and (b) available with hugo deploy, rclone, and rsync (not applicable to S3 sync).
You’re probably right. Very much a nice-to-have rather than a need-to-have. It would, however, be pretty cheap in CPU time – the Python script above runs in around 200ms on my fairly ancient machine, and Python is way slower than Go for this sort of thing. Not modifying the modification time of an unmodified file also seems like the “right” thing to do, even if it sits behind a flag for optimisation purposes. I’ve put a feature request on GitHub, anyway.