Static site: preserve timestamps of static files

I use hugo to build a static site that, after testing, I sync to the web server.

When generating the static site, hugo copies all files that do not need template processing to the destination (publishDir) ‘as is’. Is it possible to retain the file timestamps so these files are not unnecessarily synced to the web server?

These files include pictures, audio and video files which are quite big.

Normally Hugo preserves timestamps and does not re-copy everything when something changes while hugo server is running. I'm not sure how a plain hugo build handles it.

When you run hugo to build the whole website, it runs through all steps, just in case something changed, so the static files are copied again.

My approach would be to move the files that lengthen your Hugo build time into a folder that the build process does not touch: put them on a CDN, or in a subfolder that you upload directly to the website, outside the Hugo repository and build.

It might be that you are using a filesystem that does not preserve file times, or that stores them in local time, like FAT. In that case programs will “think” the date has changed.

Hugo’s dynamic site generation (with hugo server) uses file-system notification watches to detect what has changed and act accordingly.

The static site generation processes templates and copies the rest of the assets/resources. It is this copying that should preserve timestamps.

% hugo -D

                   | EN  
-------------------+-----
  Pages            | 47  
  Paginator pages  |  0  
  Non-page files   | 68  
  Static files     |  4  
  Processed images |  0  
  Aliases          |  2  
  Sitemaps         |  0  
  Cleaned          |  0  

Total in 5760 ms
% ls -l public/lente/13_Lente.mp3 
-rw-r--r-- 1 jv jv 2693329 Jun 11 17:13 public/lente/13_Lente.mp3
% hugo -D

                   | EN  
-------------------+-----
  Pages            | 47  
  Paginator pages  |  0  
  Non-page files   | 68  
  Static files     |  4  
  Processed images |  0  
  Aliases          |  2  
  Sitemaps         |  0  
  Cleaned          |  0  

Total in 5760 ms
% ls -l public/lente/13_Lente.mp3
-rw-r--r-- 1 jv jv 2693329 Jun 11 19:09 public/lente/13_Lente.mp3

As you can see, the timestamp changed even though the file was not modified.

Which OS? I am unable to reproduce this behavior on Ubuntu.

Again, it could be the file system doing this. If, for instance, you have some form of encrypted file system running, copying the file might create a new file with a new date. Moving might keep dates; copying creates a new file.

I would address the issue in the upload step instead. If you can upload over SSH with rsync, it has an option (--checksum) that compares file checksums and ignores the dates. I am not sure whether SFTP has similar options.

@jmooring: Fedora 31.
@davidsneighbour: A normal, locally mounted filesystem.

Try the attached extremely minimal hugo setup.

% unzip ../minimal.zip 
Archive:  ../minimal.zip
  inflating: config.yaml             
   creating: content/
   creating: layouts/
 extracting: content/image.jpg       
  inflating: content/index.html      
   creating: layouts/_default/
  inflating: layouts/_default/baseof.html  
  inflating: layouts/_default/single.html  
% hugo --quiet
% ls -l public
total 8
-rw-r--r-- 1 jv jv 3010 Jun 12 10:14 image.jpg
-rw-r--r-- 1 jv jv  101 Jun 12 10:14 index.html
% sleep 60
% hugo --quiet
% ls -l public
total 8
-rw-r--r-- 1 jv jv 3010 Jun 12 10:15 image.jpg
-rw-r--r-- 1 jv jv  101 Jun 12 10:15 index.html

As you can see, the times of the files are updated. While this may be okay for the html files, image.jpg is copied ‘as-is’. My question was: does Hugo provide a way to set the timestamps of the ‘as-is’ files to those of the original files? The answer seems to be ‘no’.

https://www.squirrel.nl/pub/xfer/uploads/3Cg9wUJppoSrdlMKsgOY1HWA.zip
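Until Hugo offers this, one possible workaround is to copy the timestamps back after each build. The sketch below uses a hypothetical helper, restore_mtimes (not part of Hugo), and assumes GNU find, cmp and touch, plus that each non-page file under content/ ends up at the same relative path under public/. That holds for the minimal example above, but not with custom permalinks or processed resources:

```shell
# Hypothetical helper, not part of Hugo: after a build, give each copied
# resource under PUBLIC_DIR the modification time of its source in CONTENT_DIR.
# Pass both directories without a trailing slash.
restore_mtimes() {
    content_dir=$1
    public_dir=$2
    # Skip page sources (.md, .html): Hugo renders those instead of copying them.
    find "$content_dir" -type f ! -name '*.md' ! -name '*.html' |
    while IFS= read -r src; do
        rel=${src#"$content_dir"/}
        dst=$public_dir/$rel
        # Only touch files that exist in public/ with byte-identical contents.
        if [ -f "$dst" ] && cmp -s "$src" "$dst"; then
            touch -r "$src" "$dst"    # copy access/modification times from the source
        fi
    done
}
```

Usage, from the site root: `hugo --quiet && restore_mtimes content public`. Filenames containing newlines are not handled.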

I downloaded and ran your sample, and it indeed updated the timestamp.
Then I moved image.jpg into the static directory, and after running hugo its timestamp was still from about 4 hours earlier.

I did not dive deeper, but you are using an index.html as opposed to index.md or _index.md. The behaviour might differ between page types. I also did not test what happens in deeper page-bundle structures.

Put your large static files into the static directory; I had assumed you were already doing that. Hugo probably assumes that things in the content directory may change between builds.


This behavior is troubling. I assumed that resources within page bundles were copied to public/ with something equivalent to the Linux cp --preserve=mode,ownership,timestamps command.
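For comparison, here is what that cp option does. A small sketch, assuming GNU coreutils (for --preserve and stat -c), with a temporary directory and a made-up image.jpg:

```shell
#!/bin/sh
# Sketch: plain cp vs. cp --preserve (GNU coreutils) on a file with an old mtime.
set -eu
work=$(mktemp -d)
printf 'not really a JPEG\n' > "$work/image.jpg"
touch -d '2020-06-12 06:00' "$work/image.jpg"     # pretend the source is old

cp "$work/image.jpg" "$work/plain.jpg"             # mtime becomes the copy time
cp --preserve=mode,ownership,timestamps "$work/image.jpg" "$work/preserved.jpg"

src_mtime=$(stat -c %Y "$work/image.jpg")
plain_mtime=$(stat -c %Y "$work/plain.jpg")
preserved_mtime=$(stat -c %Y "$work/preserved.jpg")
echo "source:    $src_mtime"
echo "plain cp:  $plain_mtime"
echo "preserved: $preserved_mtime"
```

Only the preserved copy keeps the source's modification time, which is what a time-based sync tool needs in order to skip it.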

If you use rsync to synchronize from development to staging to production, you can ignore timestamps and instead hash the files on both ends. For a small site with small files the overhead is insignificant.

But for a large site with large files, and for those who cannot hash at both ends, the existing behavior is not ideal.

@sciurius, thanks for posting this. Page bundles are great for encapsulating content and processing resources, but going forward I’ll probably go back to using static/ for large-ish files.

It seems that files in static/ are not rewritten, so they keep the timestamps of the initial copy.
Static files in page bundles are overwritten each time, so they get new timestamps.

While the handling of files in static/ is better than that of page-bundle files, I think explicitly setting the timestamps, as cp --preserve does, would be better still.

Page bundles are relatively new (since 0.32); perhaps no one has yet considered treating static page-bundle files like files in static/?

@davidsneighbour: I first encountered this problem with page bundles at a deeper structure.

From a developer's point of view (I am not a Hugo developer, nor do I know Go), I would be very wary of deciding which file extensions are “cacheable” (in the sense of “no need to re-copy on rebuild”), given how people with “ordinary” use cases use Hugo and could run into issues with any change in that area. The best option (IMHO) might be a configuration parameter listing which file extensions in page bundles Hugo may cache.

I’ll tag @bep here 😉 (ducks and runs)

Hrmpf.

This has been discussed before; here is a brief recap.

  • For files in /static, Hugo tries to preserve timestamps.
  • That said, I would not rely on that as the only “has the file changed?” strategy.
  • Take hugo deploy as one example of how it should be done: it uses MD5 hashes from the server (e.g. Amazon S3), compares them with your /public, and uploads only what has changed.
  • You can get similar behavior with rsync --checksum.
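The idea behind both can be illustrated locally. The sketch below uses a hypothetical helper, changed_files (not a Hugo or rsync command), and assumes GNU md5sum; a second local directory stands in for the server. It lists files that are missing from, or differ in content from, the destination, ignoring timestamps entirely. It is only an illustration, not how hugo deploy is implemented; hugo deploy compares against hashes reported by the remote storage:

```shell
# Hypothetical helper: print paths (relative to LOCAL_DIR) that are missing
# from REMOTE_DIR or whose contents differ. Timestamps play no role.
changed_files() {
    local_dir=$1
    remote_dir=$2
    find "$local_dir" -type f |
    while IFS= read -r f; do
        rel=${f#"$local_dir"/}
        other=$remote_dir/$rel
        if [ ! -f "$other" ]; then
            echo "$rel"                      # new file: must be uploaded
            continue
        fi
        # md5sum prints "<hash>  -" when reading stdin; keep only the hash.
        a=$(md5sum < "$f" | cut -d ' ' -f 1)
        b=$(md5sum < "$other" | cut -d ' ' -f 1)
        if [ "$a" != "$b" ]; then
            echo "$rel"                      # same path, different bytes
        fi
    done
}
```

A file that hugo merely re-copied (same bytes, new mtime) is not listed; only new or modified files are.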

Yep, that's what I said before: I really think the issue should be solved in the transfer itself. Thanks @bep.
