Find & Replace on escape codes

MattB · February 24, 2024, 10:29am

I need some advice on the best way to approach a problem I have.

I have the following range loop:

{{ range .Resources.Match "*.jpg" }}
    {{ $metadata := split (path.Base .Key) "_" }}
    ...
{{ end }}

This iterates over the images and processes the name of the file as metadata based on splitting by an underscore. The first element in $metadata is a name (e.g. the filename is “A Moment to Contemplate_Matt Biggin_6_Intermediate.jpg”, and the first element would be “A Moment to Contemplate”)

The problem I’m trying to solve is that this filename is being returned from the range with the “special” characters escaped (i.e. “A%20Moment%20to%20Contemplate”). I need to get the original filename version back, minus the encoded spaces.

Now, I know I could just use replace to find and replace them, but this isn’t the only character. I’m seeing “%21” and “%23”, and I’m sure there will be others. So I need a solution that takes a string containing these encoded characters and produces a rendered version of the string without them.

Is there a simple way to do this? It seems like a really common use-case.

My worst case scenario would be to have to scan all the images I have (about 5000) and get a definitive list of these characters, then produce some code which does the appropriate find and replace. That code would then have to be copied to about 12 places in my site which is kinda an ugly solution.

Thanks in advance and I really appreciate the help.

chrillek · February 24, 2024, 11:04am

Does this

help? Alternatively, using .Name instead of .Key returns the original filename (minus the path, though!).

Not to me, it doesn’t. Though I don’t know what you’re trying to do here, I try to stuff metadata I need either in front matter or into the image itself (aka EXIF).

MattB · February 24, 2024, 12:36pm

Thanks for the suggestions.

I’ve tried htmlUnescape already and that didn’t work.

I was also about to say that…

.Name doesn’t work because newly added behaviour means that this gives this result: a-moment-to-contemplate_matt-biggin_6_Intermediate.jpg, which is actually the problem I’ve been trying to fix in the first place.

But now I think about it I may be able to work with that. If it reliably takes %20, %21, etc. and replaces them with “-”, then perhaps I could just find and replace the “-” with a space and then capitalise the remaining words. Seems like a faff though.

Does Hugo support the creation of arbitrary methods, so that I could define this logic in one place and then call it where I need it?

chrillek · February 24, 2024, 12:41pm

Try a partial template.

I’m still on Hugo 0.122, and there the .Name is returned as it is in the file system, i.e. preserving spaces etc. Another reason to wait with updating until most of the kinks are ironed out.
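
For reference, a minimal sketch of what such a partial might look like. The file name image-meta.html is hypothetical, and the sketch assumes .Name yields the original filename (as reported above for 0.122); it is one possible way to centralise the splitting logic, not a definitive implementation.

```go-html-template
{{/* layouts/partials/image-meta.html (hypothetical file name) */}}
{{/* Context: an image resource. Returns the filename, less path and extension, split on "_". */}}
{{ $base := path.BaseName .Name }}
{{ return (split $base "_") }}
```

It could then be called from each of the range loops:

```go-html-template
{{ range .Resources.Match "*.jpg" }}
  {{ $metadata := partial "image-meta.html" . }}
  {{ index $metadata 0 }}
{{ end }}
```

Because the partial uses return, the caller gets the slice back as a value, so any later change to the parsing logic only has to be made in that one file.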

MattB · February 24, 2024, 1:06pm

OK. That seems sensible. I’ll create one of those and put a single solution in there. At least then if I find a neat solution I only have to change it in one place.

Thanks for your help.

jmooring · February 24, 2024, 1:06pm

@MattB We don’t have a template function to decode a URL, but it would be pretty easy to add one if you can be patient. We broke your setup when we changed .Name to return a logical path.

MattB · February 24, 2024, 1:12pm

@jmooring Thanks. A template function to decode would be cool if you have the time. I’d raise a PR myself for that but my Go skills are non-existent (I’m a Java dev). For the time being I’m going to centralise this logic in a partial and craft a solution in there.

Thanks for your support.

In case you were wondering, the site in question is here: https://colchesterphotosoc.co.uk

I’ve had this site live for a couple of years now having taken on a re-write from a nasty deprecated Adobe product. This uses images from competitions as a database of sorts, with the data stored in the file names. The whole site is built from that data. Hugo has been a great tool and I’m hugely supportive of the time you put in to maintaining it.

jmooring · February 24, 2024, 1:24pm

Hold off on making any changes for a day. There may be a better way to do this.

jmooring · February 24, 2024, 2:48pm

Nevermind… the Key method was just marked for internal use. Although it looks like a file path today, that may become a UUID tomorrow.

Unfortunately that means you do not have access to the original file name, url encoded or not.

So we’re back to using Name, replacing hyphens with spaces then using the title function. And that’s not great either:

Join the AKC.jpg --> join-the-akc.jpg --> Join The Akc.jpg

Note that you can configure the title function to follow AP, CMOS, etc. See:
https://gohugo.io/getting-started/configuration/#configure-title-case
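
As a rough sketch, the hyphen-to-space approach described above could be written like this. The variable names are illustrative only, and unlike the example above the extension is stripped first with path.BaseName.

```go-html-template
{{/* Illustrative only: rebuild a display title from a hyphenated logical name. */}}
{{ $name := "join-the-akc.jpg" }}
{{/* Strip the extension, swap hyphens for spaces, then apply title case. */}}
{{ $title := replace (path.BaseName $name) "-" " " | title }}
{{ $title }}
```

The exact capitalisation depends on the configured title case style, and the original casing of acronyms such as AKC cannot be recovered this way.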

MattB · February 24, 2024, 2:55pm

Thanks for the info.

I’m going to have to radically rethink things then, because if I don’t have access to the filename through legitimate means then the premise I’ve based the site on is broken. This is a bit of a pain if I’m being honest.

Without reliable access to the unmodified file name I’m rather stuck.

jmooring · February 24, 2024, 3:03pm

Another option:

{{ "join-the-akc.jpg" | humanize }} --> Join the akc.jpg

I’m sorry we broke your setup; the change was technically/structurally necessary. At some point in the future we may give you a way to get back to the original name, but not today.

If I were in your shoes, I would run exiftool recursively over the project, replacing the ImageDescription EXIF tag with the filename (less the extension). Even with 50k images it won’t take too long. You could run this before every build.

Then in your templates access the EXIF data.
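
A minimal sketch of reading that tag back in a template, assuming the ImageDescription tag has been populated with the filename as described above and that the default EXIF configuration does not exclude it. It reuses the range loop from the original question and relies on the Exif method available on image resources at the time of this thread.

```go-html-template
{{ range .Resources.Match "*.jpg" }}
  {{ with .Exif }}
    {{/* Skip images where the tag is missing or empty. */}}
    {{ with .Tags.ImageDescription }}
      {{ $metadata := split . "_" }}
      {{ index $metadata 0 }}
    {{ end }}
  {{ end }}
{{ end }}
```

The nested with blocks mean an image without EXIF data or without the tag renders nothing rather than failing the build.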

MattB · February 24, 2024, 3:10pm

No apology required. Totally understand the need to change it.

I was thinking along the same lines. I was about to implement a shell script to create a JSON file with the names of all the images in and then base the site on that. Major bit of work but at least I’ll know it’ll work.

Your option of using exiftool is a good alternative and I’ll look into that.

jmooring · February 24, 2024, 3:14pm

JSON is a great idea too, but then you need to key it off of something. Like the filename. Oh wait, we don’t have access to that anymore.

So embedding is the way to go.

MattB · February 24, 2024, 3:19pm

I’ve done some digging, your exiftool solution is a great idea.

jmooring · February 24, 2024, 3:25pm

This runs recursively over your project directory, overwriting existing files – test before using!

exiftool -r -overwrite_original '-basename>ImageDescription' .

Starting with:

content/posts/post-1/images/Sunrise in Bryce Canyon.jpg

This gives me EXIF data like:

Image Description               : Sunrise in Bryce Canyon

I’d make a copy of your entire project directory, then run the command above to see how long it takes. Please post the results if you wouldn’t mind.

MattB · February 24, 2024, 3:29pm

Amazing. Thanks for that.

I was just digging in to find and awk to do similar. Didn’t realise that exiftool could do that itself.

jmooring · February 24, 2024, 3:31pm

Yeah, Phil Harvey has done an amazing job.

MattB · February 24, 2024, 3:40pm

  164 directories scanned
 2794 image files updated

Took about 5 mins and had plenty of tags to fix along the way. Did get some errors though and I’ll need to investigate those. All these images are submitted by our club members so the quality of the metadata is variable to say the least.

jmooring · February 24, 2024, 3:44pm

You could script this to skip files that haven’t been modified in the last N days. Run it once for the baseline (everything regardless of mod date), then run it again whenever you add images… or something like that.

jmooring · February 25, 2024, 3:15pm

Breaking news…

In the next release, presumably v0.123.4, we’re going to revert the recent change to the Name method on a Resource object. Going forward…

content/
└── foo/
    ├── images/
    │   └── Sunrise in Bryce Canyon.jpg
    └── index.md

{{ with .Resources.Get "images/Sunrise in Bryce Canyon.jpg" }}
  {{ .Name }} --> images/Sunrise in Bryce Canyon.jpg
{{ end }}

Sorry for the flip/flop.
