Breaking changes in handling of dots in file names in 0.146+

I have the same issue as “File names with dots are breaking my site in 0.146+”, but that post is closed, so I’m starting a new one here.

In my repo, lat and lng are stored in the file names, and that worked fine before. Since 0.146, the output has been reduced to just one file, because all of the file names begin with “22.”, and Hugo gives zero warnings.

From the linked post, I know the suggested solution is to manually change the dots to something like a hyphen and set the slug field, but for something that has worked for a long time, I think this is too much work for zero gain.

I’d appreciate an option to use the old 0.145 behaviour, but AFAIK that isn’t likely to happen. I’ll probably stay on 0.145, but I think you should at least mention this breaking change in your changelog, as it is really unexpected, and ideally Hugo should issue some kind of warning so that users know what the problem is.

I read through your changelog (“Release v0.146.0 · gohugoio/hugo · GitHub”), which links to “Reimplement and simplify Hugo's template system” (gohugoio/hugo pull request #13541 by bep), and neither page made this breaking change clear. I doubt many people even read changelogs before upgrading, and it’s even harder to find when it is buried under multiple layers of links.

I know you have different priorities in mind but I think the communication on the breaking changes could have been done better. A warning message would have been very useful to help pinpoint the problem (I got a different warning message after the upgrade but it wasn’t related to this problem).

That’s on me.

I’ve been working on an upgrade guide, but life/work got in the way. I hope to have something ready later today, tomorrow at the latest.

We wouldn’t have made the change if there was nothing to gain by it.

With the new template system you can create templates to target any page, of any kind, in any language, in any output format, and at any level in the content structure without having to specify a layout and/or type in front matter.[1] To do that we needed symmetrical paths.

For example…

content:   content/en/docs/guides/installation/step-1.de.md
template:     layouts/docs/guides/installation/step-1/single.de.plain.txt

content:   content/en/docs/guides/installation/*
templates:    layouts/docs/guides/installation/single.de.plain.txt
              layouts/docs/guides/installation/list.de.plain.txt

In the above, for the single page template, neither guides nor installation needs to be a section; whether the directories contain _index.md files is irrelevant.
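To make the “symmetrical paths” idea concrete, here is a minimal Go sketch under stated assumptions; it is not Hugo's lookup code. The hypothetical templateDirs function assumes the content file lives under content/<lang>/, mirrors the remaining directories under layouts/, uses the part of the file name before the first dot as the page's own directory, and then walks up to the layouts root, most specific first. Joining each candidate with the example template name single.de.plain.txt reproduces both single-page template paths listed above.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// templateDirs is a hypothetical sketch of "symmetrical paths": it mirrors a
// content file's location under layouts/ and returns the candidate template
// directories, most specific first. It is not Hugo's actual lookup code.
func templateDirs(contentPath string) []string {
	// Assume paths of the form content/<lang>/<rest>; drop that prefix.
	parts := strings.SplitN(contentPath, "/", 3)
	if len(parts) < 3 || parts[0] != "content" {
		return nil
	}
	dir, file := path.Split(parts[2])

	// Assumption: the page name is everything before the first dot.
	name := strings.SplitN(file, ".", 2)[0]

	var dirs []string
	// Walk from the page's own directory up to layouts/ itself.
	for cur := path.Join("layouts", dir, name); strings.HasPrefix(cur, "layouts"); cur = path.Dir(cur) {
		dirs = append(dirs, cur)
		if cur == "layouts" {
			break
		}
	}
	return dirs
}

func main() {
	for _, d := range templateDirs("content/en/docs/guides/installation/step-1.de.md") {
		// Pair each candidate directory with the example template name.
		fmt.Println(path.Join(d, "single.de.plain.txt"))
	}
}
```

In this sketch, treating everything before the first dot as the page name is the step that makes names like v1.2.3.md ambiguous.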


  1. Using the layout and/or type front matter fields to target a template can be labor intensive, prone to errors, and difficult to maintain.


I guess the first thing to do is certainly to clearly state the scope of this change. That requires deciding whether “Error when using resources with similar names” (gohugoio/hugo issue #13596 on GitHub) is a bug or intended behaviour. That decision determines whether the change also applies to images, binary releases, and other files (those are just the examples given there). If this is working as intended, just document how to set the slug for a published artifact (by using a replacement pattern). And maybe add a method like Resources.Get that fulfils a simple purpose: getting something from the (union) filesystem without any magic. I wouldn’t mind if it emitted a warning on every call, or similar…

The issue you refer to was primarily a bug fix. We had to fix that bug, so certainly not zero gain.

We could maybe have fixed that bug with less breakage (not sure, but surely not zero breakage), but that would:

  1. Require me to put in some more hours of work
  2. Mean that we would be left with a setup which would be hard to define/understand.

An idea straight off the bat:

Maybe use a more specific separator to split the file name from the part used by the template detection logic.

Example: .- which is rarely seen in the wild (and where it is, the names are usually erroneous). It could also be -.

  • release-0.1.5.-en.md
  • release-0.1.5.-md
  • release-0.1.5.deb

This would allow us to:

  • deprecate dot-only separation by checking for the new separator first and falling back to the current behaviour
  • warn if the part after the separator looks strange
  • have far fewer conflicts
  • maybe add a site parameter to choose the old variant (and warn if it is used)
  • simplify the file name splitting logic
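Here is a minimal Go sketch of the “.-” separator idea, which is only a proposal in this thread and not a Hugo feature. The hypothetical splitWithMarker function treats everything before the last “.-” as the name and splits the remainder into identifiers and an extension. Without the marker, it falls back to splitting at the last dot only; that fallback is an assumption made here for brevity, whereas the proposal would fall back to the current dot handling.

```go
package main

import (
	"fmt"
	"strings"
)

// splitWithMarker sketches the proposed ".-" separator (a forum idea, not a
// Hugo feature). Everything before the last ".-" is the name; the rest is
// split on dots into identifiers and the extension. Without the marker, this
// sketch assumes a plain last-dot split (name + extension, no identifiers).
func splitWithMarker(filename string) (name string, ids []string, ext string, usedMarker bool) {
	if i := strings.LastIndex(filename, ".-"); i >= 0 {
		suffix := strings.Split(filename[i+2:], ".")
		return filename[:i], suffix[:len(suffix)-1], suffix[len(suffix)-1], true
	}
	if j := strings.LastIndex(filename, "."); j >= 0 {
		return filename[:j], nil, filename[j+1:], false
	}
	return filename, nil, "", false
}

func main() {
	for _, f := range []string{"release-0.1.5.-en.md", "release-0.1.5.-md", "release-0.1.5.deb"} {
		name, ids, ext, marker := splitWithMarker(f)
		fmt.Printf("%-22s name=%q ids=%q ext=%q marker=%v\n", f, name, ids, ext, marker)
	}
}
```

With this rule, release-0.1.5.-en.md yields the name release-0.1.5, the identifier en, and the extension md, while release-0.1.5.deb keeps its full version number because no marker is present.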

Yeah, another big breaking change, and just an idea I had when waking up…

Just bash me if that’s a stupid black glowing 💡

Let’s say that there are

  • 900000 content files of the form foo.en.md in the wild.
  • 1000 content files of the form v1.2.3.md in the wild.

If one of these groups had to do some renaming, which one should we pick?

Looking back, we should have done a better job when we started putting other identifiers in front of the extension. Some of it comes from my naive view of what a filename “should” look like:

  • Lower case name, no spaces.
  • The dot is a separator that separates the name and the extension.
  • And since path parsers are smart about looking for the last dot, we cleverly found it a good idea to put other identifiers in front of the extension, separated with the dot separator.

This works fine in almost all cases, and it’s easily avoidable if you know about it (rename a few files).
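The collision reported at the start of the thread can be reproduced with a deliberately simplified model. The Go sketch below is an assumption, not Hugo's parser: the hypothetical parseDots function takes everything before the first dot as the base name, everything after the last dot as the extension, and treats the segments in between as identifiers such as a language. The two coordinate file names are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// parsedName is a simplified, hypothetical model of splitting a file name
// on dots; it is not Hugo's actual implementation.
type parsedName struct {
	Base        string   // everything before the first dot
	Identifiers []string // segments between base and extension (e.g. a language)
	Ext         string   // everything after the last dot
}

func parseDots(filename string) parsedName {
	parts := strings.Split(filename, ".")
	if len(parts) == 1 {
		return parsedName{Base: filename}
	}
	return parsedName{
		Base:        parts[0],
		Identifiers: parts[1 : len(parts)-1],
		Ext:         parts[len(parts)-1],
	}
}

func main() {
	// The two coordinate names are illustrative, not from the original report.
	for _, f := range []string{"foo.en.md", "v1.2.3.md", "22.28,114.15.md", "22.30,114.17.md"} {
		p := parseDots(f)
		fmt.Printf("%-18s base=%q ids=%q ext=%q\n", f, p.Base, p.Identifiers, p.Ext)
	}
}
```

Under this model, foo.en.md splits cleanly into foo, en, and md, but v1.2.3.md gets the base name v1, and both coordinate files get the same base name 22, so they collapse into a single page.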

I get your point; as I said, it may be a bad idea.

Just thinking it might be better in the long run…

  • it’s easier to make this kind of breaking change while still at v0
  • dots in filenames are quite common these days
  • renaming files affects downloading them, and you lose the common semver naming that was there before

fine with your arguments and decision

I’m pretty sure that at least 99.99% of all filenames have only one dot in them. Anyhow – if we want to make v1.2.3.md and similar work, we need to find a way that’s not forcing all the others to do the renaming, because that’s not practical/possible.


OK, “Error when using resources with similar names” (gohugoio/hugo issue #13596 on GitHub) convinced me that we need to do “something” about this.

Well, as I see it, almost all permutations of file names (though that could change in the future) should be known at build time. They mostly involve the language and the output format. Additionally, there is a list of content file suffixes for Markdown, HTML, and a few more (see Content formats).

That said, it should be possible to generate patterns for the file names that can be considered to be chopped up by dots. In turn, this would imply that foo.en.md shouldn’t be considered a content file if en is not in any of disableLanguages, defaultContentLanguage, and languages. If this is too strict, one could use a list of all valid language codes. The same applies to the output formats: those could be constructed from the built-in and configured ones as well.

I suspect that this logic is considered either too complicated to maintain or too much of a performance penalty. But it would be a cleaner solution than just assuming that the dot semantics apply only to Hugo.

Here I would say that a language configured in languages should be treated as content, even when it is disabled.

That’s what I tried to express: Only consider a language code as part of a file name if it’s in any of the configured languages (including disabled)…
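The suggestion above can be sketched in Go. This is a hypothetical illustration of the proposal, not current Hugo behaviour: parseKnown peels a segment off the end of the name only if it is a known output format or a configured language (the language set is assumed to include disabled languages); any other segment stays part of the base name. The language and format sets in main are made-up example configuration.

```go
package main

import (
	"fmt"
	"strings"
)

// parseKnown sketches the proposal from this thread (not current Hugo
// behaviour): a dot-separated segment before the extension counts as an
// identifier only if it is a known output format or a configured language
// (langs is assumed to include disabled languages). Anything else stays in
// the base name.
func parseKnown(filename string, langs, formats map[string]bool) (base, lang, format, ext string) {
	parts := strings.Split(filename, ".")
	if len(parts) < 2 {
		return filename, "", "", ""
	}
	ext = parts[len(parts)-1]
	rest := parts[:len(parts)-1]

	// Peel a known output format, then a known language, off the end.
	// Requiring n > 1 keeps the base name from becoming empty.
	if n := len(rest); n > 1 && formats[rest[n-1]] {
		format = rest[n-1]
		rest = rest[:n-1]
	}
	if n := len(rest); n > 1 && langs[rest[n-1]] {
		lang = rest[n-1]
		rest = rest[:n-1]
	}
	base = strings.Join(rest, ".")
	return base, lang, format, ext
}

func main() {
	// Example configuration: languages (including disabled) and output formats.
	langs := map[string]bool{"en": true, "de": true}
	formats := map[string]bool{"html": true, "rss": true}
	for _, f := range []string{"foo.en.md", "step-1.de.md", "v1.2.3.md", "22.28,114.15.md"} {
		base, lang, format, ext := parseKnown(f, langs, formats)
		fmt.Printf("%-16s base=%q lang=%q format=%q ext=%q\n", f, base, lang, format, ext)
	}
}
```

With en and de configured, foo.en.md still resolves to the base foo with language en, while v1.2.3.md and 22.28,114.15.md keep their full names, so the foo.en.md-style files would not need renaming. The trade-off is that a segment matching an unconfigured language (say, notes.fr.md on a site without fr) silently becomes part of the base name.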