Tag normalization for 0.123?

I admit, tags on my site are a mess. I have a page tagged “on call” (with a space, U+0020) and another page tagged “on call” (with a non-breaking space, U+00A0), for example. It wasn’t a problem before 0.123, though: both tags would be treated as the same tag, and everything would work.

In 0.123, it’s a whole different deal. These tags are suddenly not the same, and I now have two entries in my /tags/index.html: “On Call” and “On call” (of course). However, my /tags/on-call/index.html now lists only the first page (the one with the space), because the other one doesn’t have the same tag, despite mapping to the same slug.

I can, of course, understand the logic behind treating U+0020 and U+00A0 differently. However, I think that it’s a bug to have different tags conflict for the same slug. “Debian” and “debian” (and even “DEBIAN”) are still treated as the same tag in 0.123, so I would say “on call” and “on call” should be one tag, too, especially considering that, if they are not, one of them is left with no slug to be accessible at.

This is the normalization in 0.123:

func NormalizePathStringBasic(s string) string {
	// All lower case.
	s = strings.ToLower(s)

	// Replace spaces with hyphens.
	s = strings.ReplaceAll(s, " ", "-")

	return s
}

Our old way of handling it was a … mess for several reasons, one of them being that you needed to consider a bunch of path rules/config when you wanted to find/link to a tag using its original value.

The above is a compromise, but it’s both simple and fast.
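To make the effect on the two tags from the first post concrete, here is a minimal sketch that runs the same two steps on both spellings. The function body is copied from the snippet above; the lowercase name and the `main` wrapper are only there so it runs standalone.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizePathStringBasic repeats the two steps quoted above:
// lowercase, then replace U+0020 spaces with hyphens.
func normalizePathStringBasic(s string) string {
	s = strings.ToLower(s)
	s = strings.ReplaceAll(s, " ", "-")
	return s
}

func main() {
	withSpace := "on call"     // regular space, U+0020
	withNBSP := "on\u00a0call" // non-breaking space, U+00A0

	// %+q escapes non-ASCII runes so the difference is visible.
	fmt.Printf("%+q\n", normalizePathStringBasic(withSpace)) // "on-call"
	fmt.Printf("%+q\n", normalizePathStringBasic(withNBSP))  // "on\u00a0call"

	// Case differences alone still collapse to one key.
	fmt.Println(normalizePathStringBasic("DEBIAN") == normalizePathStringBasic("Debian")) // true
}
```

Only U+0020 is replaced, so the non-breaking space survives and the two tags end up with different keys.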

I have spent some years with computers, and I have never experienced any editor inserting non-breaking spaces when I hit the space bar, and even so, if we’re going down this wormhole, we will soon start talking about tabs vs spaces.

This normalization procedure isn’t how the slug is built, though, is it? And I can imagine it messing up things in some languages. For example, “ΣΊΣΥΦΟΣ” is the uppercase for “Σίσυφος”, but strings.ToLower will give differing results on these. Languages are a mess, I know.
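The Greek example can be checked with a few lines of Go. This is a sketch using only the standard library; `sameAfterLower` is a helper name made up for illustration. `strings.ToLower` maps rune by rune, so a capital sigma at the end of a word becomes σ (U+03C3), not the final form ς (U+03C2).

```go
package main

import (
	"fmt"
	"strings"
)

// sameAfterLower reports whether two strings are equal once both
// have been passed through strings.ToLower.
func sameAfterLower(a, b string) bool {
	return strings.ToLower(a) == strings.ToLower(b)
}

func main() {
	upper := "ΣΊΣΥΦΟΣ"
	mixed := "Σίσυφος"

	// The lowercased upper-case form ends in σ, the mixed-case form in ς.
	fmt.Println(strings.HasSuffix(strings.ToLower(upper), "σ")) // true
	fmt.Println(strings.HasSuffix(strings.ToLower(mixed), "ς")) // true
	fmt.Println(sameAfterLower(upper, mixed))                   // false
}
```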

What I’m basically saying is it’s a bug to have two different tags fold into the same slug, as the list page ends up being overwritten and having only part of the list.

Your argument is that DEBIAN is different than Debian, which doesn’t make much sense to me when we’re talking about values that are ultimately meant to end up as part of a URL on the web.

You can still have DEBIAN and Debian in the title/slug if you really want to, but you need to make these values different in the front matter (e.g. debian-1 and debian-2).

No, no, my argument is exactly the opposite! I see how Debian and DEBIAN are the same, they fold to the same slug, that’s absolutely logical.

“on call” and “on call”, however, are not the same tag (in 0.123), yet they fold to the same slug - that’s a bug, I think. They either should be the same tag, or have different slugs.
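A rough way to spot this situation in a list of tags is sketched below. Note the assumptions: `tagKey` repeats the basic normalization quoted earlier, while `slugify` is a hypothetical stand-in for the stricter URL normalization (it is not Hugo’s code) that simply turns every Unicode whitespace rune into a hyphen. `collidingSlugs` is likewise a made-up helper.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// tagKey repeats the basic normalization quoted earlier in the thread.
func tagKey(s string) string {
	return strings.ReplaceAll(strings.ToLower(s), " ", "-")
}

// slugify is a HYPOTHETICAL stand-in for the stricter URL normalization.
// It lowercases and maps every Unicode whitespace rune to a hyphen.
func slugify(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsSpace(r) {
			return '-'
		}
		return unicode.ToLower(r)
	}, s)
}

// collidingSlugs returns every slug that is claimed by more than one
// distinct tag key, together with those keys in sorted order.
func collidingSlugs(tags []string) map[string][]string {
	seen := map[string]map[string]bool{}
	for _, t := range tags {
		slug := slugify(t)
		if seen[slug] == nil {
			seen[slug] = map[string]bool{}
		}
		seen[slug][tagKey(t)] = true
	}
	out := map[string][]string{}
	for slug, keys := range seen {
		if len(keys) < 2 {
			continue // one key per slug is the healthy case
		}
		for k := range keys {
			out[slug] = append(out[slug], k)
		}
		sort.Strings(out[slug])
	}
	return out
}

func main() {
	tags := []string{"Debian", "DEBIAN", "on call", "on\u00a0call"}
	for slug, keys := range collidingSlugs(tags) {
		fmt.Printf("%s: %+q\n", slug, keys) // on-call: ["on-call" "on\u00a0call"]
	}
}
```

Under these assumptions, “Debian” and “DEBIAN” share both key and slug, so only the two “on call” spellings are reported.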

OK, the non-breaking space situation is something you currently need to handle yourself. Feel free to create a proposal on GitHub, but try to be practical/concrete.
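One possible way to handle it yourself is to clean tag values before they reach the front matter, for example in a small script run over the content files. The sketch below shows only the string handling; `cleanTag` is a hypothetical helper, not part of Hugo. It relies on `strings.Fields`, which splits on any rune for which `unicode.IsSpace` is true, and that includes U+00A0.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanTag (hypothetical helper) collapses every run of Unicode
// whitespace, including U+00A0, into a single U+0020 space and
// trims leading and trailing whitespace.
func cleanTag(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	fmt.Printf("%+q\n", cleanTag("on\u00a0call"))                // "on call"
	fmt.Println(cleanTag("on\u00a0call") == cleanTag("on call")) // true
}
```

After this step both spellings are the same string, so they yield the same key under the normalization shown earlier.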

To close my argument here:

  • The normalization above was chosen carefully so that it is a superset of the stricter normalization used in end URLs.
  • The one exception to that is people who have configured --disablePathToLower.
  • But we do preserve the original case to be used in titles and --disablePathToLower etc. situations.
