After upgrading to v0.123.x, Hugo seems to be using anchorize rather than urlize when creating URLs for term pages.
Given the following categories:
Things & Other things
Best & Worst of Times
Articles & News
This & That
Hugo v0.119.0 generates the following relative URLs:
/categories/articles-news/
/categories/best-worst-of-times/
/categories/things-other-things/
/categories/this-that/
However, v0.123.6 generates the following (note the double hyphens where the & used to be dropped):
/categories/articles--news/
/categories/best--worst-of-times/
/categories/things--other-things/
/categories/this--that/
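The difference between the two lists can be reproduced with a small sketch. This is not Hugo's implementation: `slugify` and its `collapse_hyphens` flag are hypothetical names, and the function is only meant for the sample titles above. It shows how dropping the & and then hyphenating the two spaces around it yields `--`, unless runs of hyphens are collapsed afterwards.

```python
import re


def slugify(title: str, collapse_hyphens: bool) -> str:
    """Illustrative only: NOT Hugo's actual algorithm (hypothetical helper)."""
    s = title.lower()
    # Drop anything that is not a letter, digit, space, or hyphen (e.g. "&").
    s = re.sub(r"[^a-z0-9 -]", "", s)
    # Each space becomes a hyphen, so "a & b" -> "a  b" -> "a--b".
    s = s.replace(" ", "-")
    if collapse_hyphens:
        # Collapsing runs of hyphens gives the older, single-hyphen form.
        s = re.sub(r"-{2,}", "-", s)
    return s


for title in ["Things & Other things", "This & That"]:
    print(slugify(title, collapse_hyphens=True), slugify(title, collapse_hyphens=False))
```

With `collapse_hyphens=True` the output matches the v0.119.0 list; with `False` it matches the v0.123.6 list.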
Reproduction:
git clone --single-branch -b anchorized-category-urls https://github.com/alanbreck/hugo-testing anchorized-category-urls
cd anchorized-category-urls
rm -rf public && hugo
This is expected/by design (I think).
Ok. This will result in broken URLs. The two options I can think of are:
- Accept the change and implement 301 redirects
- Add a content file for each existing category that specifies a URL.
Is there another option that I’m not thinking of, though?
Either 301 redirects server-side, or generate content pages (as needed) with “aliases” in front matter. I guess on some sites it could affect every term, but on most sites it’s probably a subset.
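Whichever route is taken, the first step is a list of old-to-new URL pairs. A minimal sketch, assuming the old and new URL lists are already available (for example from two builds): `redirect_pairs` is a hypothetical helper that pairs each old URL with the new URL that differs only by repeated hyphens. It is a heuristic and would mis-pair URLs if a site has both `/a-b/` and `/a--b/`.

```python
import re


def redirect_pairs(old_paths, new_paths):
    """Pair each old URL with the new URL that differs only by repeated hyphens."""
    # Index the new paths by their collapsed-hyphen form.
    by_collapsed = {re.sub(r"-{2,}", "-", p): p for p in new_paths}
    pairs = []
    for old in sorted(old_paths):
        new = by_collapsed.get(old)
        # Only URLs that actually changed need a redirect.
        if new is not None and new != old:
            pairs.append((old, new))
    return pairs


old = ["/categories/this-that/", "/categories/articles-news/", "/posts/hello/"]
new = ["/categories/this--that/", "/categories/articles--news/", "/posts/hello/"]
for src, dst in redirect_pairs(old, new):
    print(f"{src} -> {dst} (301)")
```

How the pairs are turned into server rules or into “aliases” entries depends on the host and the site layout.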
Hang on a second. I misunderstood what was happening here.
This affects more than terms:
content/
└── posts/
    └── The Second & Third Waves.md  <-- front matter: tags: [Concept & Prototypes]
v0.122.0
public/
├── posts/
│   ├── the-second-third-waves/
│   │   └── index.html
│   └── index.html
├── tags/
│   ├── concept-prototypes/
│   │   └── index.html
│   └── index.html
└── index.html
v0.123.6
public/
├── posts/
│   ├── the-second--third-waves/  # URL has changed
│   │   └── index.html
│   └── index.html
├── tags/
│   ├── concept--prototypes/  # URL has changed
│   │   └── index.html
│   └── index.html
└── index.html
@bep Please confirm that this (see my previous comment) was an intentional change. It probably doesn’t affect that many URLs in the wild, but redirects are required.
If intentional, one concern I have is that it wasn’t very apparent that this was changed. I didn’t notice until diffing the build folders.
I haven’t upgraded yet. I haven’t had time to finish diffing the output folders to find all cases of the above, and other discrepancies.
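For anyone doing the same comparison, here is a rough sketch of diffing two build folders by directory name. It assumes the site was built once per Hugo version into two separate output directories; `relative_dirs` and `changed_dirs` are hypothetical helper names, and the demo uses throwaway directories rather than a real build.

```python
import os
import tempfile


def relative_dirs(root):
    """Return every directory under root as a path relative to root."""
    found = set()
    for current, dirnames, _files in os.walk(root):
        for name in dirnames:
            found.add(os.path.relpath(os.path.join(current, name), root))
    return found


def changed_dirs(old_root, new_root):
    """Directories present only in the old build, and only in the new build."""
    old, new = relative_dirs(old_root), relative_dirs(new_root)
    return sorted(old - new), sorted(new - old)


# Small self-contained demo with two throwaway build trees.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "old", "tags", "concept-prototypes"))
    os.makedirs(os.path.join(tmp, "new", "tags", "concept--prototypes"))
    removed, added = changed_dirs(os.path.join(tmp, "old"), os.path.join(tmp, "new"))
    print("removed:", removed)
    print("added:", added)
```

Directories that appear under "removed" are URLs that no longer exist in the new build and are candidates for redirects.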
Understood. Thanks for following up, @jmooring.
I may just be missing it, but it’s not apparent to me from “Preserve triple hyphens etc. in URLs” (issue #10104, gohugoio/hugo) that this behavior would extend to other characters (like &). Could this behavior be documented, maybe on the Content Management/Taxonomies page, with examples of which characters result in an extra hyphen and which do not (e.g. a .)?
Related to understanding the precise behavior: does it use anchorize or urlize under the hood? At first glance, it appears that it does not, since hyphens are added for whitespace but . is preserved.
Please create an issue in the docs repository. I think this should be about Page.RelPermalink and Page.Permalink… what we’re talking about in this issue is not specific to taxonomy or term pages. The revised documentation should describe the relationship between the content’s file path (or logical path when the content is not backed by a file) and the published URL.
I’ve created https://github.com/gohugoio/hugoDocs/issues/2487.
Also note https://github.com/gohugoio/hugoDocs/issues/2307. The concepts referenced/described in the affected pages need to be consolidated. It’s a bit of a mess right now, so I’d like to limit the proposed docs change to the Permalink/RelPermalink pages. #2307 is a few days of work, so it’s not going to happen tomorrow.
Thank you so much for creating that issue, and for maintaining the docs, @jmooring!