Limit use of punctuation within taxonomy terms

This isn’t new behavior, but it came into focus while investigating an inquiry related to v0.123.0.

Introduction

There is nothing to prevent you from creating front matter like this:

tags:
  - a b
  - a-b
  - a,b
  - a:b
  - a?b

But there are some side effects of which you may not be aware:

  1. The first two terms are equivalent. Both are published to /public/tags/a-b with title “a b”.
  2. The next two terms collide. Both are published to /public/tags/ab. You can detect this by running Hugo with the --printPathWarnings command line flag. There are at least 21 other combinations of ASCII letters and punctuation that collide with these two.
  3. The fifth term contains a character that is not allowed in Windows file names, preventing you from overriding the term title or adding term metadata using content/taxonomy/term/_index.md. There are 9 characters that are disallowed by Windows: < > : " / \ | ? *
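
The first two side effects can be illustrated with a rough Go sketch. The function approxTermPath is hypothetical: it only approximates the behavior described above (spaces become hyphens; anything that is not a Unicode letter, a Unicode number, or one of _ - # + . @ ~ is dropped) and is not Hugo's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// approxTermPath is a hypothetical, simplified stand-in for the way a
// taxonomy term is turned into a path segment. It is an assumption made
// for illustration, not Hugo's code.
func approxTermPath(term string) string {
	var b strings.Builder
	for _, r := range term {
		switch {
		case r == ' ':
			// Spaces are treated like hyphens.
			b.WriteRune('-')
		case unicode.IsLetter(r), unicode.IsNumber(r), strings.ContainsRune("_-#+.@~", r):
			// Letters, numbers, and the allowed punctuation are kept.
			b.WriteRune(r)
		}
		// Any other character is silently dropped.
	}
	return b.String()
}

func main() {
	// Group the example terms by their approximate path to expose collisions.
	byPath := map[string][]string{}
	for _, term := range []string{"a b", "a-b", "a,b", "a:b"} {
		p := approxTermPath(term)
		byPath[p] = append(byPath[p], term)
	}
	for p, terms := range byPath {
		fmt.Printf("/tags/%s <- %q\n", p, terms)
	}
}
```

Under these assumptions, "a b" and "a-b" both map to a-b, while "a,b" and "a:b" both map to ab, which is the kind of collision that --printPathWarnings reports.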

Follow the rules below to avoid these side effects.

Naming rules

Taxonomy terms may contain Unicode letters, Unicode numbers, spaces, and any of the following characters:

  • _ (underscore)
  • - (hyphen)
  • # (hash)
  • + (plus)
  • . (period)
  • @ (at sign)
  • ~ (tilde)

Note that spaces and hyphens are equivalent, so these terms are equivalent:

  • a b
  • a-b

Although these two terms have the same URL (collide), we cannot disallow hyphens or spaces because both are prevalent in the wild.

To add other characters to the term title, create a term page at content/taxonomy/term/_index.md.

Validate your site

You can efficiently validate your site’s taxonomy terms with something like:

layouts/partials/validate-taxonomy-terms.html
{{ if .IsHome }}
  {{ range $taxonomy, $_ := site.Taxonomies }}
    {{ range $term, $_ := . }}
      {{ if findRE `[^\pL\pN\s_\-\#\+\.\@\~]` $term }}
        {{ errorf `The term %q in taxonomy %q is invalid. Taxonomy terms may contain Unicode letters, Unicode numbers, spaces, and any of the following characters: "_", "-", "#", "+", ".", "@", and "~".` $term $taxonomy }}
      {{ end }}
    {{ end }}
  {{ end }}
{{ end }}

Call it from your base template; it runs once for each language.
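
If you want to check a list of terms outside of a Hugo build, the same regular expression can be exercised in plain Go, since findRE uses Go's regexp syntax. This is a minimal sketch; invalidTerms and the sample terms are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// invalidChar uses the same pattern as the partial above: it matches any
// character that is not a Unicode letter, a Unicode number, whitespace,
// or one of _ - # + . @ ~
var invalidChar = regexp.MustCompile(`[^\pL\pN\s_\-\#\+\.\@\~]`)

// invalidTerms is a hypothetical helper that returns the terms containing
// at least one disallowed character, in their original order.
func invalidTerms(terms []string) []string {
	var bad []string
	for _, t := range terms {
		if invalidChar.MatchString(t) {
			bad = append(bad, t)
		}
	}
	return bad
}

func main() {
	// Sample terms, chosen for illustration only.
	terms := []string{"a b", "a-b", "a,b", "a:b", "a?b", "c++"}
	for _, t := range invalidTerms(terms) {
		fmt.Printf("invalid taxonomy term: %q\n", t)
	}
}
```

With the sample input, only "a,b", "a:b", and "a?b" are flagged; "a b", "a-b", and "c++" pass because spaces, hyphens, and plus signs are in the allowed set.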

The details

For anyone interested in the analysis details:

git clone --single-branch -b hugo-forum-topic-48638 https://github.com/jmooring/hugo-testing hugo-forum-topic-48638
cd hugo-forum-topic-48638
hugo server