Categories with accented characters

One of my categories is named “Le-carré,” but the link ends up being generated like this:

categories/le-carr%C3%A9

The link does not work. Is there an easy fix for this that I’m overlooking?

“Latin Small Letter E with Acute” (U+00E9) is 0xC3 0xA9 in UTF-8, so the URL encoding has the right escape for that. What were you expecting, and what are you seeing instead (in other words, how does it behave versus what you expected)?
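A quick way to confirm this, sketched in Python with only the standard library (the helper name `percent_encode_path` is hypothetical, used for illustration): the precomposed “é” encodes to the two bytes C3 A9, and percent-encoding those bytes yields exactly the escape seen in the generated link.

```python
from urllib.parse import quote

def percent_encode_path(segment: str) -> str:
    """Percent-encode one URL path segment from its UTF-8 bytes (hypothetical helper)."""
    return quote(segment, safe="")

# U+00E9 LATIN SMALL LETTER E WITH ACUTE, shown as its UTF-8 bytes
print("\u00e9".encode("utf-8").hex(" "))      # c3 a9
print(percent_encode_path("le-carr\u00e9"))   # le-carr%C3%A9
```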

I was expecting the category link for “le-carré” to work like the others do. Instead it 404s. You can see the category list here:

There is a directory created under “categories” named “le-carré,” but the link does not point to it. (It’s not a case problem.)

Hello @joncgoodwin,

Are you running Mac OS X? If so, you are likely a victim of the HFS Plus file system’s insistence on storing the “é” (U+00E9) character in (a variant of) Normalization Form D (NFD), i.e. as “e” (U+0065) followed by the combining acute accent (U+0301).

As an example, compare these two, which look identical but are encoded differently: “é” (U+00E9, NFC) and “é” (U+0065 U+0301, NFD).
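A minimal Python sketch of the difference (stdlib `unicodedata` and `urllib.parse` only): the two forms render the same but compare unequal, and they percent-encode to different URL paths. That is why a link generated as `le-carr%C3%A9` can miss a directory whose name was stored in NFD.

```python
import unicodedata
from urllib.parse import quote

nfc = unicodedata.normalize("NFC", "le-carr\u00e9")  # é as one code point, U+00E9
nfd = unicodedata.normalize("NFD", nfc)              # e followed by U+0301 COMBINING ACUTE ACCENT

print(nfc == nfd)                               # False: same glyphs, different code points
print([f"U+{ord(c):04X}" for c in nfc[-1:]])    # ['U+00E9']
print([f"U+{ord(c):04X}" for c in nfd[-2:]])    # ['U+0065', 'U+0301']

# Percent-encoded, the two forms give different paths, so a link built from
# the NFC name does not match a directory whose name is stored as NFD.
print(quote(nfc))   # le-carr%C3%A9
print(quote(nfd))   # le-carre%CC%81
```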

Is your web server also running on Mac OS X? Or do you upload the static web site to a Linux server?

Depending on your situation, I think you may either run convmv on the Linux web server:

Convert all files in a directory from NFD to NFC:
convmv -r -f utf8 -t utf8 --nfc --notest .

– from https://gist.github.com/JamesChevalier/8448512
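If convmv is not available, the same renaming can be approximated with Python's standard library. This is a minimal sketch, not convmv itself: the function name `rename_to_nfc` is hypothetical, and it assumes a file system that stores names byte for byte, such as a typical Linux server (on HFS+ the file system would store the names decomposed again). It walks the tree bottom-up so that renaming a directory never invalidates paths that are still to be visited, and, like convmv without `--notest`, it only reports by default.

```python
import os
import unicodedata

def rename_to_nfc(root: str, dry_run: bool = True) -> list[tuple[str, str]]:
    """Rename every file and directory under root whose name is not in NFC.

    Returns the (old_path, new_path) pairs. With dry_run=True nothing is
    renamed, similar in spirit to running convmv without --notest.
    """
    renamed = []
    # topdown=False yields a directory's contents before the directory itself,
    # so a parent is renamed only after everything inside it has been handled.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            nfc_name = unicodedata.normalize("NFC", name)
            if nfc_name == name:
                continue
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, nfc_name)
            if os.path.exists(new):
                # Refuse to clobber an entry that already has the NFC name.
                raise FileExistsError(new)
            if not dry_run:
                os.rename(old, new)
            renamed.append((old, new))
    return renamed
```

Running `rename_to_nfc("public")` first prints nothing but returns the planned renames; `rename_to_nfc("public", dry_run=False)` then applies them.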

Or, you may edit your .md files and change the category name from “le-carré” (NFC, U+00E9) to “le-carré” (NFD, “e” + U+0301).

Hope this helps!

Anthony

Thanks. I’m sure this is the problem. My Linux server does not have convmv, and I don’t think I can install it there. I could install it via brew on my Mac, but would running this command on the Markdown source files fix the problem (or cause another)?

Or would a sed or perl one-liner solution be the way to go?

How about something like this? :slight_smile:

You can use rsync’s --iconv option to convert between UTF-8 NFC & NFD, at least if you’re on a Mac. There is a special utf-8-mac character set that stands for UTF-8 NFD. So to copy files from your Mac to your NAS, you’d need to run something like:

rsync -a --iconv=utf-8-mac,utf-8 localdir/ mynas:remotedir/

This will convert all the local filenames from UTF-8 NFD to UTF-8 NFC on the remote server. The files’ contents won’t be affected.
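To check the result on the server after syncing, a small stdlib Python sketch can list any remaining paths whose names are not in NFC. The function name `find_non_nfc` is hypothetical, and it needs Python 3.8+ for `unicodedata.is_normalized`.

```python
import os
import unicodedata

def find_non_nfc(root: str) -> list[str]:
    """Return paths under root whose final name component is not in NFC (hypothetical helper)."""
    offenders = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not unicodedata.is_normalized("NFC", name):
                offenders.append(os.path.join(dirpath, name))
    return sorted(offenders)
```

An empty list from `find_non_nfc("remotedir")` means every file and directory name there is already in NFC.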

After I updated to a later version of rsync, this fixed it. Thanks again; I was completely unaware of this UTF-8 Mac issue.

You are very welcome. This strange phenomenon on Mac OS X is news to me too: I learned of this merely a week ago while reading “Git v2.2.1 (security release) available” on Linux Weekly News:

Most of the discussions that followed were about “Filename mangling considered harmful”, and the issue of HFS+ decomposing Unicode characters was brought up.

More background information is available in this “rant” too:


Note (and random ideas) to self and to fellow developers:

  • A new Troubleshooting or FAQ section in the Hugo docs documenting abnormalities/gotchas like this.
  • Perhaps Hugo should have a user-configurable option to strip the accents from category path names, converting them to plain ASCII, e.g. “Le-carré” to “Le-carre”. Default on or off? :slight_smile:
  • In that case, Hugo may also need to deal with possible collisions: say the user has both “Le-carré” and “Le-carrè”, which both map to “Le-carre”; would one of them then have to be renamed, e.g. to “Le-carre1”?
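The two ideas above can be sketched together. This is a hypothetical illustration in Python, not Hugo's code: `ascii_path` strips accents by decomposing to NFD and dropping characters with a nonzero combining class, and `assign_paths` appends a numeric suffix when two names collapse to the same path. Note that this is not a full ASCII conversion: characters without a canonical decomposition, such as “ø” or “ß”, pass through unchanged.

```python
import unicodedata

def ascii_path(name: str) -> str:
    """Decompose to NFD and drop combining marks (nonzero combining class, e.g. U+0301)."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def assign_paths(names: list[str]) -> dict[str, str]:
    """Map each name to a unique accent-stripped path, suffixing 1, 2, ... on collision."""
    used: set[str] = set()
    paths: dict[str, str] = {}
    for name in names:
        base = ascii_path(name)
        candidate, n = base, 0
        while candidate in used:
            n += 1
            candidate = f"{base}{n}"
        used.add(candidate)
        paths[name] = candidate
    return paths

print(assign_paths(["Le-carré", "Le-carrè"]))
# {'Le-carré': 'Le-carre', 'Le-carrè': 'Le-carre1'}
```

Which name keeps the unsuffixed path depends on input order here; a real implementation would need a stable order (for example, sorted names) so that URLs do not change between builds.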

Good idea about the troubleshooting section (I wouldn’t call this issue with accented characters on OSX a FAQ …).

I ran into this issue with the taxonomy entry サービス. The ビ character loses its dakuten (the small mark at its upper right) and renders as ヒ.

Even though I’m running rsync 3 (from brew) with the --iconv switch, it’s not converting the name on the server side. I’ll need to look into it more.

For reference, a couple things I learned:

  • --iconv=LOCAL,REMOTE must always be specified in that order
  • you can use iconv --list to see what’s available

The missing accents are “my fault”.

This is a long story:

You pushed a fix; does it go in the front matter of whichever post has the categories you’re trying to protect?

Well, I posted a fix that removed Unicode accents from the URLs/paths of sections and taxonomies (such as categories). This didn’t work too well with Japanese, so I will add another fix that makes the previous one optional (default off).
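A hypothetical Python sketch of why accent stripping broke Japanese, and of the opt-in behaviour described above. The function `term_path` and its `strip_accents` parameter are illustrative assumptions, not Hugo's actual setting, and normalizing to NFC in the default branch is an added design choice. In NFD, “ビ” (U+30D3) decomposes into “ヒ” (U+30D2) plus U+3099 COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK, so dropping combining marks turns サービス into サーヒス, the symptom reported earlier in the thread.

```python
import unicodedata

def term_path(name: str, strip_accents: bool = False) -> str:
    """Build a path segment for a taxonomy term (hypothetical helper).

    Accent stripping is opt-in because it also removes marks that are
    meaningful in other scripts, such as the Japanese dakuten.
    """
    if not strip_accents:
        # Keep the text as is, apart from normalizing it to NFC.
        return unicodedata.normalize("NFC", name)
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return unicodedata.normalize("NFC", stripped)

print(term_path("サービス"))                      # サービス (default: unchanged)
print(term_path("サービス", strip_accents=True))  # サーヒス (the dakuten on ビ is lost)
print(term_path("Le-carré", strip_accents=True))  # Le-carre
```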

Hi, quick question: I searched through the documentation but could not find an option to remove Unicode characters from URLs. Was it implemented? If so, how can I turn it on? I’m using a Mac and I’m facing the NFC vs. NFD problem.

There is no general option for that, but have a look at

Thanks for taking the time to answer my question. I see what the code is doing, but as I’m lazy, I took the rsync route described above and it works as intended. My accented categories links are now working properly on the remote server.