One of my categories is named “Le-carré,” but the link ends up being generated like this:
categories/le-carr%C3%A9
And not working. Is there an easy fix for this that I’m overlooking?
“Latin Small Letter E with Acute” (U+00E9) is 0xC3 0xA9 in UTF-8, so the URL encoding has the right escape for that. What were you expecting, and what are you seeing (in other words, how does the result differ from what you expected)?
I was expecting the category link for “le-carré” to work like the others do. Instead it 404s. You can see the category list here:
There is a directory created under “categories” named “le-carré,” but the link does not point to it. (It’s not a case problem.)
Hello @joncgoodwin,
Are you running Mac OS X? If so, you are likely a victim of the HFS Plus file system’s insistence on storing the “é” (U+00E9) character in Normalization Form D (NFD), i.e. as “e” + " ́" (U+0301).
As an example, compare these two:
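To make the comparison concrete, here is a small POSIX shell sketch (assuming only the standard printf and od utilities) that builds both spellings of “le-carré” from their UTF-8 bytes and dumps them. The NFC form ends in C3 A9, which is exactly what the generated link escapes as %C3%A9; the NFD form ends in 65 CC 81, which a browser would escape as le-carre%CC%81. Since the bytes differ, a server that compares names byte for byte finds no directory matching the link.

```shell
#!/bin/sh
# "le-carré" with "é" as one precomposed code point, U+00E9 (NFC): bytes C3 A9
nfc=$(printf 'le-carr\303\251')
# "le-carré" with "e" + U+0301 COMBINING ACUTE ACCENT (NFD): bytes 65 CC 81
nfd=$(printf 'le-carre\314\201')

# Dump the raw bytes of each spelling as hex
printf 'NFC:'; printf '%s' "$nfc" | od -An -tx1
printf 'NFD:'; printf '%s' "$nfd" | od -An -tx1

# Both render as "le-carré", but a byte-for-byte comparison fails
if [ "$nfc" = "$nfd" ]; then
  echo "identical"
else
  echo "different"
fi
```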
Is your web server also running on Mac OS X? Or do you upload the static web site to a Linux server?
Depending on your situation, I think you may either run convmv on the Linux web server:
Convert all file names in a directory tree (recursively) from NFD to NFC:
convmv -r -f utf8 -t utf8 --nfc --notest .
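To see which names are affected before or after a conversion, one option is a find-based check. The sketch below defines a hypothetical helper, list_nfd_acute (the name is made up for illustration), that lists paths whose names contain the bytes of U+0301. It only catches that one combining mark, so treat it as a quick check for the “é” case rather than a general NFD detector.

```shell
#!/bin/sh
# Hypothetical helper: print paths under a directory (default ".") whose names
# contain U+0301 COMBINING ACUTE ACCENT (UTF-8 bytes CC 81), i.e. an accent
# stored decomposed (NFD) as "e" + U+0301 instead of the precomposed "é".
# It checks only this one combining mark; it is not a general NFD detector.
list_nfd_acute() {
  acute=$(printf '\314\201')
  # LC_ALL=C makes find match the pattern byte by byte
  LC_ALL=C find "${1:-.}" -name "*${acute}*"
}
```

For example, list_nfd_acute public/categories would print the decomposed “le-carré” directory if that is how it was uploaded.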
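Or, you may edit your .md files and change the category name from “le-carré” in NFC (precomposed U+00E9) to “le-carré” in NFD (“e” followed by U+0301). The two look identical on screen but differ byte for byte.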
Hope this helps!
Anthony
Thanks. I’m sure this is the problem. My Linux server does not have convmv, and I don’t think I can install it there. I could install it via brew on my Mac, but would running this command on the Markdown source files fix the problem (or cause another)?
Or would a sed or perl one-liner solution be the way to go?
How about something like this?
You can use rsync’s --iconv option to convert between UTF-8 NFC and NFD, at least if you’re on a Mac. There is a special utf-8-mac character set that stands for UTF-8 NFD. So to copy files from your Mac to your NAS, you’d need to run something like:
rsync -a --iconv=utf-8-mac,utf-8 localdir/ mynas:remotedir/
This will convert all the local filenames from UTF-8 NFD to UTF-8 NFC on the remote server. The files’ contents won’t be affected.
After updating to a later version of rsync, this fixed it. Thanks again; I was completely unaware of this UTF-8 Mac issue.
You are very welcome. This strange phenomenon on Mac OS X is news to me too: I learned of this merely a week ago while reading “Git v2.2.1 (security release) available” on Linux Weekly News:
Most of the discussions that followed were about “Filename mangling considered harmful”, and the issue of HFS+ decomposing Unicode characters was brought up.
More background information is available in this “rant” too:
Note (and random ideas) to self and to fellow developers:
Good idea about the troubleshooting section (I wouldn’t call this issue with accented characters on OS X a FAQ …).
Ran into this issue with a taxonomy entry of サービス. The ビ character is losing its dakuten (the small marks at its upper right) and rendering as ヒ.
Even though I’m running rsync 3 (from brew) with the --iconv switch, it’s not converting the name on the server side. I’ll need to look into it more.
For reference, a couple things I learned:
--iconv=LOCAL,REMOTE must always be specified in that order.
Use iconv --list to see what’s available.
The missing accents are “my fault”.
This is a long story:
You pushed a fix; does it go in the front matter of whichever post’s categories you’re trying to protect?
Well, I posted a fix that removed Unicode accents from the URL/path of categories for sections and taxonomies. This didn’t work too well with Japanese … So I will add a fix that makes the previous fix optional (default off).
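Why would removing accents hurt Japanese? A plausible mechanism (a guess for illustration, not the actual Hugo code): ビ (U+30D3) has a canonical decomposition into ヒ (U+30D2) followed by U+3099, the combining voiced sound mark. The common accent-stripping recipe decomposes text and then drops combining marks, which turns ビ into ヒ, matching the サービス symptom reported earlier in the thread. The shell sketch below spells out the bytes and performs that removal by hand.

```shell
#!/bin/sh
# ビ (U+30D3) precomposed (NFC): UTF-8 bytes E3 83 93
bi_nfc=$(printf '\343\203\223')
# Its canonical decomposition (NFD): ヒ (U+30D2, bytes E3 83 92)
# followed by U+3099 COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK (E3 82 99)
hi=$(printf '\343\203\222')
mark=$(printf '\343\202\231')
bi_nfd="$hi$mark"

# Dropping the combining mark, as an accent stripper would after decomposing
stripped=${bi_nfd%"$mark"}

# What is left is ヒ, not ビ
printf '%s\n' "$stripped"
```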
Hi, quick question: I searched through the documentation but could not find an option to remove Unicode accents from URLs. Was it implemented? If yes, how can I turn it on? I’m using a Mac and I’m facing the NFC vs NFD problem.
There is no general option for that, but have a look at
Thanks for taking the time to answer my question. I see what the code is doing, but as I’m lazy, I took the rsync route described above and it works as intended. My accented categories links are now working properly on the remote server.
Old topic, but still a current issue. I just solved it with the following settings:
Add removePathAccents = true to your config file.
Use urlize in your template file (this was required in my case to get correct breadcrumbs).
Related resources: