[bug/feature] Does Hugo handle non-ASCII symbols in URLs incorrectly?

TL;DR: if the URL contains non-ASCII characters, .URL and .Permalink give different output for the same “url” value in the .md file.

url: “/badÅŠ”
.URL is wrong: /bad%c3%85%c5%a0
.Permalink is good: /bad%C3%A5%C5%A1
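To see what the two outputs actually encode, they can be percent-decoded. This is a minimal sketch using Python's standard `urllib.parse.unquote`; the variable names are illustrative and not part of Hugo.

```python
from urllib.parse import unquote

# The two outputs reported above for the same front matter value.
url_output = "/bad%c3%85%c5%a0"        # from .URL
permalink_output = "/bad%C3%A5%C5%A1"  # from .Permalink

decoded_url = unquote(url_output)
decoded_permalink = unquote(permalink_output)

print(decoded_url)        # /badÅŠ
print(decoded_permalink)  # /badåš

# The decoded paths differ only in letter case.
print(decoded_url.lower() == decoded_permalink)  # True
```

So .URL keeps the upper-case ÅŠ from the front matter, while .Permalink points at the lower-cased path, which is also the name of the badåš directory that hugo writes to public.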

Here is a simple .md file:


---
title: "Test Irl"
date: 2017-09-14T10:19:19+03:00
draft: false
url: “/badÅŠ”
---

Hello, here is test

If I try to view this post through “hugo server”, I get a 404, and I can’t see what’s wrong because Hugo does not log requests.
If I run “hugo” and push the public directory to an ordinary web server, I get what I need.

In the public directory:

$ ls -1 |grep ba
badåš

In the web server logs I see:

127.0.0.1 - - [14/Sep/2017 10:23:41] "GET /bad%c3%85%c5%a0 HTTP/1.1" 301 -
127.0.0.1 - - [14/Sep/2017 10:23:41] "GET /bad%c3%85%c5%a0/ HTTP/1.1" 200 -

--preserveTaxonomyNames does not have any effect either.

What am I doing wrong?

Hey, @kiltum, I’m happy to tackle this (I’m the theme creator), though I’m not so sure it’s an Ananke issue. Can you point me to a repo so I can check it out? You can also file an issue directly on the Ananke repo: https://github.com/budparr/gohugo-theme-ananke/issues

Hello @budparr! I just checked it again. No separate repo is needed :wink:

[kiltum@mbook test]$ hugo version
Hugo Static Site Generator v0.27.1 darwin/amd64 BuildDate: 2017-09-13T15:32:10+03:00
[kiltum@mbook test]$ hugo new site quickstart
Congratulations! Your new Hugo site is created in /Users/kiltum/test/quickstart.
....
[kiltum@mbook test]$ cd quickstart
[kiltum@mbook quickstart]$ git init
Initialized empty Git repository in /Users/kiltum/test/quickstart/.git/
[kiltum@mbook quickstart|master [?]]$ git submodule add https://github.com/budparr/gohugo-theme-ananke.git themes/ananke
Cloning into '/Users/kiltum/test/quickstart/themes/ananke'...
remote: Counting objects: 791, done.
remote: Total 791 (delta 0), reused 0 (delta 0), pack-reused 791
Receiving objects: 100% (791/791), 2.37 MiB | 1.32 MiB/s, done.
Resolving deltas: 100% (401/401), done.
[kiltum@mbook quickstart|master [+?]]$ echo 'theme = "ananke"' >> config.toml
[kiltum@mbook quickstart|master [+?]]$ hugo new posts/my-first-post.md
/Users/kiltum/test/quickstart/content/posts/my-first-post.md created
[kiltum@mbook quickstart|master [+?]]$ cat > content/posts/my-first-post.md
---
title: "Test Irl"
date: 2017-09-14T10:19:19+03:00
draft: false
url: “/badÅŠ”
---
(^d here)
[kiltum@mbook quickstart|master [+?]]$ hugo server
...

Now open a browser at localhost:1313 and click on “Test Irl”: you get a 404 error. But if you change the theme to another one (I tried minimo and cocoa-eh), everything is OK.

Actually, I found the problem: for some reason you use .URL in summary-with-image.html:3, so the resulting HTML is bad:

<a class="db pv4 ph3 ph0-l no-underline dark-gray dim" href="%e2%80%9c/bad%c3%85%c5%a0%e2%80%9d">

And the final URL in the browser is http://localhost:1313/“/badÅŠ”
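Percent-decoding that href shows where the extra characters come from. This is a small sketch with Python's standard library; the variable names are illustrative.

```python
import unicodedata
from urllib.parse import unquote

href = "%e2%80%9c/bad%c3%85%c5%a0%e2%80%9d"  # href generated from .URL
decoded = unquote(href)

print(decoded)  # “/badÅŠ”
# Name the first and last characters of the decoded value.
print(unicodedata.name(decoded[0]))   # LEFT DOUBLE QUOTATION MARK
print(unicodedata.name(decoded[-1]))  # RIGHT DOUBLE QUOTATION MARK
```

The value is wrapped in typographic quotes (U+201C and U+201D). YAML only treats the straight ' and " characters as string delimiters, so curly quotes become part of the url value itself. Because the href then no longer starts with /, the browser resolves it relative to the current page.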

I am a newbie in Hugo, but if I change .URL to .Permalink, like this:

<a class="db pv4 ph3 ph0-l no-underline dark-gray dim" href="{{ .Permalink }}">

the resulting HTML becomes:

href="http://localhost:1313//bad%C3%A5%C5%A1/"

and everything works.

It looks like this comes down to “I found a Hugo server bug/feature”.

I removed the quotes around the url value in the .md file and switched back to .URL in the template.

Now in the HTML I see

/bad%c3%85%c5%a0

which does not work,

but .Permalink gives me

/bad%C3%A5%C5%A1

which works.

I have put all the files together at https://github.com/kiltum/hugotest

I added “work” as a working variant of the URL conversion.

Can you confirm this?

By the way, what is the “IRL” feature that Hugo is supposedly meant to support? And is it a bug because it doesn’t work correctly?

The only meaning of IRL that I know is ‘in real life’, and Google does not help much further.

If you want help/input from as many people as possible, it will probably help to clarify terms like IRL.

Maybe I am using the term incorrectly, but IRL in my world is International Resource Locator.

In other words: non-ASCII symbols in a URL. Some people know it as an “encoded URL”. There are examples in the posts above.

The next step is IDN (international domain name), where the whole URL may contain non-ASCII characters, but that is not really this case, because the name gets translated to ASCII.

Thanks for clarifying, I did not know that. Now I have some words to search for. :slight_smile: I do see that there’s an open issue about IDN support and non-ASCII characters. But it hasn’t gained traction yet (so also not confirmed yet).

I learned from this topic that there’s a removePathAccents setting for your config.toml file that might help. Although the use case discussed in that topic isn’t exactly the same as the one you have.
I couldn’t find documentation for that variable, but a Google search did turn up this article. Based on the description there I’m not sure if it helps in this case (but always worth a try I think).

Other than that I don’t know how to use non-ASCII characters in Hugo URLs.

No. Actually, “problem” is a little strong, but I can’t narrow down the area of the error/bug any further.

With non-ASCII characters in the URL:

a) hugo and hugo server work differently
b) .URL and .Permalink work differently

“Differently” means the output is different for the same source.
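Both observed outputs can be reproduced from the same source string, which makes the difference concrete. This is only a Python sketch of one possible explanation (lower-casing the path before escaping it); the helper names are hypothetical and it does not claim to mirror Hugo's actual code.

```python
from urllib.parse import quote

source = "/badÅŠ"  # url value from the front matter, without quotes


def escape_as_is(path: str) -> str:
    """Percent-encode the path without changing letter case (hypothetical helper)."""
    return quote(path)


def escape_lowercased(path: str) -> str:
    """Lower-case the path first, then percent-encode it (hypothetical helper)."""
    return quote(path.lower())


# Python emits upper-case hex digits. Per RFC 3986, %c3 and %C3 are
# equivalent, so the first result is the same URL as the reported
# .URL output /bad%c3%85%c5%a0.
print(escape_as_is(source))       # /bad%C3%85%C5%A0
print(escape_lowercased(source))  # /bad%C3%A5%C5%A1
```

The second result is identical to the .Permalink output and names the lower-case badåš directory in public. That would explain the 404 if hugo server looks the path up case-sensitively, which is an assumption and not verified here.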

Non-ASCII characters are nice for non-English people :wink: