I’m trying to cook up a slugify function that would treat Cyrillic characters in a specific way… But first, a problem definition:
While you are technically permitted to use Cyrillic in a URL, you will usually end up with a percent-encoded mess when people copy your URL out of the browser to paste it somewhere else. This is not readable to the naked eye and annoys the very people you rely on to spread your URL around. The traditional solution I used before Hugo was transliteration: every Cyrillic letter has a commonly accepted Latin letter, or combination of letters, that sounds roughly like the original, or close enough that someone glancing at the URL can tell what it was meant to say. This way “привет” turns into “privet” and everyone’s happy.
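To make the problem concrete, here is a minimal Go sketch of what happens to a Cyrillic path segment once it is percent-encoded. It only uses the standard `net/url` package; the helper name `percentEncode` is made up for illustration and has nothing to do with Hugo itself.

```go
package main

import (
	"fmt"
	"net/url"
)

// percentEncode is a hypothetical helper: it escapes a single URL path
// segment the way it typically looks after being copied out of a browser.
func percentEncode(segment string) string {
	return url.PathEscape(segment)
}

func main() {
	// Each Cyrillic letter is two bytes in UTF-8, so six letters
	// become twelve %XX escapes.
	fmt.Println(percentEncode("привет"))
	// prints: %D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82

	// The transliterated form passes through unchanged.
	fmt.Println(percentEncode("privet"))
	// prints: privet
}
```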
Now, obviously, natively, Hugo does nothing of the sort – the anchorize function converts Cyrillic to lowercase, but leaves Cyrillic letters in, which defeats the point, as these still turn into %D0%B0%D0%B1%D0%B2%D0%B3%D0%B4%D0%B5%D1… I need an alternate solution, and while I could, in theory, go fix the anchorize function itself and recompile, I don’t want to maintain my own fork of Hugo and I doubt this will be generally accepted as a pull request. And I can’t call anything external to do this for me either. I therefore have to somehow do this with Go templates only.
After some mucking around, I’ve been able to produce this horror, which I can’t help but think looks unbelievably crude:
{{- . | anchorize | replaceRE "а" "a" | replaceRE "б" "b" | replaceRE "в" "v" | replaceRE "г" "g" | replaceRE "д" "d" | replaceRE "е" "e" | replaceRE "ё" "yo" | replaceRE "ж" "zh" | replaceRE "з" "z" | replaceRE "и" "i" | replaceRE "й" "j" | replaceRE "к" "k" | replaceRE "л" "l" | replaceRE "м" "m" | replaceRE "н" "n" | replaceRE "о" "o" | replaceRE "п" "p" | replaceRE "р" "r" | replaceRE "с" "s" | replaceRE "т" "t" | replaceRE "у" "u" | replaceRE "ф" "f" | replaceRE "х" "kh" | replaceRE "ц" "ts" | replaceRE "ч" "ch" | replaceRE "ш" "sh" | replaceRE "щ" "shh" | replaceRE "(ъ|ь)" "" | replaceRE "ы" "y" | replaceRE "э" "ee" | replaceRE "ю" "yu" | replaceRE "я" "ya" -}}
I can’t even spread the code across multiple lines.
Now, could anyone here who is more familiar with how regexps work tell me whether it is possible to do this in fewer regexps? Or is there perhaps a way to iterate over the string instead?
P.S. Yes, I realized I could use replace instead of replaceRE and it would at least be faster, but that’s not the point of this question.
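For comparison, this is roughly what "iterating over the string" looks like in plain Go: one pass over the characters with a lookup table, using the same letter mapping as the template above. It is only a sketch of the idea I would like to express in a template. The names `table` and `transliterate` are made up for illustration, and nothing here is callable from Hugo templates as-is.

```go
package main

import (
	"fmt"
	"strings"
)

// table mirrors the replaceRE chain above: one entry per Cyrillic letter.
// The hard and soft signs map to the empty string, i.e. they are dropped.
var table = map[rune]string{
	'а': "a", 'б': "b", 'в': "v", 'г': "g", 'д': "d", 'е': "e",
	'ё': "yo", 'ж': "zh", 'з': "z", 'и': "i", 'й': "j", 'к': "k",
	'л': "l", 'м': "m", 'н': "n", 'о': "o", 'п': "p", 'р': "r",
	'с': "s", 'т': "t", 'у': "u", 'ф': "f", 'х': "kh", 'ц': "ts",
	'ч': "ch", 'ш': "sh", 'щ': "shh", 'ъ': "", 'ь': "", 'ы': "y",
	'э': "ee", 'ю': "yu", 'я': "ya",
}

// transliterate lowercases s and replaces each Cyrillic letter found in
// table; every other character (Latin letters, digits, hyphens) is kept.
func transliterate(s string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(s) {
		if latin, ok := table[r]; ok {
			b.WriteString(latin)
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(transliterate("привет"))     // prints: privet
	fmt.Println(transliterate("Привет-мир")) // prints: privet-mir
}
```

Unlike the chain of replacements, this touches each character exactly once, and the table can be laid out across as many lines as needed.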