CJK languages (and others, like Thai) pose some problems for us devs, for instance when counting words or fitting text into a nice layout. This is why I was thinking about how to find out whether a string contains CJK characters.
The solution lies within the block ranges of the Unicode standard (UTF-8 is just one way of encoding it). The CJK Unified Ideographs block starts at U+4E00 and ends at U+9FFF. My general idea is to match the tested string against that range. The following partial, func/isCJK.html, will do that:
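```go-html-template
{{/* Returns true if the string passed as context contains at least one character in U+4E00 to U+9FFF. */}}
{{ $isCJK := false }}
{{/* The limit of 1 stops findRE after the first match; one hit is enough. */}}
{{ $matches := findRE "[\u4E00-\u9FFF]" . 1 }}
{{ if gt (len $matches) 0 }}
  {{ $isCJK = true }}
{{ end }}
{{ return $isCJK }}
```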
Testing:
```go-html-template
{{ $isCJK := partialCached "func/isCJK.html" "丹为" "丹为" }}
{{ $isCJK }} <-- true
{{ $isCJK := partialCached "func/isCJK.html" "blafasel" "blafasel" }}
{{ $isCJK }} <-- false
```
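Hugo templates are Go templates, and findRE is built on Go's RE2 regexp package, so the same check can be sketched in plain Go. That is handy for poking at the range outside of a site build. The names cjkRE and isCJK are hypothetical; this is a sketch of the idea, not code from Hugo itself.

```go
package main

import (
	"fmt"
	"regexp"
)

// cjkRE matches a single code point in the CJK Unified Ideographs block
// (U+4E00 to U+9FFF), the same range the Hugo partial tests against.
var cjkRE = regexp.MustCompile(`[\x{4E00}-\x{9FFF}]`)

// isCJK reports whether s contains at least one character from that block.
func isCJK(s string) bool {
	return cjkRE.MatchString(s)
}

func main() {
	fmt.Println(isCJK("丹为"))     // true
	fmt.Println(isCJK("blafasel")) // false
	fmt.Println(isCJK("ひらがな"))   // false: hiragana lies outside U+4E00 to U+9FFF
}
```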
Then it can be used, for instance, like this:
```go-html-template
{{ .WordCount }}{{ if partialCached "func/isCJK.html" .Content .Content }} Characters{{ else }} Words{{ end }}
```
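For Hugo's own .WordCount to count CJK text sensibly, hasCJKLanguage has to be enabled in the site configuration. To illustrate the idea behind a "characters vs. words" count, here is a rough Go sketch built around a hypothetical countUnits helper: each ideograph in U+4E00 to U+9FFF counts as one unit, and every whitespace-separated run of other text counts as one word. It is an assumption for illustration, not how Hugo implements WordCount.

```go
package main

import (
	"fmt"
	"unicode"
)

// isCJKRune reports whether r falls in the CJK Unified Ideographs block.
func isCJKRune(r rune) bool {
	return r >= 0x4E00 && r <= 0x9FFF
}

// countUnits (hypothetical) counts each CJK ideograph as one unit and each
// whitespace-separated run of other characters as one word.
func countUnits(s string) int {
	count := 0
	inWord := false
	for _, r := range s {
		switch {
		case isCJKRune(r):
			count++        // an ideograph is a unit on its own
			inWord = false // and it ends any word in progress
		case unicode.IsSpace(r):
			inWord = false
		default:
			if !inWord {
				count++ // first character of a new word
				inWord = true
			}
		}
	}
	return count
}

func main() {
	fmt.Println(countUnits("hello world"))   // 2
	fmt.Println(countUnits("丹为"))            // 2
	fmt.Println(countUnits("Hugo 丹为 site")) // 4
}
```

Mixed text like "Hugo 丹为 site" is where the split matters: a pure whitespace split would report 3 words, while this sketch reports 4 units.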
Things to keep in mind:
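- The expression matches any string containing at least one character from that block. Note that the block holds Han ideographs only: Japanese kana (U+3040 to U+30FF) and Korean Hangul syllables (U+AC00 to U+D7AF) fall outside it, so a string written purely in kana or Hangul will not match.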
- The Thai Unicode block runs from U+0E00 to U+0E7F, so a nice extension would be a parameter to set which ranges to test against, or maybe a dict that maps languages to ranges so we can test against a language code…
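- Not sure whether this partial returns the string `"true"` or the boolean `true` (it matters: inside `if`, the string `"false"` is non-empty and therefore truthy). Typecasting to the rescue!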
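The dict idea from the list above could look roughly like this in Go. Instead of hand-written ranges, this sketch swaps in the script tables that ship with Go's unicode package (unicode.Han, unicode.Hiragana, unicode.Katakana, unicode.Hangul, unicode.Thai). Note that unicode.Han is wider than U+4E00 to U+9FFF, since it also covers the extension blocks. The scriptsByLang map and containsScriptFor function are hypothetical names, and the language-to-script assignments are an assumption for illustration.

```go
package main

import (
	"fmt"
	"unicode"
)

// scriptsByLang (hypothetical) maps language codes to the Unicode scripts
// worth testing for. The assignments below are illustrative assumptions.
var scriptsByLang = map[string][]*unicode.RangeTable{
	"zh": {unicode.Han},
	"ja": {unicode.Han, unicode.Hiragana, unicode.Katakana},
	"ko": {unicode.Hangul, unicode.Han},
	"th": {unicode.Thai},
}

// containsScriptFor reports whether s contains at least one character from
// any script registered for lang. Unknown language codes return false.
func containsScriptFor(s, lang string) bool {
	tables, ok := scriptsByLang[lang]
	if !ok {
		return false
	}
	for _, r := range s {
		if unicode.IsOneOf(tables, r) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(containsScriptFor("ภาษาไทย", "th"))  // true
	fmt.Println(containsScriptFor("ひらがな", "ja"))    // true
	fmt.Println(containsScriptFor("blafasel", "zh")) // false
}
```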