Further testing and code tracing revealed that the current Goldmark implementation only takes one-byte code points (ASCII) into account when generating auto heading IDs. It simply discards extended Latin characters (2 bytes in UTF-8) and other international characters (3 bytes in UTF-8). See related code below.
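To make the effect concrete, here is a minimal Go sketch of the difference. It is not the Goldmark source: `asciiOnlyID` and `unicodeID` are hypothetical names, and the rules (lowercase letters and digits kept, spaces and hyphens turned into `-`, everything else dropped) are simplifying assumptions meant only to mimic the behavior described above.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// asciiOnlyID is a hypothetical helper that walks the heading byte by byte.
// Every byte of a multi-byte UTF-8 sequence is >= 0x80, so non-ASCII
// characters match no case and are silently dropped.
func asciiOnlyID(heading string) string {
	var b strings.Builder
	for i := 0; i < len(heading); i++ {
		c := heading[i]
		switch {
		case (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'):
			b.WriteByte(c)
		case c >= 'A' && c <= 'Z':
			b.WriteByte(c + 'a' - 'A') // lowercase ASCII letters
		case c == ' ' || c == '-':
			b.WriteByte('-')
		}
	}
	return b.String()
}

// unicodeID is a hypothetical alternative that walks the heading rune by
// rune and keeps any Unicode letter or digit.
func unicodeID(heading string) string {
	var b strings.Builder
	for _, r := range heading {
		switch {
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			b.WriteRune(unicode.ToLower(r))
		case r == ' ' || r == '-':
			b.WriteByte('-')
		}
	}
	return b.String()
}

func main() {
	for _, h := range []string{"Café Menu", "한글 제목"} {
		fmt.Printf("%q: ascii=%q unicode=%q\n", h, asciiOnlyID(h), unicodeID(h))
	}
}
```

With the byte-based version, a Korean heading collapses to just the hyphen that replaced the space, which is why spaced scripts are hit hardest.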
As a native Korean speaker (Korean being the only CJK script that requires spacing) who manages a documentation site in English, I find this issue a real blocker. I also believe that maintainers of multilingual sites will sympathize with me.
I’ll try to revise the Goldmark code and make a PR for this issue. CC: @kaushalmodi, @bep
The thing is, Japanese and Chinese scripts need no spacing (even with the occasional foreign word, since the majority of them are already transliterated), so the ID can be generated as-is most of the time. Also, if you check the developer’s blog, auto heading IDs are not used there, so support for them might be a lower priority as a matter of personal preference.
Since the PR has been closed without merging, I’ll see if I can develop an extension for this. As I’m not a developer, it’ll take some time.
Apparently, the debate on auto-generated heading IDs is a recurring topic without a clear solution, as seen in the discussion below: