Difference in auto-generated heading anchor names between previous versions and v0.60.0 or higher

jkboxomine · November 30, 2019, 3:58pm

Hello,

For headings that contain a hyphen, the automatically generated heading anchor differs between previous versions and v0.60.x.

Suppose we have a heading “Command-Gen-Instance”.

In v0.54.0 and v0.59.1 (Blackfriday), the anchor becomes “command-gen-instance”.
In v0.60.1 (Goldmark), the anchor becomes “commandgeninstance”.

I think the previous approach – retaining hyphen – is more generally used, and I prefer that way since my project has a bunch of links pointing to such named anchors.

Let me know if this is a valid issue and whether I should submit a request to fix it.

kaushalmodi · November 30, 2019, 4:03pm

+1

I’d also prefer having those hyphens. Can you open an issue on Goldmark repo and link it back here?

jkboxomine · November 30, 2019, 4:46pm

Opened an issue: https://github.com/yuin/goldmark/issues/46

If someone knows whether this stems from Hugo code rather than Goldmark, or if there’s an option for this, I’d appreciate the information.

jkboxomine · December 1, 2019, 7:22am

@kaushalmodi, upon further investigation, this is not limited to hyphens. It appears that Goldmark discards every non-alphanumeric character when generating a heading ID, except whitespace, which becomes a hyphen.

https://github.com/yuin/goldmark/blob/9f9f8f0e5e9dc033c72c55ec846942ad50121837/parser/parser.go#L86-L93


		if util.IsAlphaNumeric(v) {
			if 'A' <= v && v <= 'Z' {
				v += 'a' - 'A'
			}
			result = append(result, v)
		} else if util.IsSpace(v) {
			result = append(result, '-')
		}
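
To make the effect of the excerpt above easier to try out, here is a minimal, standalone sketch that wraps the same branch logic in a function. The names `goldmarkStyleID`, `isASCIIAlnum`, and `isASCIISpace` are hypothetical stand-ins (the last two assume what Goldmark's `util.IsAlphaNumeric` and `util.IsSpace` do), and anything the real function does outside the quoted lines is not reproduced here.

```go
package main

import "fmt"

// isASCIIAlnum is an assumed stand-in for Goldmark's util.IsAlphaNumeric.
func isASCIIAlnum(c byte) bool {
	return ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z') || ('0' <= c && c <= '9')
}

// isASCIISpace is an assumed stand-in for Goldmark's util.IsSpace.
func isASCIISpace(c byte) bool {
	return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f' || c == '\v'
}

// goldmarkStyleID applies the branch logic from the excerpt byte by byte:
// ASCII letters and digits are kept (upper case is lowered), whitespace
// becomes a hyphen, and every other byte is dropped.
func goldmarkStyleID(heading string) string {
	result := make([]byte, 0, len(heading))
	for i := 0; i < len(heading); i++ {
		v := heading[i]
		if isASCIIAlnum(v) {
			if 'A' <= v && v <= 'Z' {
				v += 'a' - 'A'
			}
			result = append(result, v)
		} else if isASCIISpace(v) {
			result = append(result, '-')
		}
	}
	return string(result)
}

func main() {
	fmt.Println(goldmarkStyleID("Command-Gen-Instance")) // commandgeninstance
}
```

With this sketch, the hyphens in the heading are dropped rather than preserved, which matches the v0.60.1 anchor reported at the top of the thread.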

In Hugo v0.59.1 and lower, auto heading ID generation took the following punctuation marks (and more) into account, in addition to the hyphen:

period
underscore
slash

However, Hugo v0.60.* simply drops these characters when generating the ID.

For example, the ID for the heading CLI_INIT v1.0.0 (Apr-21-2019) is cli-init-v1-0-0-apr-21-2019 in v0.59.1, but cliinit-v100-apr212019 in v0.60.1.
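
For comparison, the older behavior described above can be approximated with a short sketch: keep letters and digits, and collapse every run of other characters into a single hyphen. The function name `hyphenatingID` is hypothetical, and this is only an approximation that reproduces the v0.59.1 IDs quoted in this thread; it is not presented as Blackfriday's actual code.

```go
package main

import (
	"fmt"
	"unicode"
)

// hyphenatingID keeps letters and digits (lower-cased) and replaces each run
// of any other characters with a single hyphen. Leading and trailing runs
// produce no hyphen.
func hyphenatingID(heading string) string {
	var id []rune
	pendingDash := false
	for _, r := range heading {
		if unicode.IsLetter(r) || unicode.IsNumber(r) {
			// Emit the deferred hyphen only between two kept characters.
			if pendingDash && len(id) > 0 {
				id = append(id, '-')
			}
			pendingDash = false
			id = append(id, unicode.ToLower(r))
		} else {
			pendingDash = true
		}
	}
	return string(id)
}

func main() {
	fmt.Println(hyphenatingID("Command-Gen-Instance"))          // command-gen-instance
	fmt.Println(hyphenatingID("CLI_INIT v1.0.0 (Apr-21-2019)")) // cli-init-v1-0-0-apr-21-2019
}
```

Deferring the hyphen until the next kept character is what prevents doubled hyphens for sequences such as a space followed by a parenthesis, and a trailing hyphen after the closing parenthesis.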

Are there any other typographic elements that need to be considered when generating a heading ID? And is there any spec or convention that we can cite to support this request?

@bep, please share your thoughts on this. For documentation-heavy projects, I believe this can be a blocker to upgrading Hugo to v0.60.0 or higher.

jkboxomine · December 5, 2019, 3:48pm

Further testing and code tracing revealed that the current Goldmark implementation only takes single-byte (ASCII) characters into account when generating auto heading IDs, simply discarding extended Latin characters (2 bytes in UTF-8) and other international characters (3 bytes). See related code below.

As a native Korean speaker (Korean being the only CJK script that requires spacing) who manages a documentation site in English, I ran into this as a real blocker. I believe maintainers of multilingual sites will sympathize with me.

I’ll try to revise the Goldmark code and open a PR for this issue. CC: @kaushalmodi, @bep

kaushalmodi · December 5, 2019, 3:57pm

Ironically the Goldmark developer is Japanese (as he states in this issue). I do not have a multilingual site, and I still sympathize with what you are proposing.

jkboxomine · December 5, 2019, 4:08pm

PR created.

@kaushalmodi
The thing is, Japanese and Chinese scripts do not require spacing (even the sparse foreign words are mostly transliterated already), so the ID can be generated as-is most of the time. Also, if you check the developer’s blog, no auto heading IDs are being used, so support for them might be a lower priority as a matter of personal preference.

jkboxomine · December 6, 2019, 1:04am

Since the PR was closed without merging, I’ll see if I can develop an extension for this. As I’m not a developer, it’ll take some time.
Apparently, the debate over auto-generated heading IDs is a recurring topic without a clear solution, as seen in the discussion below:

jkboxomine · December 14, 2019, 1:01pm

I’ve written an extension:

and submitted an issue to Hugo.

The implementation of the extension is as simple as it can get, but using the extension seems to enable generation of auto heading IDs on a par with Blackfriday.

I’ve considered and tried out some other options, such as the processing in ox-hugo, but decided to stick to the basics after realizing how complicated the requirements can be from an i18n/l10n point of view (see https://github.com/gosimple/slug).

cc: @kaushalmodi, @bep
