Difference in auto-generated heading anchor names between previous versions and v0.60.0 or higher

Hello,

For headings that contain hyphens, the automatically generated heading anchors differ between v0.60.x and earlier versions.

Suppose we have a heading “Command-Gen-Instance”.

  • In v0.54.0 and v0.59.1 (Blackfriday), the anchor becomes “command-gen-instance”.
  • In v0.60.1 (Goldmark), the anchor becomes “commandgeninstance”.

I think the previous approach (retaining hyphens) is more widely used, and I prefer it because my project has many links pointing to anchors named that way.

Let me know if this is a valid issue and whether I should file a request for a fix.

+1

I’d also prefer having those hyphens. Can you open an issue on the Goldmark repo and link it back here?

Opened an issue: https://github.com/yuin/goldmark/issues/46

If someone knows whether this stems from Hugo code rather than Goldmark, or if there’s an option for this, I’d appreciate the information.

@kaushalmodi, upon further investigation, this is not limited to hyphens. It appears that Goldmark discards all non-alphanumeric characters when generating heading IDs.

In Hugo v0.59.1 and lower, auto heading ID generation took the following punctuation marks (and more) into account in addition to the hyphen:

  • period
  • underscore
  • slash

However, in Hugo v0.60.*, these characters are dropped during ID generation.

For example, the ID for the heading “CLI_INIT v1.0.0 (Apr-21-2019)” becomes cli-init-v1-0-0-apr-21-2019 in v0.59.1, but cliinit-v100-apr212019 in v0.60.1.
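To make the difference concrete, here is a minimal Go sketch of the two behaviors as described above. It is not the actual Blackfriday or Goldmark code; the function names hyphenatingID and discardingID are hypothetical, and the sketch only handles ASCII input.

```go
package main

import (
	"fmt"
	"strings"
)

// isASCIIAlnum reports whether r is a lowercase ASCII letter or a digit.
func isASCIIAlnum(r rune) bool {
	return (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9')
}

// hyphenatingID approximates the pre-v0.60 behavior described above:
// every run of non-alphanumeric characters collapses into one hyphen,
// and no hyphen is emitted at the start or end of the ID.
func hyphenatingID(heading string) string {
	var b strings.Builder
	pendingHyphen := false
	for _, r := range strings.ToLower(heading) {
		if isASCIIAlnum(r) {
			if pendingHyphen && b.Len() > 0 {
				b.WriteByte('-')
			}
			pendingHyphen = false
			b.WriteRune(r)
		} else {
			pendingHyphen = true
		}
	}
	return b.String()
}

// discardingID approximates the v0.60.x behavior described above:
// spaces become hyphens, and every other non-alphanumeric character
// is dropped without a replacement.
func discardingID(heading string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(strings.TrimSpace(heading)) {
		switch {
		case isASCIIAlnum(r):
			b.WriteRune(r)
		case r == ' ':
			b.WriteByte('-')
		}
	}
	return b.String()
}

func main() {
	fmt.Println(hyphenatingID("Command-Gen-Instance")) // command-gen-instance
	fmt.Println(discardingID("Command-Gen-Instance"))  // commandgeninstance
	fmt.Println(hyphenatingID("CLI_INIT v1.0.0 (Apr-21-2019)"))
	fmt.Println(discardingID("CLI_INIT v1.0.0 (Apr-21-2019)"))
}
```

Any existing link that targets an ID produced by the first function breaks as soon as the renderer switches to the second one, which is why this matters for sites with many in-page links.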

Are there any other typographic elements that need to be considered when generating heading IDs? And is there any spec or convention that we can use to support this request?

@bep, please share your thoughts on this. For documentation-heavy projects, I believe this can be a blocker for upgrading Hugo to v0.60.0 or higher.

Further testing and code tracing revealed that the current Goldmark implementation only takes one-byte (ASCII) characters into account when generating auto heading IDs, simply discarding extended Latin characters (2 bytes in UTF-8) and other international characters (3 bytes). See related code below.

As a native speaker of Korean (the only script among CJK that requires spacing) who manages a documentation site in English, I ran into this as a real blocker. I also believe that maintainers of multilingual sites will sympathize with me.

I’ll try to revise the Goldmark code and open a PR for this issue. CC: @kaushalmodi, @bep

Ironically the Goldmark developer is Japanese (as he states in this issue). I do not have a multilingual site, and I still sympathize with what you are proposing.

PR created.

@kaushalmodi
The thing is, Japanese and Chinese scripts don’t need spacing (foreign words are sparse, and the majority of them are already transliterated), so the ID can be generated as-is most of the time. Also, if you check the developer’s blog, it doesn’t use auto heading IDs, so support for them might be a lower priority as a matter of personal preference.

Since the PR has been closed without merging, I’ll see if I can develop an extension for this. As I’m not a developer, it’ll take some time.
Apparently, the debate on auto-generated heading IDs is a recurring topic without a clear solution, as seen in the discussion below:

I’ve written an extension:

and submitted an issue to Hugo.

The implementation of the extension is as simple as it can get, but using it seems to enable generation of auto heading IDs on a par with Blackfriday.

I pondered and tried out some other options, such as the processing in ox-hugo, but decided to stick to the basics after realizing how complicated the requirements can be from an i18n/l10n point of view (see https://github.com/gosimple/slug).

cc: @kaushalmodi, @bep