Iterate through a page’s headings

I’m wondering if headings in a page can be accessed and iterated over from within a shortcode.
Imagine a content page like the one below:

## Introduction

One morning, when Gregor Samsa woke from troubled dreams, he found himself transformed in his bed into a horrible vermin.

## My Heading

He lay on his armour-like back, and if he lifted his head a little he could see his brown belly, slightly domed and divided by arches into stiff sections. The bedding was hardly able to cover it and seemed ready to slide off any moment.

### My Subheading

A collection of textile samples lay spread out on the table - Samsa was a travelling salesman

Is it possible to access a dictionary of headings and sub-headings of this page?
I’m guessing .Page.Contents would be a good place to start and maybe a where filter to extract headings and construct a dictionary?
I also see that there’s a .Page.TableOfContents variable that seems to be exactly what I need. But it seems to be the rendered HTML version. What I need is a nested dictionary of headings and sub-headings within a page.

There’s no easy way to do it, but you might have a look at this:
https://discourse.gohugo.io/t/splitting-content-into-sections-based-on-header-level/33749/6

.Contents is already HTML. So you’d have to search for <h(\d)>([^<]+)</h\1> with a regular expression. Then the text of the headings would be accessible in the second capturing group, their level in the first one. However, afaik, a dictionary is a key-value map, so I don’t see how that will help you to retain the hierarchical structure of your headlines.
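
To make the hierarchy point concrete, here is a minimal Go sketch (Go being the language Hugo is written in) that turns rendered HTML headings into a nested tree instead of a flat map. It rests on some assumptions: `Heading`, `parseHeadings` and `printTree` are names made up for this illustration, the sample HTML is only a guess at what the rendered page looks like, and heading text is assumed to contain no inline tags. Go’s regexp package uses RE2 syntax, which has no backreferences such as `\1`, so the closing tag is matched with `</h[1-6]>`; `[^>]*` allows for attributes like `id="..."` on the opening tag.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Heading is a hypothetical node type: one heading plus its nested subheadings.
type Heading struct {
	Level    int
	Text     string
	Children []*Heading
}

// headingRE captures the level (group 1) and the text (group 2).
var headingRE = regexp.MustCompile(`<h([1-6])[^>]*>([^<]+)</h[1-6]>`)

// parseHeadings builds a tree of headings from rendered HTML.
func parseHeadings(html string) []*Heading {
	root := &Heading{Level: 0}
	stack := []*Heading{root} // path from the root to the most recent heading
	for _, m := range headingRE.FindAllStringSubmatch(html, -1) {
		h := &Heading{Level: int(m[1][0] - '0'), Text: strings.TrimSpace(m[2])}
		// Pop until the top of the stack is a shallower heading (the parent).
		for stack[len(stack)-1].Level >= h.Level {
			stack = stack[:len(stack)-1]
		}
		parent := stack[len(stack)-1]
		parent.Children = append(parent.Children, h)
		stack = append(stack, h)
	}
	return root.Children
}

// printTree prints the tree with two spaces of indentation per depth.
func printTree(hs []*Heading, depth int) {
	for _, h := range hs {
		fmt.Printf("%sh%d %s\n", strings.Repeat("  ", depth), h.Level, h.Text)
		printTree(h.Children, depth+1)
	}
}

func main() {
	html := `<h2 id="introduction">Introduction</h2><p>One morning...</p>
<h2 id="my-heading">My Heading</h2><p>He lay...</p>
<h3 id="my-subheading">My Subheading</h3>`
	printTree(parseHeadings(html), 0)
}
```

For the sample page this prints the two `h2` headings at the top level, with `My Subheading` indented under `My Heading`. A heading that contains inline markup (for example `<em>`) would be skipped by this pattern.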

.Page.TableOfContents is rendered HTML, like .Contents. Same difference.

What do you want to achieve?

The variable is called .Content (there is no s at the end).

I made some progress getting the headings using findRE. For some reason .Content was not available to me, but I was able to get .Page.RawContent.
The code below gets the headings and adds them to a list:

{{ $headings := slice }}

{{ range (findRE `\n#+([^{\n]+)` .Page.RawContent)}}
    {{ $headings = $headings | append (anchorize (trim . "\n# ") )}}
{{end}}

<h{{ $.Level }} id="{{ index $headings 1}}">{{ $.Text | safeHTML }} <a href="#{{ index $headings 1}}"></a></h{{ $.Level }}>

To answer your broader question about what I’m trying to do: I’m trying to override the way Hugo generates heading IDs by using a render hook for headings.
I need heading IDs to be in English even if the heading text is not English. For example, for a given page in French, I can construct a list of this page’s headings (French headings) using the code above. I’m hoping to fetch the English version of the same page and do the same, then correlate the order of headings and apply English IDs to the French headings.
The part I’m stuck on now is getting the English page; it seems that scope is not accessible in heading hooks.
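
The correlate-by-order idea can be sketched in plain Go, independent of how the two heading lists are obtained. Everything here is an assumption for illustration: `pairIDs`, `Pair` and `slug` are made-up names, `slug` is a simplified ASCII-only stand-in and not Hugo’s anchorize, and the French headings are invented. The sketch refuses to pair anything when the two pages have a different number of headings, since pairing by position would then silently assign wrong IDs.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Pair links a translated heading text to the ID derived from its English counterpart.
type Pair struct {
	Text string
	ID   string
}

var nonSlug = regexp.MustCompile(`[^a-z0-9]+`)

// slug is a simplified, ASCII-only stand-in for an anchorize function.
func slug(s string) string {
	return strings.Trim(nonSlug.ReplaceAllString(strings.ToLower(s), "-"), "-")
}

// pairIDs matches headings by position: the i-th translated heading
// receives the ID built from the i-th English heading.
func pairIDs(translated, english []string) ([]Pair, error) {
	if len(translated) != len(english) {
		return nil, fmt.Errorf("heading count mismatch: %d translated, %d English",
			len(translated), len(english))
	}
	pairs := make([]Pair, len(translated))
	for i, t := range translated {
		pairs[i] = Pair{Text: t, ID: slug(english[i])}
	}
	return pairs, nil
}

func main() {
	fr := []string{"Introduction", "Mon titre", "Mon sous-titre"} // hypothetical French headings
	en := []string{"Introduction", "My Heading", "My Subheading"}
	pairs, err := pairIDs(fr, en)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range pairs {
		fmt.Printf("%s -> #%s\n", p.Text, p.ID)
	}
}
```

Pairing by position only works while both translations keep the same headings in the same order, so the length check is a minimum safeguard rather than a guarantee.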

This will not find a heading on the first line of your file because of the leading \n. Also, the ‘#’ marker(s) are followed by a space in markup, which your RE gobbles up in the capturing group. Is that intended?

You’re right. The following seems to be working better:
#+ ([^{\n]+)[ \n]
But is there a way to retrieve the capturing group in Hugo? As far as I know that’s not supported?

You’ll probably have to do some tricks with replaceRE.

For an example using a captured group in the replaceRE function see:

Not sure how you could retrieve the captured group with findRE.
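
One way to read the replaceRE trick: first find the whole matches, then run the same pattern over each match and replace it with `$1`, which leaves only the captured group. Hugo’s regex functions are built on Go’s regexp package, so the sketch below shows the idea with `FindAllString` and `ReplaceAllString`. The pattern and the name `headingTexts` are assumptions for this illustration; `(?m)` makes `^` match at every line start, so a heading on the first line is found as well.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// headingLine matches a Markdown heading line; group 1 is the text
// up to an optional {#custom-id} attribute block or the end of the line.
var headingLine = regexp.MustCompile(`(?m)^#+ +([^{\n]+)`)

// headingTexts extracts heading texts in two steps, mimicking
// "find the matches, then replace each match with its group".
func headingTexts(markdown string) []string {
	var out []string
	for _, m := range headingLine.FindAllString(markdown, -1) {
		// Same pattern again: the whole match is replaced by group 1.
		text := headingLine.ReplaceAllString(m, "$1")
		out = append(out, strings.TrimSpace(text))
	}
	return out
}

func main() {
	src := "## Introduction\n\nSome text.\n\n### My Subheading {#sub}\n"
	for _, t := range headingTexts(src) {
		fmt.Println(t)
	}
}
```

The pattern is applied twice per heading, which is the clumsy part, but it needs nothing beyond plain find and replace.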

Also, there are a few more topics in the forum about your question. Look them up; perhaps you’ll find something else.

You’d need something like a match list/map/dictionary that you can access. Go seems to have the required functionality, but it is apparently not exposed in Hugo.
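
For reference, this is roughly what that functionality looks like on the Go side. `FindAllStringSubmatch` is part of Go’s standard regexp package and returns, for every match, a slice holding the whole match followed by each captured group. `MDHeading` and `markdownHeadings` are hypothetical names for this sketch, and the pattern only covers `#`-style headings in raw Markdown.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// MDHeading holds the level and text of one Markdown heading.
type MDHeading struct {
	Level int
	Text  string
}

// Group 1 is the run of '#' markers, group 2 the heading text.
// (?m) lets ^ match at the start of every line, including the first.
var atxHeading = regexp.MustCompile(`(?m)^(#{1,6}) +([^{\n]+)`)

// markdownHeadings reads both capturing groups from each match.
func markdownHeadings(src string) []MDHeading {
	var out []MDHeading
	for _, m := range atxHeading.FindAllStringSubmatch(src, -1) {
		// m[0] is the whole match, m[1] and m[2] are the groups.
		out = append(out, MDHeading{Level: len(m[1]), Text: strings.TrimSpace(m[2])})
	}
	return out
}

func main() {
	src := "## Introduction\n\nSome text.\n\n### My Subheading\n"
	for _, h := range markdownHeadings(src) {
		fmt.Printf("level %d: %s\n", h.Level, h.Text)
	}
}
```

A regular expression over raw Markdown cannot tell a heading from a line starting with `#` inside a fenced code block, so this stays a sketch rather than a parser.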

There’s an example on how to work with capturing groups here:

That’s clumsy (to use a friendly term). More complete support for REs in Hugo would probably be nice.

I see that you opened:

So you will most likely get an answer there from the maintainer, or perhaps someone else might look into the issue.