I noticed that {{ .WordCount }} is including tags as words.
For example, this post reports 5 words:
### Hello world
### Interesting
It’s counting each ### as 1 word. I noticed it does the same for other tags too.
Is this worth opening a bug report? And is there a way to configure Hugo to render the content to HTML, strip all of the tags, and then count the words in that result? That could give a more accurate count.
Oddly enough, {{ .Content | plainify | countwords }} also reports 5, even though plainify is supposed to remove HTML tags. It transforms ### into # but doesn’t remove it. I couldn’t find anything in the docs about a function that converts Markdown to HTML, or a separate function that strips Markdown syntax.
What mechanism makes the word count 5 in the above example when there are 3 words and 2 headings? If I remove both ### then it produces 3 words. If step 1 of your 3-step workflow happens, then ### should be rendered as <h3></h3> and then removed, right?
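One way to see which tokens are actually being counted is to print the plain-text version of the page next to the count. This is only a minimal sketch, assuming it is dropped temporarily into the template that renders the page (where exactly it goes is an assumption). .Plain returns the rendered content with HTML tags removed, and .PlainWords returns the same thing split into a slice of words:

```go-html-template
{{/* Temporary debugging output: remove once the extra words are identified. */}}
<p>WordCount: {{ .WordCount }}</p>

{{/* Rendered content with HTML tags stripped. */}}
<pre>{{ .Plain }}</pre>

{{/* One list item per word in the stripped content. */}}
<ol>
  {{ range .PlainWords }}
    <li>{{ . }}</li>
  {{ end }}
</ol>
```

Anything that appears in that list survived both rendering and tag stripping, so a stray # there would point at text emitted in the rendered HTML rather than at the tag stripping itself.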
I am unable to reproduce the problem as described. Try it:
git clone --single-branch -b hugo-forum-topic-50748 https://github.com/jmooring/hugo-testing hugo-forum-topic-50748
cd hugo-forum-topic-50748
hugo server
The question now is how to get the correct word count while using this hook. For reference, this is with Hugo v0.128.2 (linux amd64).
Ah ok. Is that intended, or is it an overlooked edge case in how hooks apply to WordCount? I’m wondering if I should open an issue.
I was able to update your example to {{ sub .WordCount (findRE `(?s)<h3.*?>.*?</h3>` .Content | len) }}, which works for my specific case since I only add the named anchors to H3 headings. Do you happen to know if there’s a more efficient way to do that, since this requires scanning all of .Content again?
Maybe there’s some way to get .Fragments.HeadingsMap to return only the H3s rather than all of the headings?
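A rough sketch of that idea, under two assumptions: every H3 contributes exactly one extra word (the anchor text), and each heading in the map exposes its level as .Level. Rather than filtering the map itself, it ranges over .Fragments.HeadingsMap and counts the level-3 entries. $h3Count is a hypothetical variable name. Nothing here establishes that this is cheaper than the findRE scan, and headings without an ID may not appear in the map, so it is worth comparing both counts on a real page first:

```go-html-template
{{/* Count level-3 headings from the page's fragments. */}}
{{ $h3Count := 0 }}
{{ range .Fragments.HeadingsMap }}
  {{ if eq .Level 3 }}
    {{ $h3Count = add $h3Count 1 }}
  {{ end }}
{{ end }}

{{/* Subtract one word per H3 to discount the anchor text. */}}
{{ sub .WordCount $h3Count }}
```

If H2s or other levels ever get the same anchor, the eq check would need to cover those levels too.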