A potential parsing bug: indented HTML inside HTML tags in Markdown gets wrapped in <pre>?

I converted nearly 500 posts from Jekyll to Hugo and I encountered this in one of my posts.

Configure this to begin with:

markup:
  goldmark:
    renderer:
      unsafe: true

Then try adding this to a Markdown post:

<div>
    <p>Hmm</p>
</div>

Hugo will convert that to HTML and output “Hmm” in a paragraph. This is expected.

Now do the same with:

<div>

    <p>Hmm</p>
</div>

Hugo will wrap the paragraph in a <pre> tag and you’ll end up with the literal text <p>Hmm</p> being output to the page.

I believe it’s due to how Hugo is parsing the Markdown. It sees 4 leading spaces and converts the line into a <pre> block, because that’s the normal Markdown behavior for indented code. However, IMO that shouldn’t happen when the content is wrapped in HTML tags, regardless of whether there’s an extra blank line. It doesn’t turn it into a <pre> when there are no extra blank lines.

What do you think?

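This behavior follows the CommonMark spec. The blank line ends the HTML block that <div> started, so the following line, indented by four spaces, is parsed as an indented code block. Test with the reference implementation: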

https://spec.commonmark.org/dingus/?text=%3Cdiv%3E%0A%0A%20%20%20%20%3Cp%3EHmm%3C%2Fp%3E%0A%3C%2Fdiv%3E

Would it be worth exploring a Hugo option like minify: true, which would remove all whitespace and minify the HTML before it reaches the Markdown processor?

That could help folks moving from other systems that do this, or something comparable, by default, and would keep this scenario from happening. For example, with Jekyll this wasn’t a problem. It didn’t convert it to <pre> tags.

I wrote a bunch of custom scripts to help convert 500+ posts from Jekyll to Hugo. I didn’t check every single post manually, but I did manually check all posts with any HTML tag. If I hadn’t done that as a sanity check, it would have gone undetected.
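A check like the manual one described above could be partly automated. The following is a minimal Python sketch, not part of Hugo or goldmark: the function name `find_indented_html_after_blank` is hypothetical, and the logic is a rough heuristic (it ignores list nesting and treats any fence line as a toggle). It flags lines that follow a blank line, are indented by four or more spaces (or a tab), and look like HTML, which is the pattern that ends up in a <pre> block.

```python
import re

# Heuristic: a line that opens or closes a fenced code block.
FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def find_indented_html_after_blank(markdown: str) -> list[int]:
    """Return 1-based numbers of lines that directly follow a blank line,
    are indented by 4+ spaces or a tab, and start with '<' once stripped."""
    hits = []
    in_fence = False
    prev_blank = True  # the start of the document acts like a blank line
    for number, line in enumerate(markdown.splitlines(), start=1):
        if FENCE.match(line):
            # Simplification: every fence line toggles the fenced state.
            in_fence = not in_fence
        elif not in_fence:
            indented = line.startswith(("    ", "\t"))
            if prev_blank and indented and line.lstrip().startswith("<"):
                hits.append(number)
        prev_blank = line.strip() == ""
    return hits


if __name__ == "__main__":
    sample = "<div>\n\n    <p>Hmm</p>\n</div>\n"
    print(find_indented_html_after_blank(sample))
```

Running it over each converted post and reviewing only the flagged lines narrows the manual check to the posts that actually hit this case.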

Really?

  • Removing all whitespace will break other things.
  • Pre-parsing the HTML would add cost (goldmark currently does it all in one pass).

Hugo uses goldmark (Go), which is CommonMark compliant.

Jekyll uses kramdown (Ruby), which is not.

Both have additional settings, chosen by end users, to change some aspects.

Looks like we could first eliminate the differences between American and UK English :wink:

For me, this is a one-shot migration issue between Markdown dialects.
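Treated as a one-shot migration step, the fix can be much narrower than stripping all whitespace. Here is a minimal Python sketch under the same heuristic assumptions; the function name `dedent_html_after_blank` is hypothetical and nothing here is a Hugo or goldmark API. It removes the leading indentation only from HTML-looking lines that follow a blank line, so CommonMark starts a new HTML block there instead of an indented code block.

```python
import re

# Heuristic: a line that opens or closes a fenced code block.
FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def dedent_html_after_blank(markdown: str) -> str:
    """Strip leading indentation from lines that follow a blank line,
    are indented by 4+ spaces or a tab, and start with '<' once stripped.
    All other lines are returned unchanged."""
    out = []
    in_fence = False
    prev_blank = True  # the start of the document acts like a blank line
    for line in markdown.splitlines(keepends=True):
        if FENCE.match(line):
            # Simplification: every fence line toggles the fenced state.
            in_fence = not in_fence
        elif (
            not in_fence
            and prev_blank
            and line.startswith(("    ", "\t"))
            and line.lstrip().startswith("<")
        ):
            line = line.lstrip()
        prev_blank = line.strip() == ""
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    print(dedent_html_after_blank("<div>\n\n    <p>Hmm</p>\n</div>\n"))
```

An intentionally indented code sample that happens to show HTML would also be dedented, so the resulting diff is worth reviewing before committing it.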
