Goldmark, Hugo's default Markdown renderer, has no Markdown syntax for a few inline HTML elements. We could enable raw HTML to use them in our Markdown, but that is not a good solution wherever security is a concern, e.g. for themes or larger projects with many contributors. The site configuration parameter unsafe (markup.goldmark.renderer.unsafe) should be left at its default, unsafe = false.
Over the last few months, I’ve found the following replacements very useful for injecting some missing HTML tags after Goldmark has rendered the HTML. So far, I have seen no interference with other Markdown elements, shortcodes or attributes. The syntax goes back to a suggestion @jmooring made somewhere in this forum.
Each element is wrapped in curly braces { and }. A special ASCII character right after the opening brace selects the HTML tag that replaces it.
{^1} → <sup>1</sup>
{_2} → <sub>2</sub>
{#Key} → <kbd>Key</kbd>
{$variable} → <var>variable</var>
{!highlight} → <mark>highlight</mark>
{=Author} → <cite>Author</cite>
{+inserted} → <ins>inserted</ins>
These substitutions can be applied with Hugo’s replaceRE. I chained them together in one partial, which is called with .Content as input: {{ partial "content.html" .Content }}.
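The partial itself isn't shown, so here is a minimal Go sketch of the same idea. Hugo's replaceRE is built on Go's regexp package, so the pattern syntax and the $1 replacement syntax carry over. The seven patterns below are my own assumption of what such a partial might contain, not the original code, and the names rule, rules and applyInlineTags are hypothetical. Each rule is applied in order, like one replaceRE call chained after another.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule pairs a (hypothetical) pattern with the HTML that replaces a match.
type rule struct {
	re   *regexp.Regexp
	repl string
}

// rules: illustrative patterns only. [^}]+ captures everything up to the
// first closing brace, so the match never spills into a following element.
var rules = []rule{
	{regexp.MustCompile(`\{\^([^}]+)\}`), "<sup>$1</sup>"},
	{regexp.MustCompile(`\{_([^}]+)\}`), "<sub>$1</sub>"},
	{regexp.MustCompile(`\{#([^}]+)\}`), "<kbd>$1</kbd>"},
	{regexp.MustCompile(`\{\$([^}]+)\}`), "<var>$1</var>"},
	{regexp.MustCompile(`\{!([^}]+)\}`), "<mark>$1</mark>"},
	{regexp.MustCompile(`\{=([^}]+)\}`), "<cite>$1</cite>"},
	{regexp.MustCompile(`\{\+([^}]+)\}`), "<ins>$1</ins>"},
}

// applyInlineTags runs every rule over the rendered HTML in sequence,
// mirroring a chain of replaceRE calls on .Content.
func applyInlineTags(html string) string {
	for _, r := range rules {
		html = r.re.ReplaceAllString(html, r.repl)
	}
	return html
}

func main() {
	in := "<p>E = mc{^2}, press {#Ctrl}+{#C}</p>"
	fmt.Println(applyInlineTags(in))
	// prints: <p>E = mc<sup>2</sup>, press <kbd>Ctrl</kbd>+<kbd>C</kbd></p>
}
```

In a Hugo partial, each rule would correspond to one call of the form replaceRE PATTERN REPLACEMENT INPUT, with the final result passed through safeHTML.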
P.S.: After @salim pointed out a possible loophole in this approach, I changed the regex patterns to exclude angle brackets.
As the clarifying discussion here has shown, this was not necessary. So the template is back to its original form, but now I understand better how it works.
Wow, thanks, looks like you’re right. This is the opposite of what I hoped to achieve. But these replacements do not work without safeHTML. Maybe we need to extend the regex to exclude angle brackets. Would that do the trick?
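Excluding angle brackets amounts to adding < and > to the negated character class. A minimal sketch for the superscript rule, assuming a pattern of the form \{\^([^}]+)\} (strictSup is a hypothetical name, not from the thread):

```go
package main

import (
	"fmt"
	"regexp"
)

// supStrict is a hypothetical stricter {^...} pattern: the captured text may
// not contain '}', '<' or '>', so anything with a literal tag inside is left alone.
var supStrict = regexp.MustCompile(`\{\^([^}<>]+)\}`)

// strictSup applies only the stricter superscript rule.
func strictSup(html string) string {
	return supStrict.ReplaceAllString(html, "<sup>$1</sup>")
}

func main() {
	fmt.Println(strictSup("x{^2}"))         // prints: x<sup>2</sup>
	fmt.Println(strictSup("x{^<b>2</b>}")) // prints: x{^<b>2</b>} (no match)
}
```

This only prevents a match when a literal < or > sits between the braces; it is not a general HTML sanitizer.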
Generally, you should really know what you’re doing when declaring user input as safeHTML, i.e. sanitize it properly. I don’t know if simply blocking angle brackets is enough… I guess OWASP’s XSS Filter Evasion Cheat Sheet is a good starting point, see e.g. section Character Escape Sequences.
I played around a little more with these regular expressions and noticed something odd. Hugo already seems to filter all tags when evaluating replaceRE.
Did you test your script attack with a recent Hugo version? Because I can’t get a tag through when using my original regex code.
Thanks, @jmooring, I haven’t tested this before. These replacements are meant to be used with the default configuration unsafe = false. Maybe the template should check this setting and issue an error or a warning? It wouldn’t make much sense to use these replacements and also allow for raw HTML.
And I have a question now, concerning Hugo’s workflow:
My impression is that, with unsafe = false, Hugo runs its raw HTML check after all content has been rendered and every replaceRE has run. Am I right about this? Then I could remove the check for angle brackets again and rely on Hugo’s security check. This works on my installation, but I don’t know how far back this nice feature goes.
The unsafe = true/false configuration value sets the value of the yuin/goldmark html.WithUnsafe renderer option. Any manipulation of .Content occurs after goldmark has rendered the markdown to HTML.
Thanks again, so the steps happen the other way around: with unsafe = false (the default), Goldmark omits all raw HTML before replaceRE does its work. And I can rely on that check.
And with unsafe = true, an attacker doesn’t need the replacements to embed script code: script tags can be placed anywhere, just like the inline tags. If raw HTML is enabled, these replacements are of no use.