Inconsistent HTML/XML Escaping

I noticed a case where HTML escaping is applied inconsistently, and I don’t understand why. Is it a bug, or can someone shed some light on the observed behavior?

I have a layout template for an Atom feed. With all the irrelevant bits removed, it looks like this:

<content type="html"><![CDATA[{{ .Content }}]]></content>

and the rendered XML looks like this:

<content type="html">&lt;![CDATA[<p>Paragraph</p>]]></content>

Notice that the opening < is escaped but the closing > is not. Why is that? I know I can work around it with printf; I’m just curious about the intended behavior.
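To reproduce this outside Hugo, here is a minimal Go sketch that runs the same line through html/template directly. It assumes .Content behaves like a template.HTML value (HTML the escaper already trusts); the function name renderAtomContent is hypothetical.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// renderAtomContent runs the Atom <content> line through html/template.
// Assumption: Hugo's .Content is modeled as template.HTML, i.e. trusted HTML
// that the escaper passes through unchanged.
func renderAtomContent() (string, error) {
	tmpl, err := template.New("atom").Parse(
		`<content type="html"><![CDATA[{{ .Content }}]]></content>`)
	if err != nil {
		return "", err
	}
	data := struct{ Content template.HTML }{Content: "<p>Paragraph</p>"}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := renderAtomContent()
	if err != nil {
		panic(err)
	}
	// Prints: <content type="html">&lt;![CDATA[<p>Paragraph</p>]]></content>
	fmt.Println(out)
}
```

The output matches the rendered XML in the question, so the behavior comes from html/template itself rather than from anything Hugo adds.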

Relatedly, why is the printf workaround needed in some cases but not others? Consider this in an XML layout template:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

which renders as:

&lt;?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

Notice that the XML declaration is escaped while the <feed> tag is not. What gives?
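The same result can be reproduced with a short Go sketch (renderProlog is a hypothetical name). html/template sees <feed as the start of an HTML tag and leaves it alone, but <? does not start anything it recognizes, so that < is escaped.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// renderProlog runs the first two lines of the Atom layout through html/template.
func renderProlog() (string, error) {
	const src = `<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">`
	tmpl, err := template.New("feed").Parse(src)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, nil); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := renderProlog()
	if err != nil {
		panic(err)
	}
	// First line becomes &lt;?xml ...?>; the <feed ...> line is unchanged.
	fmt.Println(out)
}
```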

Each output format (e.g., html, rss) has a boolean parameter named isPlainText.

  • When this is true, Hugo uses Go’s text/template package to generate textual output.
  • When this is false, Hugo uses Go’s html/template package to generate HTML output made safe against code injection via contextual autoescaping.

The isPlainText parameter is false for predefined output formats such as html and rss. It is also false for custom output formats unless explicitly set to true in your site configuration.

So, for your atom output format, unless you have explicitly set isPlainText to true, Go’s html/template package performs contextual autoescaping.
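To see what isPlainText switches between, this sketch runs the same template through both Go packages directly. It only mirrors the package choice described above; it is not Hugo’s code, and renderText, renderHTML, and the payload value are hypothetical. The value is inserted through an action, which is the case contextual autoescaping is meant to guard.

```go
package main

import (
	"fmt"
	htmltemplate "html/template"
	"strings"
	texttemplate "text/template"
)

// src inserts a value through an action inside an HTML text context.
const src = `<p>{{ . }}</p>`

// payload is an illustrative untrusted value.
const payload = `<script>alert(1)</script>`

// renderText mirrors isPlainText = true: text/template writes the value verbatim.
func renderText() (string, error) {
	t, err := texttemplate.New("plain").Parse(src)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	err = t.Execute(&sb, payload)
	return sb.String(), err
}

// renderHTML mirrors isPlainText = false: html/template escapes the value.
func renderHTML() (string, error) {
	t, err := htmltemplate.New("html").Parse(src)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	err = t.Execute(&sb, payload)
	return sb.String(), err
}

func main() {
	plain, err := renderText()
	if err != nil {
		panic(err)
	}
	escaped, err := renderHTML()
	if err != nil {
		panic(err)
	}
	fmt.Println(plain)   // <p><script>alert(1)</script></p>
	fmt.Println(escaped) // <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
}
```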

OK, that’s great, but why is this list.atom template:

<?a b?>

rendered to this?

&lt;?a b?>

See https://github.com/golang/go/issues/30168.

Your options:

  1. Pass the value through the safeHTML function, using print or printf as needed. This is the recommended approach, and it is safe as long as the string being marked is one you control.

    {{ "<?a b?>" | safeHTML }}
    {{ print "<?a b?>" | safeHTML }}
    {{ printf "<?a b?>" | safeHTML }}
    
  2. Change your site configuration, explicitly setting isPlainText to true for the atom output format. This causes Hugo to use Go’s text/template package where nothing is contextually escaped. This isn’t a great idea unless you control all of the site’s content and data.

    [mediaTypes.'application/atom+xml']
    suffixes = ['atom']
      
    [outputFormats.atom]
    baseName = 'feed'
    isPlainText = true
    mediaType = 'application/atom+xml'
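In plain Go terms, Hugo’s safeHTML roughly corresponds to converting a string to html/template’s template.HTML type, which tells the escaper the value is already safe. The sketch registers a stand-in function under the same name via template.FuncMap; it is an assumption for illustration, not Hugo’s implementation, and render is a hypothetical helper. It shows why option 1 works: a plain string passed through an action is escaped in full, while a template.HTML value is written as is.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// funcs holds a hypothetical stand-in for Hugo's safeHTML: it converts a
// string to template.HTML, which html/template emits without escaping.
var funcs = template.FuncMap{
	"safeHTML": func(s string) template.HTML { return template.HTML(s) },
}

// render parses src with the stand-in function available and runs it.
func render(src string) (string, error) {
	t, err := template.New("t").Funcs(funcs).Parse(src)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	err = t.Execute(&sb, nil)
	return sb.String(), err
}

func main() {
	for _, src := range []string{
		`{{ "<?a b?>" }}`,                  // plain string: &lt;?a b?&gt;
		`{{ "<?a b?>" | safeHTML }}`,       // marked safe: <?a b?>
		`{{ print "<?a b?>" | safeHTML }}`, // same, via the print builtin
	} {
		out, err := render(src)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
}
```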
    

Thanks for the link. It looks like the answer is that Go’s html/template targets HTML5 only and doesn’t fully support XML, SGML, or older HTML. In literal template text, any < that doesn’t open something it recognizes as HTML5 markup gets escaped, which in this case means the < of <? ?> and of <![CDATA[ ]]>.

Only the opening brackets are escaped because, technically, that’s all that’s needed for the output to parse correctly: a > that isn’t preceded by an unescaped < is treated as a literal character.
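A small Go sketch of that rule (the helper name execute is hypothetical): in literal template text, html/template escapes only a < that doesn’t start a recognized tag and leaves a bare > alone. For comparison, a value inserted through an action goes through the full HTML escaper, so both brackets are escaped there.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// execute parses src with html/template and runs it with no data.
func execute(src string) (string, error) {
	t, err := template.New("t").Parse(src)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	err = t.Execute(&sb, nil)
	return sb.String(), err
}

func main() {
	for _, src := range []string{
		// Literal text: the stray < is escaped; the bare > and the real
		// <p>...</p> tags are left alone.
		`<p>1 > 0 and <?pi?></p>`,
		// Action output: every < and > in the value is escaped.
		`<p>{{ "1 > 0 and <?pi?>" }}</p>`,
	} {
		out, err := execute(src)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
}
```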

Yeah, well, then they should document it accordingly and close the issue.