Indentation isn't stripped from minified HTML like it is for minified XML

It looks like Hugo uses https://github.com/tdewolff/minify to minify HTML and XML. Its documentation seems to say that newlines are preserved (a run of whitespace that contains a newline is collapsed into a single newline):

The XML minifier uses these minifications:

  • strip unnecessary whitespace and otherwise collapse it to one space (or newline if it originally contained a newline)
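To make the second half of that rule concrete, here is a minimal Go sketch of "collapse a whitespace run to one space, or to one newline if the run contained a newline". The function name collapseWhitespace is made up for illustration, and this is not the library's implementation. It deliberately leaves out the "strip unnecessary whitespace" half of the rule, which is presumably what removes the whitespace between tags entirely.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// collapseWhitespace (hypothetical helper, not part of tdewolff/minify)
// replaces every run of whitespace with a single space, or with a single
// newline if the run contained at least one newline.
func collapseWhitespace(s string) string {
	var b strings.Builder
	inRun := false      // currently inside a whitespace run
	sawNewline := false // the current run contained '\n'

	// flush writes the single character that stands in for the finished run.
	flush := func() {
		if !inRun {
			return
		}
		if sawNewline {
			b.WriteByte('\n')
		} else {
			b.WriteByte(' ')
		}
		inRun, sawNewline = false, false
	}

	for _, r := range s {
		if unicode.IsSpace(r) {
			inRun = true
			if r == '\n' {
				sawNewline = true
			}
			continue
		}
		flush()
		b.WriteRune(r)
	}
	flush() // handle a trailing whitespace run
	return b.String()
}

func main() {
	in := "<feed>\n\n<p>Home</p>\n\n  <p>Foo</p>\n\n</feed>\n"
	fmt.Print(collapseWhitespace(in))
}
```

Applied to a feed with blank lines and indentation, this collapse-only rule keeps one newline between elements, which matches the output I expected rather than the single-line output I got.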

So for this XML template and the default minify options:

<?xml version="1.0"?>

<feed>

<p>{{.Title}}</p>

{{range .Pages}}
  <p>{{.Title}}</p>
{{end}}

</feed>

I expected output like this:

<?xml version="1.0"?>
<feed>
<p>Home</p>
<p>Foo</p>
</feed>

However, I got output like this instead:

<?xml version="1.0"?><feed><p>Home</p><p>Foo</p></feed>

OK, so I guess line breaks are “unnecessary whitespace”? If so, fair enough. However, minified HTML seems to still have its indentation:

<html lang=en-us>
<head>
    <meta charset=utf-8>
    <meta content=foo name=theme>

According to the https://github.com/tdewolff/minify HTML doc, it applies the same policy to HTML indentation:

The HTML5 minifier uses these minifications:

  • strip unnecessary whitespace and otherwise collapse it to one space (or newline if it originally contained a newline)

Indeed, if you invoke the minify library using https://go.tacodewolff.nl/minify and Hugo’s options and this HTML:

<html>
  <body>
    <p>Foo</p>
  </body>
</html>

you don’t get indentation:

<html>
<body>
<p>Foo</p>
</body>
</html>

Where does the minified HTML from Hugo get its indentation? Can it be disabled? Can it be enabled for XML too?

Where do you look at your HTML? If it’s the browser – well, that one tries to make it look nice for humans. If you see it in the raw file in your public directory, that’s another thing.

In the Sources view in Safari:

I don’t think it reformats anything. That’s what the Elements view does:

For example, here’s the same webpage in the Sources tab without the --minify flag for Hugo:

As I said: check the raw HTML. That is the only relevant information. I just looked at the Sources view of a fully minified HTML file in Safari, and it had added indentation as well as line breaks.

vs raw HTML

Without --minify, I don’t see indentation or formatting for HTML when viewing with curl or nvim:

Probably more illustrative to show page 3 of curl output:

(No nice, consistent indentation added.)

Hmm, when I run hugo --minify on your example site, both /rss.xml and /index.html are minified, and I wouldn't describe the result as indented.

Yes there’s some space but:

  • the HTML file has two lines
  • and the XML even has some big blank part.

So it seems that --minify could strip a little more in some cases, but:

I'm not sure whether you are really talking about minification or rather about formatting.

Hugo does NOT format the output, nor does it add or remove indentation; the whitespace comes straight from the Go template and your inputs.
Minification, on the other hand, should remove all unnecessary formatting.
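A small sketch of that first point, using Go's standard text/template package as a stand-in for Hugo's template rendering. The page struct and the render helper are hypothetical names for illustration; the template body is the feed example from the first post, minus the XML declaration.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// page is a hypothetical stand-in for Hugo's page object.
type page struct {
	Title string
	Pages []page
}

// feedTmpl is the feed template from the first post, including its blank
// lines and the two-space indentation inside the range block.
const feedTmpl = `<feed>

<p>{{.Title}}</p>

{{range .Pages}}
  <p>{{.Title}}</p>
{{end}}

</feed>
`

// render executes src against data and returns the output as a string.
func render(src string, data page) (string, error) {
	t, err := template.New("feed").Parse(src)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, data); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := render(feedTmpl, page{Title: "Home", Pages: []page{{Title: "Foo"}}})
	if err != nil {
		fmt.Println("render error:", err)
		return
	}
	// %q makes the newlines and spaces visible.
	fmt.Printf("%q\n", out)
}
```

The quoted output shows every blank line and the two spaces before the inner p element, exactly as they were typed in the template. Nothing is added or removed at this stage.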

Sorry, maybe I have just lost track of what the problem is here, with all the talk of formatting and minifying.


I hope you don't mean using the whitespace trim markers {{- -}}.
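For anyone unfamiliar with those markers: in Go templates, "{{- " trims the whitespace before an action and " -}}" trims the whitespace after it. A minimal sketch with the standard text/template package (the render helper and the sample titles are assumptions for illustration, not Hugo code):

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// trimmedTmpl uses "{{- " so that the newline before each action is removed.
const trimmedTmpl = `<feed>
{{- range . }}
<p>{{ . }}</p>
{{- end }}
</feed>
`

// render executes src against a list of titles and returns the output.
func render(src string, titles []string) (string, error) {
	t, err := template.New("feed").Parse(src)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, titles); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := render(trimmedTmpl, []string{"Home", "Foo"})
	if err != nil {
		fmt.Println("render error:", err)
		return
	}
	fmt.Print(out)
}
```

This prints one element per line with no blank lines in between. It controls whitespace at the template level, which is separate from what --minify does afterwards.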

As others have pointed out, if you are referring to a published .html file, this isn’t true. You need to inspect the raw source, not the source as displayed by your browser’s dev tools. Or cat the file.

Try it:

git clone --single-branch -b hugo-forum-topic-53757 https://github.com/jmooring/hugo-testing hugo-forum-topic-53757
cd hugo-forum-topic-53757
hugo && cat public/index.html
hugo --minify && cat public/index.html
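If you would rather not eyeball the cat output, the same check can be sketched in Go: read the raw bytes of the published file and count the lines that start with a space or a tab. The helper name indentedLines is made up for this example, and the file path is passed on the command line (for example public/index.html from the commands above).

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// indentedLines (hypothetical helper) counts the lines in raw that begin
// with a space or a tab.
func indentedLines(raw string) int {
	n := 0
	for _, line := range strings.Split(raw, "\n") {
		if strings.HasPrefix(line, " ") || strings.HasPrefix(line, "\t") {
			n++
		}
	}
	return n
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: pass the path of a published file, for example public/index.html")
		return
	}
	raw, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%d indented line(s) in %s\n", indentedLines(string(raw)), os.Args[1])
}
```

Because this reads the file directly, no browser dev tool has a chance to pretty-print the markup first.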

If you are referring to HTML encapsulated in an XML file (e.g., within the description element of an RSS feed), any indentation is removed, but minimal whitespace is retained (single space and/or newline) by design.

Try it using the same repository/branch as above:

hugo && cat public/index.xml
hugo --minify && cat public/index.xml

If you are referring to HTML encapsulated in an XML file (e.g., within the description element of an RSS feed), and the HTML is within a CDATA section, the content is passed through as-is, again, by design.

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.