Is HTML minification broken?

This was working fine until recently, but today it stopped working, first with version 0.86 extended and again after a Manjaro update today bumped it to 0.87.0.

This seems to be the case for me whether I continue to use the minifyOutput = true method I prefer in config, or whether I use the --minify command line flag (which I do not prefer, because I’d like to test my config works before I ever push to GitLab).

The most that either method will do today is remove blank lines, but it will not strip out all whitespace in the generated HTML like it did before.

You can see my command line flag in my GitLab CI config here (I’ve also run that command locally, even though I do not build locally for GitLab Pages), and my preferred Hugo config alternative here. You can pull up the live site to view built source here. My preferred method had been working for many months, and I have not changed anything as far as I know.

I see that someone else reported it also breaking on an older version of Hugo, but I would still like to use the config option that has been documented for a while, if that option is not going away.

Yes, the default behaviour has changed:

You need to manually set keepWhitespace to false to trim all the whitespace.

minify:
  tdewolff:
    html:
      keepWhitespace: false

Thanks. That gets me a little further along, but it still doesn’t strip every last bit of whitespace.

I don’t understand why anyone felt the need to change that behavior. You can see the un-minified HTML in development, and if you really are hell-bent on un-minifying it in production there is, e.g., the Fire Source Viewer addon for Firefox.

In this example from my own site (see screenshot), I understand wanting to preserve, say, the spaces in </a> | <a> (in both the markup and the rendered page), but not the line breaks before | <a> in the markup. The line breaks were only for my own pre-minify benefit in a text editor and my git repository.

At a guess those spaces are preserved because (sadly) they affect the resulting layout (yes, really; HTML is oddly spec’d).

EDIT: That is, spaces between tags can affect layout.

You need to understand how Go HTML templates work. Everything you write in your templates is printed out “as is”. If you want to remove whitespace, you need to use the proper template tags.

You might have the following template:

{{ range something }}
  {{ .title }}
{{ end }}

and this will result in those new lines. But it could be different:

{{ range something -}}
  {{- .title -}}
{{- end }}

See the minus signs? Those mean “remove all whitespace next to this marker” (the marker being the {{ or }} delimiter).

Re-check your template, and if you run into issues post the template here. Adding a few minus signs here and there should solve your issue.

Thanks. I gave in and converted the nav menu to <li> items to make one issue go away. But to give an example of something that, while not crucial, confuses me nonetheless, here is my sample template:

<!DOCTYPE html>
<html lang="{{ .Site.LanguageCode }}">

    {{/*
    <head>
        {{- partial "head.html" . -}}
    </head>
    */}}

    <body>

<!-- temp -->
<p>
   item 1
 | item 2
 | item 3
 | item 4
 | item 5
</p>

Now here is my “staging” environment config.toml:

disableLiveReload = true

[minify]
    minifyOutput = true

[minify.tdewolff.html]
  keepWhitespace = false

Here is a screenshot of the output with hugo server:
[screenshot: hugo-minify-keeps-some-random-newlines-04]

The initial newline after the opening <p> is discarded, but the subsequent newlines before the closing </p> are not. The paragraph element isn’t inside any curly braces so the minus signs don’t seem to affect this particular case, and the results are the same no matter how much the paragraph is indented in the template.

If the line breaks were not preserved the browser would display this:

image

instead of this:

image

You might find this useful:
https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model/Whitespace

It’s just too bad the minifier does not mimic the browser rendering engine (or perhaps does not do it consistently). From the article you linked:

  1. Next, line breaks are converted to spaces

So if the minifier cannot figure out to also discard newline 2, newline 3 and so forth within a <p>, then instead it could mimic a browser rendering engine and first convert them to spaces. Then, after the conversion (specific to appropriate HTML elements, such as <p>), it could continue on and discard the extra spaces between intra-paragraph “words” (including the pipes/vertical bars or other standalone alphanumeric/punctuation characters), like it seems to do normally now (and just like the browser rendering engine also seems to do after a page has or has not been preprocessed by a minifier).
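To make that idea concrete, here is a rough Python sketch of the collapsing a browser applies to ordinary inline text, loosely following the steps in the linked article as I read them. It is a simplification for illustration, not the browser's or the minifier's actual algorithm, and the function name collapse_inline_whitespace is made up.

```python
import re

def collapse_inline_whitespace(text: str) -> str:
    """Approximate default (white-space: normal) collapsing for one block of text."""
    # Ignore spaces and tabs that sit directly before or after a line break.
    text = re.sub(r"[ \t]*\n[ \t]*", "\n", text)
    # Treat tabs as spaces, then convert line breaks to spaces.
    text = text.replace("\t", " ").replace("\n", " ")
    # Collapse every run of spaces down to a single space.
    text = re.sub(r" {2,}", " ", text)
    # Spaces at the very start and end of the block are not rendered.
    return text.strip(" ")

multi_line = "\n   item 1\n | item 2\n | item 3\n"
one_line = "item 1 | item 2 | item 3"
# Both sources end up as the same rendered text.
print(collapse_inline_whitespace(multi_line) == collapse_inline_whitespace(one_line))
```

Under this model a newline between two words and a single space between them render identically, which is why the page looks the same whether or not the newlines survive minification.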

I would understand the newline not being discarded or converted inside a <pre> or <code> element, but this was a <p> element. Moreover, it arbitrarily discards one (or even more) initial line breaks between <p> and “item 1”, but then it stops that behavior arbitrarily for newlines following the first “word(s)”.

When trying the following (which contains a mix of leading/trailing spaces before the newlines) …

<p>


item 1
| item 2
 | item 3
 |  item 4



 | item 5 
            | item 6  
         | item 7   

 | item 8         |

     item 9

</p>

… the browser rendering engine gives me the same (correct) rendered output with hugo server or hugo server -e staging, while the minifier preserves secondary newlines in the source after minification (but, interestingly, it does discard the extra secondary newlines between “item 4” and “item 5”). So the newlines in the paragraph element seem to only give the minifier hiccups, not the browser rendering engine; or is the minifier just hard-coded to only discard newlines before the first word inside a <p>? It seems if the mission was consistent newline-preservation, the newline(s) before the first “word” would also be kept.

Between item 1 and the pipe, there must be one whitespace character. Whether the character is an actual space or a newline doesn’t matter. I suspect if you examine the source carefully you will find that there are no spaces after the 1. If there are, you can file a bug with tdewolff/minify.

The issue @ddg raises is between the <p> and ‘item 1’.

There is no space in that screenshot. It’s <p>item 1.

Thank you all for your contributions to this topic, but it is time to close it.

  1. The original question was answered by @pamubay. The default behavior was changed here. The upstream issue has since been resolved, so if you feel strongly about the default behavior, please raise an issue here.

  2. The minifier appears to do exactly what it is supposed to do:

    • strip unnecessary whitespace and otherwise collapse it to one space (or newline if it originally contained a newline)

    If you feel strongly that consuming a byte with a space is better than consuming a byte with a newline, raise an issue here.
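As a rough illustration of the rule in item 2, here is a small Python sketch. It is not tdewolff/minify's actual code; the function name collapse_runs and the choice to drop whitespace at the edges of the block are assumptions made for the example.

```python
import re

def collapse_runs(text: str) -> str:
    """Collapse each whitespace run to one character, keeping a newline if present."""
    def pick(match):
        # A run that contained a newline becomes a newline, otherwise a space.
        return "\n" if "\n" in match.group(0) else " "
    collapsed = re.sub(r"[ \t\r\n\f]+", pick, text)
    # Assumption: whitespace at the very start and end of the block is dropped.
    return collapsed.strip(" \n")

source = "\n   item 1\n | item 2\n\n\n | item 3   |\n"
print(repr(collapse_runs(source)))
```

Each run still costs exactly one byte either way, so under this rule the output is the same size whether the surviving character is a space or a newline.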
