Audit your published site for problems

Last updated: 2023-11-10T21:41:24-08:00

This information is now included in the Hugo documentation.


Why

Several conditions can produce errors in your published site that the build does not detect. Run this audit before your final build. Yes, I understand that this means building the site twice.

The Audit

HUGO_MINIFY_TDEWOLFF_HTML_KEEPCOMMENTS=true HUGO_ENABLEMISSINGTRANSLATIONPLACEHOLDERS=true hugo && grep -inorE "<\!-- raw HTML omitted -->|ZgotmplZ|\[i18n\]|\(<nil>\)|(&lt;nil&gt;)|hahahugo" public/

Tested with GNU Bash 5.0 / GNU grep 3.4.

Example Output

Explanation

Environment Variables

HUGO_MINIFY_TDEWOLFF_HTML_KEEPCOMMENTS=true
Retain HTML comments even if minification is enabled. This takes precedence over [minify.tdewolff.html.keepComments] in the site configuration. If you minify without keeping HTML comments when performing this audit, you will not be able to detect when raw HTML has been omitted.
HUGO_ENABLEMISSINGTRANSLATIONPLACEHOLDERS=true
Show a placeholder instead of the default value or an empty string if a translation is missing. This takes precedence over enableMissingTranslationPlaceholders in the site configuration.
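As a small illustration of the prefix form used in the audit command, the sketch below shows that a NAME=value assignment placed in front of a command is visible to that command only, so the audit settings do not carry over to the final build. It uses sh -c as a stand-in for hugo; that substitution, and the variable names probe, during_audit, and after_audit, are assumptions made so the sketch runs without Hugo installed.

```shell
#!/usr/bin/env bash
# Start from a known state so the demo is deterministic.
unset HUGO_ENABLEMISSINGTRANSLATIONPLACEHOLDERS

# Stand-in for `hugo`: a child process that reports what it sees in its environment.
probe='echo "${HUGO_ENABLEMISSINGTRANSLATIONPLACEHOLDERS:-unset}"'

# Prefix form: the variable is set for this one child process only.
during_audit=$(HUGO_ENABLEMISSINGTRANSLATIONPLACEHOLDERS=true sh -c "${probe}")

# A later command started from the same shell does not see it.
after_audit=$(sh -c "${probe}")

echo "during audit: ${during_audit}"
echo "after audit:  ${after_audit}"
```

If you would rather keep the settings for a whole shell session, export them instead, and remember to unset them before the production build.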

Grep Options

-i, --ignore-case
Ignore case distinctions in patterns and input data, so that characters that differ only in case match each other.
-n, --line-number
Prefix each line of output with the 1-based line number within its input file.
-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
-r, --recursive
Read all files under each directory, recursively, following symbolic links only if they are on the command line.
-E, --extended-regexp
Interpret PATTERNS as extended regular expressions.

Patterns

<!-- raw HTML omitted -->
By default, Hugo strips raw HTML from your Markdown prior to rendering, and leaves this HTML comment in its place.
ZgotmplZ
ZgotmplZ is a special value that indicates that unsafe content reached a CSS or URL context at runtime. For more information see https://pkg.go.dev/html/template.
[i18n]
This is the placeholder produced instead of the default value or an empty string if a translation is missing.
(<nil>)
This string will appear in the rendered HTML when passing a nil value to the printf function.
(&lt;nil&gt;)
Same as above when the value returned from the printf function has not been passed through safeHTML.
HAHAHUGO
Under conditions too complex to explain in this article, a rendered shortcode may include all or a portion of the string HAHAHUGOSHORTCODE in either uppercase or lowercase. This is difficult to detect in all circumstances because, depending on a variety of factors, the rendered shortcode may include H, HA, HAH, HAHA, HAHAH, HAHAHU, HAHAHUG, HAHAHUGO, etc. A case-insensitive search of the output for HAHAHUGO is likely to catch the majority of cases without producing false positives.
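To see what the patterns above catch, here is a minimal sketch that runs the same grep options against a throwaway directory. The file names and sample contents are invented for illustration and are not real Hugo output. The pattern is written in single quotes, where Bash performs no history expansion, so the backslash before the exclamation mark is left out.

```shell
#!/usr/bin/env bash
# Throwaway stand-in for public/, with one hypothetical sample per problem string.
sample_dir=$(mktemp -d)
printf '%s\n' '<p>ok</p>' '<!-- raw HTML omitted -->'       > "${sample_dir}/raw.html"
printf '%s\n' '<a href="#ZgotmplZ">link</a>'                > "${sample_dir}/url.html"
printf '%s\n' '<p>[i18n] greeting</p>'                      > "${sample_dir}/i18n.html"
printf '%s\n' '<p>%!s(<nil>)</p>' '<p>%!s(&lt;nil&gt;)</p>' > "${sample_dir}/nil.html"
printf '%s\n' '<p>HAHAHUGOSHORTCODE</p>'                    > "${sample_dir}/shortcode.html"
printf '%s\n' '<p>nothing wrong here</p>'                   > "${sample_dir}/clean.html"

pattern='<!-- raw HTML omitted -->|ZgotmplZ|\[i18n\]|\(<nil>\)|(&lt;nil&gt;)|hahahugo'

# Same options as the audit: each output line is path:line-number:matched-text.
matches=$(grep -inorE "${pattern}" "${sample_dir}")
echo "${matches}"
```

Each problem file contributes one output line per match and clean.html contributes none. The order in which files are listed depends on the file system, so do not rely on it.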

Comments

The problems that this audit detects are surprisingly common in public sites created with Hugo. While you would have to view source in your browser to detect <!-- raw HTML omitted --> or ZgotmplZ, you can easily find instances of HAHAHUGOSHORTCODE with a simple search.

Great tip!

Could this audit become a Hugo build option in the future?

Sounds like an Easter egg. [guess someone is having fun behind the scenes, good for them…]

However, unlike the other error strings, I’ve never seen HAHAHUGO despite debugging Hugo projects for countless hours in the past.

Perhaps because I prefer keeping shortcodes to a minimum.

Anyway thanks for the tip.

From live sites…


[Three screenshots, not preserved here]

And many, many more.

When was this added?

I am very surprised that I have not seen this before.

EDIT
Never mind. It was added 3 years ago

I’ve updated the audit to include a search for (<nil>) and its non-safe equivalent.

{{ printf "%s" site.Params.does_not_exist }}            --> %!s(&lt;nil&gt;)
{{ printf "%s" site.Params.does_not_exist | safeHTML }} --> %!s(<nil>)

FYI, there is one gotcha with this. If you naively document this audit command in your content, the audit will fail because the documentation itself matches the grep pattern.

I don’t have a perfect solution, and I’m not really happy with it, but for now I’m using this:

You might need to enable set -o pipefail as well.

if test "$(grep -iIvrnE 'grep(.+(-- raw HTML omitted --|ZgotmplZ|hahahugo|\\\[i18n\\\])+)' "${OUTPUT_DIRECTORY}" \
    | grep -iIoE '<\!-- raw HTML omitted -->|ZgotmplZ|hahahugo|\[i18n\]')" != ""; then
    echo "not ok"
    exit 1
else
    echo "ok"
    exit 0
fi
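Another possible approach, sketched here under the assumption that the page documenting the audit is published in a directory whose name you know in advance (audit-docs is a hypothetical name), is to let grep skip that directory with --exclude-dir. The directory layout and sample contents below are invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical layout: the page that documents the audit lives in audit-docs/.
output_dir=$(mktemp -d)
mkdir -p "${output_dir}/audit-docs" "${output_dir}/posts"
printf '%s\n' '<code>grep -inorE "ZgotmplZ|hahahugo" public/</code>' > "${output_dir}/audit-docs/index.html"
printf '%s\n' '<p>All good.</p>' > "${output_dir}/posts/index.html"

pattern='<!-- raw HTML omitted -->|ZgotmplZ|\[i18n\]|\(<nil>\)|(&lt;nil&gt;)|hahahugo'

# --exclude-dir skips subdirectories whose base name matches the given glob.
if grep -inorE --exclude-dir='audit-docs' "${pattern}" "${output_dir}"; then
    audit_result="not ok"
else
    audit_result="ok"
fi
echo "${audit_result}"
```

The trade-off is that genuine problems anywhere under the excluded directory go unnoticed, so keep the excluded name as specific as you can.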

It may (or may not) be of interest to some that I have created a GitHub Action that builds a Hugo site and runs this test.

Thanks for this writeup @jmooring!

I just added this to my Netlify bash build script:

set -euo pipefail # http://redsymbol.net/articles/unofficial-bash-strict-mode
IFS=$'\n\t'

# ..

site_audit () {
    audit_dir="./audit/"

    # https://discourse.gohugo.io/t/audit-your-published-site-for-problems/35184
    if ! HUGO_MINIFY_TDEWOLFF_HTML_KEEPCOMMENTS=true \
         HUGO_ENABLEMISSINGTRANSLATIONPLACEHOLDERS=true \
         hugo --destination "${audit_dir}"
    then
        echo "FAIL: hugo run failed."
        exit 1
    fi

    if grep -inorE "<\!-- raw HTML omitted -->|ZgotmplZ|\[i18n\]|\(<nil>\)|(&lt;nil&gt;)|hahahugo" "${audit_dir}"
    then
        echo "FAIL: Site audit. Review the contents of ${audit_dir}."
        exit 1
    else
        echo "PASS: Site audit"
        rm -rf "${audit_dir}"
    fi
    echo ""
}

# ..

site_audit

# ..

@jmooring Does Hugo on Windows support environment variables or would these settings have to be a separate configuration? I’d love to get this working locally before I push it to my build pipeline. Not sure what the “Windows equivalent commands” would be.

Yes. Something like:

set HUGO_FOO=wibble
set HUGO_BAR=wobble
hugo

Thanks, I’ll take this for a spin.

I love this topic! I get to learn more about Hugo and improve my own sites in the process. I develop on Windows and although there are tools for emulating Linux, I prefer PowerShell as I use it daily for my “real” job. Here is how I translated this process. Feedback is always welcome.

Although you can use environment variables in PowerShell, I found it easier to create an “Audit” configuration:

enableMissingTranslationPlaceholders = true
[minify]
  [minify.tdewolff]
    [minify.tdewolff.html]
      keepComments = true

Then define the regex pattern (I added \\u0026rsquo; because that has been an issue for my migrated sites):

$regexPattern = '<\!-- raw HTML omitted -->|ZgotmplZ|\[i18n\]|\(<nil>\)|(&lt;nil&gt;)|hahahugo|\\u0026rsquo;'

Build Hugo with the Audit configuration:

hugo -e audit

Then get the errors using Select-String:

Get-ChildItem -Path .\public -Recurse | Select-String -Pattern $regexPattern -CaseSensitive:$false | Select Filename, LineNumber, Line, Path | Format-Table

You can run Select-String with the -List parameter to return only the first instance of the error in each file:

Get-ChildItem -Path .\public -Recurse | Select-String -Pattern $regexPattern -CaseSensitive:$false -List | Select Filename, LineNumber, Line, Path | Format-Table

Finally, if you simply want a pass or fail result, you can assign the match results to a variable:

$lines = Get-ChildItem -Path .\public -Recurse | Select-String -Pattern $regexPattern -CaseSensitive:$false

If $lines.Count -eq 0, you don’t have any errors.

I saved this as a bash script and ran it with Git, and it helped me solve a few issues. Sweet!