Escaping Meta Tag Content? Help

#4

That’s pretty weird. This is what I use for my description meta tags. Try doing it this way, letting it use your page content as your description, to see if it still does the weird apostrophe thing.

<meta name="description" content="{{if isset .Params "description"}} {{.Params.description}} {{else}} {{.Page.Content | safeHTML | truncate 300}} {{end}}">
#5

Hmm, I just tried it that way (using .Params.description like you are instead of .Page.Description), but I have the same bad behavior. Hrmm…

What version of Hugo are you using? I’m on Hugo Static Site Generator v0.40.1 linux/amd64 BuildDate: 2018-04-25T17:16:11Z

#6

I’m using Hugo 0.48. That might be the problem: you have an older version of Hugo.

#7

Huh. Thanks for bringing this up @rdegges. Just started a new Hugo project yesterday and now realize that I’m having the same issue in <meta> tags. I’m using v0.49-DEV/extended on OSX, and both safeHTML and safeHTMLAttr don’t seem to help when I build the site.

#8

I’m sure you’ve already checked this, but I think I never noticed because I’m always looking at the meta tags inside Chrome’s dev tools, which automatically converts &#39; to an apostrophe when I’m doing local dev. However, on build, I’m definitely seeing the same issue…
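
For anyone comparing dev tools against the built file, here is a minimal Go sketch of why the two differ. It assumes that an HTML parser decodes character references in attribute values, which the standard library's html.UnescapeString imitates. The helper name decodeAttr is hypothetical and only for illustration.

```go
package main

import (
	"fmt"
	"html"
)

// decodeAttr imitates what an HTML parser does to an attribute value:
// character references such as &#39; are decoded to the literal character.
// (Hypothetical helper, not part of Hugo.)
func decodeAttr(raw string) string {
	return html.UnescapeString(raw)
}

func main() {
	// The raw bytes in the built file contain the escaped apostrophe.
	raw := "My page&#39;s description"

	// A parser (and therefore dev tools) shows the decoded value.
	fmt.Println(decodeAttr(raw))
}
```

So the escaped form in the source and the apostrophe in dev tools can both be correct views of the same attribute value.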

#9

Glad I’m not the only one! I just upgraded to 0.49 on linux (64 bit), and I’m still seeing the exact same behavior. I’ve tried just about everything but literally cannot find a way to get it to render without escaping the apostrophes.

I’m at a total loss here. Really confusing.

Anyone else have an idea?

#10

Can you post your config.toml? I have this on mine:

pluralizelisttitles = false

[blackfriday]
   extensions = ["hardLineBreak"]
   nofollowLinks = true
   noreferrerLinks = true
   hrefTargetBlank = true

Although I don’t think any of this matters. Just spitballing ideas here.

#11

Sure, here is my config:

name = "blah"
email = "blah@test.com"
baseURL = "https://blah.test.com"
googleAnalytics = "UA-15777010-4"
languageCode = "en-us"
title = "My Site"
theme = "mytheme"
pygmentsCodeFences = true
pygmentsUseClasses = true
pygmentsCodefencesGuessSyntax = true

disqusshortname = "myblog"
paginate = 10
paginatePath = "page"

[permalinks]
  blog = "/blog/:year/:month/:day/:filename/"

[outputFormats]
[outputFormats.RSS]
mediatype = "application/rss"
baseName = "feed"
#12

This issue looks similar to this: https://github.com/gohugoio/hugo/issues/5236

That could be “solved” by 1) switching to safeHTMLAttr, which is the right filter to use, but it would also need 2) another PR to Go like the one I did here: https://github.com/golang/go/pull/27805

The problem for this one is that quotes can have an extra meaning since some people use single quotes for attribute values.


To summarize the above: the Go templating system is escaping the apostrophe because it’s supposed to. A new PR would be difficult (for me at least), because right now Go has a fixed list of characters that it escapes within quoted attributes, and it makes no distinction between single and double quotes.

Meaning if I made a PR to allow a single quote, the way I did for the “+” character, it would also be allowed in a situation like this:

<meta name="description" content='{{ .Page.Description }}'>

In which case the single quote from your page's description would break the HTML.
#13

Hmm, I just read through your PR and the thread of convos on GitHub. It’s an interesting problem. It seems like sort of an edge case.

The blog I’m building is for my company which has quite a few readers (we’re migrating from jekyll). Unfortunately, I don’t know if Google bot (or any of the other search engine bots, for that matter), will properly render the page title/description for search results if an apostrophe is escaped like that.

Also, you mentioned that you can “solve” this problem by using safeHTMLAttr – can you show me how you do that? I tried safeHTMLAttr by doing: {{ printf "%q" .Page.Description | safeHTMLAttr }} but that didn’t change the results for me. Maybe I’m using it wrong? I also tried %s, for what it’s worth.

#14

Only with the upstream changes to Go. :confused:

Do you use Google Search Console? If you do, you can see what information they’re pulling, so maybe you can check whether they’re indexing things right that way? I don’t use it myself. Either way, this is something that needs to be solved.

#15

Ah, that’s a great suggestion! I’ll give the Google search console a test and report back tomorrow! =D Glad to find out I’m not crazy though.

#16

Does it still do it if it reads text from the content instead of the description? Maybe my server has an older version of Go or something, because I can’t reproduce this issue :confused:

#17

It happens for me when text is read from .Params or something similar. I’ve tried with both Hugo v0.47 and v0.49, on Ubuntu 18.04.

#18

I’ve seen this issue before, and it’s related to the Go html/template package.

What I found is that if you minify your site with hugo --minify, the attributes are unescaped.

#19

Interesting, that might be why I can’t reproduce it. I’m going to have to try building without minifying to see if I can reproduce.

#20

Hi guys!

I have the same issue rendering through a shortcode, i.e.

{{< form-contact action="http://formspree.io/youremail+site@example.com" >}}

escapes the ‘+’ character. I tried all the safe options already.

Running hugo 0.49 on Ubuntu 18.04.

#21

This sounds awesome. Unfortunately, hugo --minify panics for me :frowning: https://github.com/gohugoio/hugo/issues/5261

#22

@rdegges Please read Requesting Help and create a new post in #support. Your issue should be discussed separately, and before opening a ticket in the issue queue. Please include the information mentioned in Requesting Help so others may assist you.

#23

Reviving for the sake of documenting a solution.

In order to disable escaping of HTML entities in a tag, use safeHTMLAttr, as previously suggested. However, the catch is that safeHTMLAttr should be used to mark the complete attribute (name + value), not just the value.

Original:

<meta name="description" content="{{ .Page.Description }}">

No escaping:

<meta name="description" {{ .Page.Description | printf "content=%q" | safeHTMLAttr }}>

Bear in mind that whenever you use the safe* functions (safeHTML, safeHTMLAttr, etc), it means you fully trust the input.

Malicious input could inject markup or scripts into your website. Given a content file like this:

---
description: "Untrusted page's description. \"><script>alert('hello')</script>"
---

And a template:

<meta name="description" content="{{ .Description }}">
<meta name="description" {{ .Description | printf "content=%q" | safeHTMLAttr }}>

The output is:

<meta name="description" content="Untrusted page&#39;s description. &#34;&gt;&lt;script&gt;alert(&#39;hello&#39;)&lt;/script&gt;">
<meta name="description" content="Untrusted page's description. \"><script>alert('hello')</script>">

Notice that the original form escapes the script tag, and the form with safeHTMLAttr does not, thus outputting JavaScript that is executed on page load.
