\x26rsquo; bad escape sequence in string for JSON-LD

This is a continuation of a previous topic, from two years ago:

Long story short: in JSON-LD structured data, apostrophes inside strings come out as \x26rsquo;.

The previous thread mentioned using | safeJS for an old version of Hugo. I’m using 0.55.6 and it didn’t work. What did work was this sequence of functions: | safeJS | htmlUnescape | plainify.

However, if the string contains a URL with a + sign, the output is also invalid JSON-LD. For example: https:\/\/plus.google.com\/u\/0\/\x2b


I have the same issue with JSON-LD meta data in my Zen theme.

I include this partial with {{ partial "single_json_ld.html" . }}.

I also get things like \x26ldquo;. Using safeJS does not fix the issue.

I might have found the issue.

Removing the quotes around variables, so {{ .Title }} rather than "{{ .Title }}", seems to improve things. I get valid JSON now.

In the output Hugo will add quotes around the variable values.

This has definitely changed in a recent version of Hugo.


If your Title is something like Bla & Fasel then yes, it is valid JSON without quotes, because you just changed its meaning into "IF variable Bla and variable Fasel are set", which at Google resolves to NOPE because those variables don’t exist in the context of your JSON :wink:

As long as it’s a string you will require quotes.

I did not read through the original post, but in my opinion it’s something about urlencode. I am relatively sure Hugo has something for that. That will encode only &# and some special characters I think.

I just tried removing the quotes in the articleBody part of the JSON, and it works.

Here is the final code if anyone wants to use it:

			<script type="application/ld+json">
			{ "@context": "http://schema.org",
			{{ if .IsHome }}
			"@type": "WebSite",
			{{ else }}
			"@type": "Article",
			{{ end }}

			"mainEntityOfPage": {
				"@type": "WebPage",
				"@id": {{ .Permalink }}
				},
			"headline": "{{ .Title }}",
			{{ with .Params.subtitle }}"alternativeHeadline": {{ . }},{{ end }}
			"image": {{- if .IsHome -}}
						{{ $siteLogo}}
						{{- else -}}
							{{- with $headerImage -}}
							{{ .Permalink }}
						{{- end -}}
						{{- end -}},
			"author": {
			   "@type": "Person",
			   "name": {{ $.Site.Params.author }}
			},
			{{ with .Params.tags }}"keywords": "{{ range first 6 . }}{{ . }}, {{ end }}",{{ end }}
			"wordcount": {{- .WordCount -}},
			"publisher": {
			 "@type": "Organization",
			 "name": {{ $.Site.Params.author }},
			 "logo": {
			   "@type": "ImageObject",
			   "url": {{ $siteLogo}},
			   "height": 60
			 }
			},
			"url": {{ .Site.BaseURL }},
			"datePublished": {{ if not .Date.IsZero }}{{ .Date.Format "2006-01-02T15:04:05" | plainify }}{{ end }},
			"dateCreated": {{ if not .Date.IsZero }}{{ .Date.Format "2006-01-02T15:04:05" | plainify }}{{ end }},
			"dateModified": {{ if not .Date.IsZero }}{{ .Date.Format "2006-01-02T15:04:05" | plainify }}{{ end }},
			"description": {{ if .IsHome }}{{ $.Site.Params.description }}{{else}}{{ with .Params.description }}{{ . | plainify }}{{ else }}{{ .Summary | plainify }}{{ end }}{{ end }},
			"articleBody": {{ .Plain }}
			}
			</script>

Can you point me to a URL online where you are using this shortcode? I can’t believe it works, for the reasons laid out before. I assume whatever validation tool you are using is dropping everything after dateModified.

Please test your code with content that contains : (meaning a new parameter is introduced) or , (meaning next parameter will be introduced).

The code is live here: https://brunoamaral.eu/post/ramadan-in-lisbon/

(Any of the posts and pages will show the structured data at the end)

And I am using the structured data testing tool here : https://search.google.com/structured-data/testing-tool

That website has quotes around all entries. Are you sure the template above is coming into play? If so I wonder where the quotes come from. If that is something Hugo does then good.

In general: according to the standard only strings WITHOUT whitespace or special characters are allowed to be without quotes. In their samples they even quote the dates:

https://www.w3.org/TR/json-ld/

Correct, it’s Hugo’s doing. For example, I just have "articleBody": {{ .Plain }} and it wraps it in quotes.

Well, then it’s clear that if there were quotes around that already-quoted content, they cancelled themselves out, like ""content"". Weird. I wonder if that is a bug.

I don’t want to be a pest, but here: https://brunoamaral.eu/story/museum-tour/
If you check this article on the Structured Data Tool it drops errors, because of the ' in the title.

As I was saying: try using htmlEscape (which I guess is the function that will mask all these special characters fully). Maybe in combination with safeJS it will go well.

After that I still think there should be quotes around all strings. Some day this will be fixed somewhere and your templates will stop working again.

What I am trying to say is that those special characters need to be marked by &-entities, not \-entities.


Thanks for noticing, I have fixed it now. And I am keeping a comment in the source code with the link to this thread in case I run into problems in the future. :slight_smile:

I am also running into this issue.

| safeJS | htmlUnescape | plainify fixes some of my content, but doesn’t handle ampersands correctly. I also tried htmlEscape, but that doesn’t resolve the issue. It does seem like there is a bug in how these strings are being processed. For now, I’m working around it by rewriting quotes and HTML special chars out of my content, since this is only affecting the page description for me currently.
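For comparison, here is a rough sketch of an alternative to stripping characters out of the content, written against Go's standard library rather than Hugo itself. It assumes the description arrives with HTML entities such as &rsquo; and &amp;, decodes them first (the step htmlUnescape is meant to cover), and then passes the plain text to an unquoted action so the template engine does the JSON encoding. The template string and the helper names `renderDescription` and `decodeDescription` are hypothetical, and whether a given Hugo version behaves identically is an assumption.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"html"
	"html/template"
	"strings"
)

// Unquoted action: the engine emits a quoted, JSON-encoded string.
const descTmpl = `<script type="application/ld+json">{"description": {{ . }}}</script>`

// renderDescription decodes HTML entities, then renders the plain text
// into the ld+json script and returns the text between the script tags.
func renderDescription(escaped string) (string, error) {
	plain := html.UnescapeString(escaped) // e.g. "&amp;" becomes "&"
	tmpl, err := template.New("desc").Parse(descTmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, plain); err != nil {
		return "", err
	}
	out := strings.TrimPrefix(buf.String(), `<script type="application/ld+json">`)
	return strings.TrimSuffix(out, `</script>`), nil
}

// decodeDescription parses the rendered JSON and returns the description value.
func decodeDescription(raw string) (string, error) {
	var doc struct {
		Description string `json:"description"`
	}
	if err := json.Unmarshal([]byte(raw), &doc); err != nil {
		return "", err
	}
	return doc.Description, nil
}

func main() {
	raw, err := renderDescription("Tom&rsquo;s bar &amp; grill &ldquo;tour&rdquo;")
	if err != nil {
		panic(err)
	}
	fmt.Println("rendered:", raw)

	desc, err := decodeDescription(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("decoded: ", desc)
}
```

The point of this ordering is that the ampersand is a plain character by the time it reaches the JSON encoder, so a JSON parser reading the script gets it back unchanged.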