[SOLVED] How to create a JSON index when there's code in content?

Continuing the discussion from Possible to get markdown escaped as just plain text if not JSON?:

I’ve searched the forum and GitHub, but I cannot generate a valid JSON search index in Hugo. Each time there’s code inside the content (between ~~~ and ~~~), the JSON becomes invalid.

I’ve had a lot of help from the information on the forum (my code is based on @rdwatters’ contributions), but I cannot make it work. I really hope someone can help, because without this my site cannot have a search.

Starting point

My Hugo code is:

[
{{ range $index, $page := where .Site.Pages "Section" "!=" "" }}
{{ if and $index (gt $index 0) }},{{ end }}
    {{ if (ne $page.Section "") }}
    {
        "objectID": "{{ $page.UniqueID }}",
        "url":      "{{ $page.RelPermalink }}",
        "title":    "{{ $page.Title }}",
        "platform": "{{ index $page.Params.categories 0 }}",
        "content":  "{{ $page.Plain }}"
    }
    {{ end }}
{{ end }}
]

With the JSON lint validator, this gives the unexpected EOF error:

As we can see here, the "content" JSON element breaks whenever the page content contains code.
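To illustrate why the output breaks, here is a minimal Go sketch. It is not Hugo code: the hypothetical helper `buildEntry` only mimics what the template line `"content": "{{ $page.Plain }}"` does, namely pasting text between two quotes without escaping it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildEntry is a hypothetical helper that mimics the template line
// `"content": "{{ $page.Plain }}"`: the text is pasted between quotes
// without any escaping.
func buildEntry(content string) string {
	return `{"content": "` + content + `"}`
}

// entryIsValid reports whether the hand-built entry parses as JSON.
func entryIsValid(content string) bool {
	return json.Valid([]byte(buildEntry(content)))
}

func main() {
	fmt.Println(entryIsValid("Some prose without quotes."))     // true
	fmt.Println(entryIsValid(`Console.WriteLine("Testing");`)) // false
}
```

The second call fails because the first double quote inside the content ends the JSON string early, so everything after it is a syntax error.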

Using jsonify to encode the content to JSON

Hugo code:

[
{{ range $index, $page := where .Site.Pages "Section" "!=" "" }}
{{ if and $index (gt $index 0) }},{{ end }}
    {{ if (ne $page.Section "") }}
    {
        "objectID": "{{ $page.UniqueID }}",
        "url":      "{{ $page.RelPermalink }}",
        "title":    "{{ $page.Title }}",
        "platform": "{{ index $page.Params.categories 0 }}",
        "content":  "{{ $page.Content | jsonify }}"
    }
    {{ end }}
{{ end }}
]

Error:

JSONfying plain words

Hugo code:

[
{{ range $index, $page := where .Site.Pages "Section" "!=" "" }}
{{ if and $index (gt $index 0) }},{{ end }}
    {{ if (ne $page.Section "") }}
    {
        "objectID": "{{ $page.UniqueID }}",
        "url":      "{{ $page.RelPermalink }}",
        "title":    "{{ $page.Title }}",
        "platform": "{{ index $page.Params.categories 0 }}",
        "content":  "{{ $page.PlainWords | jsonify }}"
    }
    {{ end }}
{{ end }}
]

Error:

Just PlainWords

When using PlainWords as used in this topic:

[
{{ range $index, $page := where .Site.Pages "Section" "!=" "" }}
{{ if and $index (gt $index 0) }},{{ end }}
    {{ if (ne $page.Section "") }}
    {
        "objectID": "{{ $page.UniqueID }}",
        "url":      "{{ $page.RelPermalink }}",
        "title":    "{{ $page.Title }}",
        "platform": "{{ index $page.Params.categories 0 }}",
        "content":  "{{ $page.PlainWords }}"
    }
    {{ end }}
{{ end }}
]

Error:

(Again, the JSON breaks whenever the page content contains code.)

RawContent with jsonify

Hugo code:

[
{{ range $index, $page := where .Site.Pages "Section" "!=" "" }}
{{ if and $index (gt $index 0) }},{{ end }}
    {{ if (ne $page.Section "") }}
    {
        "objectID": "{{ $page.UniqueID }}",
        "url":      "{{ $page.RelPermalink }}",
        "title":    "{{ $page.Title }}",
        "platform": "{{ index $page.Params.categories 0 }}",
        "content":  "{{ $page.RawContent | jsonify }}"
    }
    {{ end }}
{{ end }}
]

Error:


I’m using:

PS I:\site> hugo env
Hugo Static Site Generator v0.18 BuildDate: 2016-12-19T14:42:56+01:00
GOOS="windows"
GOARCH="amd64"
GOVERSION="go1.7.4"

The above JSON troubles also happened with Hugo 0.17 (although I did not test RawContent on that Hugo version).


Thanks in advance for any suggestion or idea!

Go HTML templates are pretty picky about their context. Things to try:

        "content":  {{ $page.RawContent | jsonify }}
        "content":  {{ $page.RawContent | jsonify | safeJS }}
        "content":  "{{ $page.RawContent | jsonify | safeJS }}"

Watch this thread and this issue for a real, long-term solution.
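One reason the first suggestion drops the surrounding quotes: a JSON encoder already emits the quotes for a string value. The Go sketch below shows this with `encoding/json`. The function `jsonify` here is a stand-in written for this example, not Hugo's implementation, and the sketch uses plain string concatenation, so it does not reproduce the contextual escaping that Go's html/template applies on top (which is what the safeJS variants are aimed at).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jsonify is a stand-in for a JSON-encoding template function
// (an assumption for this sketch, not Hugo's own code).
func jsonify(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		return ""
	}
	return string(b)
}

// quotedField mimics `"content": "{{ ... | jsonify }}"` (extra quotes).
func quotedField(content string) string {
	return `{"content": "` + jsonify(content) + `"}`
}

// bareField mimics `"content": {{ ... | jsonify }}` (no extra quotes).
func bareField(content string) string {
	return `{"content": ` + jsonify(content) + `}`
}

func isValid(s string) bool { return json.Valid([]byte(s)) }

func main() {
	text := `say "hi"`
	fmt.Println(jsonify(text))                // "say \"hi\""
	fmt.Println(isValid(quotedField(text))) // false
	fmt.Println(isValid(bareField(text)))   // true
}
```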


Thanks for replying. Unfortunately no luck with that. The generated JSON file still trips over the double quote usage.


I like to use .PlainWords instead of .RawContent because the latter also includes markdown formatting of links and my shortcodes. .PlainWords is a bit cleaner in that sense.

But the issue with .PlainWords is that it doesn’t translate " to &quot; in the resulting output when that " sign is included in a code block.

So I thought about using a regex like this:

"{{ $page.PlainWords | replaceRE "\"" "$1" "&quote;" }}"

I’ve also tried other regex approaches, but they all fail (possibly because Hugo already treats the double quote as the delimiter of the regex string itself). (Checking the regex pattern in an online validator does show it’s correct and matches the .PlainWords output.)

Can someone help me with the regex/Hugo approach for this? I’m out of my skill zone here.
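A possible direction, sketched in Go under stated assumptions: Go's text/template accepts raw string constants written with backticks, so a double quote can be passed as a pattern without backslash escaping. The `replaceRE` function below is a stand-in written for this example with the assumed argument order (pattern, replacement, input); whether Hugo 0.18 behaves the same way in this position has not been verified here.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
	"text/template"
)

// replaceRE is a hypothetical stand-in, assuming the argument order
// (pattern, replacement, input). The piped value arrives as the last argument.
func replaceRE(pattern, repl, input string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", err
	}
	return re.ReplaceAllString(input, repl), nil
}

// render runs the template text: {{ . | replaceRE `"` `&quot;` }}
func render(input string) (string, error) {
	const src = "{{ . | replaceRE `\"` `&quot;` }}"
	tmpl, err := template.New("demo").
		Funcs(template.FuncMap{"replaceRE": replaceRE}).
		Parse(src)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, input); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := render(`import ( "fmt" )`)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // import ( &quot;fmt&quot; )
}
```

Note that the call takes two string arguments plus the piped input, so no "$1" placeholder is needed when the whole match is being replaced.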

I thought this pull request would be a real solution: Add generator for a search index by digitalcraftsman · Pull Request #1853.

You may get some inspiration/help here:

I think you will get less friction if you first create the object you want to serialize, then use jsonify on that object instead of your manual string manipulations.
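The idea of serializing an object instead of assembling the string by hand can be sketched in Go as follows. The `entry` type and its sample values are made up for illustration; the field names are borrowed from the template in the opening post.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// entry is a hypothetical record mirroring the fields of the search index.
type entry struct {
	ObjectID string `json:"objectID"`
	URL      string `json:"url"`
	Title    string `json:"title"`
	Content  string `json:"content"`
}

// buildIndex encodes the whole slice in one step, so the encoder
// handles quotes, backslashes and newlines inside any field.
func buildIndex(entries []entry) (string, error) {
	b, err := json.MarshalIndent(entries, "", "  ")
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// indexIsValid reports whether the encoded index parses as JSON.
func indexIsValid(entries []entry) bool {
	s, err := buildIndex(entries)
	return err == nil && json.Valid([]byte(s))
}

func main() {
	entries := []entry{
		{ObjectID: "1", URL: "/post/", Title: "Post", Content: `import ( "fmt" )`},
	}
	out, err := buildIndex(entries)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	fmt.Println(indexIsValid(entries)) // true
}
```

Because escaping happens in one place, there is no need to escape individual fields or to manage the commas between entries.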

This will be better once we get this fixed:

#1853 is too specific, so I don’t think that will get into Hugo master.

Thanks for the suggestion, but I’ve already tried the approach that you used in that topic (with .Plain). Using jsonify on that object then only makes the double quotes problem worse.

I figure from your reply that you have no trouble with getting a JSON file when you use double quotes in a code block on Hugo 0.18?


After more testing and tweaking, I think there’s an issue in .PlainWords/.Plain which explains why .PlainWords with or without jsonify doesn’t work.

On Hugo 0.18, this markdown code:

~~~csharp
Console.WriteLine("Testing 'firstValue' against 'secondValue':");
~~~

Generates with .PlainWords as:

Console.WriteLine("Testing 'firstValue' against 'secondValue':");

This is correct, and how I’d expect .PlainWords to work.

But now create a code block like this:

<pre><code class="language-csharp full">
Console.WriteLine("Testing 'firstValue' against 'secondValue':");
</code></pre>

That generates an output like:

Console.WriteLine("Testing 'firstValue' against 'secondValue':");

In this case, .PlainWords fails to strip the double quotes from the content, and that causes all the issues I’m having with creating valid JSON output.

So how can I make .PlainWords also take <pre><code> tags into account? Is it supposed to do so? If not, is there an alternative that makes .Content plain while taking HTML tags into consideration? As it stands, .PlainWords doesn’t fully work for people with my use case.

No, that was not my intention. I have no idea what your problem is.

My problem is that my JSON file breaks due to double quotes (") in the page’s content. Those quotes are there because .Plain and .PlainWords do not remove them from inside code highlights marked with <pre><code>. (They do remove quotes from code highlights beginning with ~~~ though.)

Besides .Plain/.PlainWords I don’t know of another way to get a page’s content in valid JSON format.

To solve my issue, I need to get a page’s content in plain text suitable for insertion in a JSON file.

Quotes are perfectly valid as “plain text” in any definition. What you need is likely an escape template func, which we probably could add if we found a suitable func in Go’s stdlib.

Have you tried the htmlEscape template func?
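For reference, here is what HTML escaping does to double quotes, sketched with Go's standard `html.EscapeString`. That Hugo's htmlEscape behaves the same way is an assumption of this sketch, and `escapedEntry` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
	"html"
)

// escapedEntry is a hypothetical helper: it HTML-escapes the text
// before pasting it between the quotes of a JSON string.
func escapedEntry(content string) string {
	return `{"content": "` + html.EscapeString(content) + `"}`
}

// escapedEntryIsValid reports whether the resulting entry parses as JSON.
func escapedEntryIsValid(content string) bool {
	return json.Valid([]byte(escapedEntry(content)))
}

func main() {
	fmt.Println(html.EscapeString(`import ( "fmt" )`)) // import ( &#34;fmt&#34; )

	fmt.Println(escapedEntryIsValid(`import ( "fmt" )`)) // true
	// HTML escaping does not touch newlines or backslashes,
	// which are also not allowed raw inside a JSON string.
	fmt.Println(escapedEntryIsValid("line one\nline two")) // false
}
```

So HTML escaping removes the double quote problem, but it is not a full JSON escape: raw newlines or backslashes in the text would still break the output.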

(I see we’re already on the same page, but I’ll keep the additional explanation here in case it might prove useful later.)


Let me explain why .Plain and .PlainWords prevent me from creating a valid JSON file.

Let’s write this blogpost.md:

From the Hugo source:

~~~go
import (
    "fmt"
    "html/template"

    "github.com/spf13/hugo/helpers"
)
~~~

Or with only importing the `fmt` package:

<pre><code class="go-highlight full double-size">
import (
    "fmt"
)
</code>
</pre>

The end of the article.

Now let’s add the content of this article to our search index with .PlainWords:

[From the Hugo source: import ( &quot;fmt&quot; 
&quot;html/template&quot; 
&quot;github.com/spf13/hugo/helpers&quot; ) Or with only 
importing fmt: import ( "fmt" ) The end of the article. Hello world, 
this is my first blog post on my Hugo static website!]

To highlight, those two double quotes from the <pre><code> tags make the whole thing invalid JSON:


The htmlEscape function looks promising from the description in the docs, but I don’t know how to use it. All of the ways mentioned below generate an error in general.go for me.

{{ .PlainWords htmlEscape }}
{{ htmlEscape .PlainWords }}
{{ .PlainWords | htmlEscape }}
{{ htmlEscape (printf .PlainWords "%v") }}
{{ htmlEscape (printf .PlainWords "%s") }}

ERROR: 2016/12/25 18:06:00 general.go:236: Error while rendering page json-search-index.md: template: theme/json/single.html:11:24: executing "theme/jso.

I agree that custom content types offer much more flexibility. #1853 could be used as a reference for a template that creates a search index. I favor your approach @bep, so should I close my pull request?

You might as well.

#1853 has been closed.

Does someone know how to address the issue below? I’m also happy with a suggestion or speculation. :slight_smile:

I succeeded in creating a valid JSON index in Hugo by using {{ htmlEscape .Plain }}. :slightly_smiling:

@bep Do you want me to file a GitHub issue for the error message that {{ htmlEscape .PlainWords }} generates? If .PlainWords is undocumented because it’s unsupported, then I don’t want to add to the work that the developers already have.

Not sure what error message you refer to, but I fixed the “clipped off error messages” in Hugo 0.18.1. PlainWords is supported, not sure why it’s undocumented.

Would you mind posting your final solution until we get native support for custom output formats?


My final solution was the code in the opening post, with the content line changed to what I described in my previous post ({{ htmlEscape .Plain }}), which outputs the content in a valid form.

Or do you mean something else? (I did not create the custom output format, i.e. a .json file, that you’re asking about.)