Hi all.
I’m developing a full-text site search for one of my websites and I need some help formatting my search index. I’m using Hugo’s custom output formats to generate a JSON file which includes the full text of all posts, as described here. This will be pushed to Algolia’s REST batch API, and I’ll then use their free tier to power my search functionality.
Algolia’s free tier requires that each individual record be smaller than 10 KB. Their docs suggest that larger records (such as longer blog posts) be split into smaller chunks, then deduplicated using the distinct parameter.
Here’s a simplified version of my index, which currently outputs one record per page, each containing the page’s full text. I’ve used the post URL as my objectID:
{
  "requests": [
    {{ range $idx, $page := $.Site.Pages }}
    {{ if $idx }},{{ end }}
    { "action": "updateObject",
      "body": {
        "objectID": {{ $page.URL | jsonify }},
        "title": {{ $page.Title | jsonify }},
        "href": {{ $page.Permalink | jsonify }},
        "content": {{ $page.Plain | jsonify }}
      }
    }
    {{ end }}
  ]
}
I need this template to split longer posts into multiple records. I’d prefer to avoid splitting mid-word, so using substr on .Plain probably isn’t workable. I suspect I’ll need to use .PlainWords, using range to output multiple records for the longer posts, splitting by word count. I’m struggling to do this, and can’t find any threads with relevant examples.
Here’s an example of the desired output for a longer post. Each objectID needs to be unique, so I’ve simply appended a number reflecting the record’s position in the sequence. The order field provides an integer for sequencing, while title and href stay the same across records. The content field should hold a set number of words.
{
  "requests": [
    { "action": "updateObject",
      "body": {
        "objectID": "/example-post-url/_1",
        "order": 1,
        "title": "My example post title",
        "href": "https://www.mydomain.co.uk/blog/example-post-url/",
        "content": "Hugo is one of the most popular open-source static site generators. With its amazing speed and flexibility, Hugo makes building websites fun again."
      }
    },
    { "action": "updateObject",
      "body": {
        "objectID": "/example-post-url/_2",
        "order": 2,
        "title": "My example post title",
        "href": "https://www.mydomain.co.uk/blog/example-post-url/",
        "content": "We love the beautiful simplicity of markdown’s syntax, but there are times when we want more flexibility. Hugo shortcodes allow for both beauty and flexibility."
      }
    }
  ]
}
Hope that makes sense. Any pointers would be hugely appreciated.
Thanks!