Use Hugo `.GetRemote` to update Algolia/Meili indices during build

I’m currently experimenting (successfully so far) with entrusting the update pipeline of our Meili search index to Hugo’s `resources.GetRemote` feature.

To ensure the endpoint cache is only busted when documents are updated, I JSON-encode and SHA-256 the document list and append the hash to the endpoint URL as a fragment (`#`), since the Meili API restricts query parameters.

This means I can rely on Hugo’s own endpoint caching.
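The flow described above can be sketched as a Hugo partial. This is a simplified, hypothetical version, not the module’s actual code: the endpoint URL, index name, API key, document fields, and the bearer-token auth header are all placeholder assumptions.

```go-html-template
{{/* Build the document list from regular pages (fields are illustrative). */}}
{{ $docs := slice }}
{{ range site.RegularPages }}
  {{ $docs = $docs | append (dict "id" .RelPermalink "title" .Title "content" .Plain) }}
{{ end }}

{{/* JSON-encode once and hash the payload. The hash goes in the URL
     fragment (not a query parameter, which the Meili API restricts),
     so Hugo's remote cache is only busted when the documents change. */}}
{{ $payload := $docs | jsonify }}
{{ $hash := sha256 $payload }}
{{ $url := printf "https://meili.example.com/indexes/pages/documents#%s" $hash }}

{{ $opts := dict
  "method" "post"
  "headers" (dict "Content-Type" "application/json" "Authorization" "Bearer EXAMPLE_KEY")
  "body" $payload
}}
{{ with resources.GetRemote $url $opts }}
  {{ with .Err }}
    {{/* Log a warning instead of failing the build on a bad API call. */}}
    {{ warnf "Search index update failed: %s" . }}
  {{ end }}
{{ end }}
```

Checking `.Err` on the returned resource is what keeps a failed API call from breaking the build.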

Now this all seems well and good: it removes the need for serverless functions or external cron jobs to update the index when editors change content. It also removes the need to publish a JSON representation of the search index on the site, which is what the serverless function usually consumes.

I’ve bundled this logic as an experimental feature of our Hugo Search Module.

The core code is here: hugo-module-tnd-search/update-index.html at v0.2.5 · theNewDynamic/hugo-module-tnd-search · GitHub

But I guess my question is: Does this sound like a good idea? Does anyone think this could cause problems in the long term which I fail to foresee?

Knowing that:

  1. A failed API call will not break the build.
  2. The API call only happens once per build, and only if documents have been updated.
  3. This does not delete documents; it only adds/updates.

Does this sound like a good idea?

It doesn’t sound like a bad idea. The parts you probably still need to tackle are:

  • add an API call to clear the index before submitting the items (this is how it works for me with Algolia)
  • add a way to NOT always update the index on build automatically (the free tier limits how many items you can create per month, again with Algolia; with 1000 posts that would be 30 index actions on the free package)
  • find a way to update instead of re-indexing. I read a while back in the issues that a feature is being considered to only rebuild changed content on a Hugo run; maybe that helps. Maybe having some form of cached list of items and comparing it to the newly created list would work.
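The “cached list vs. new list” idea in that last bullet could look roughly like this. A language-agnostic sketch in Python; the `id` field and function names are made up for illustration:

```python
import hashlib
import json


def doc_hash(doc: dict) -> str:
    """Stable fingerprint for one document (sorted keys make it deterministic)."""
    return hashlib.sha256(json.dumps(doc, sort_keys=True).encode()).hexdigest()


def diff_index(cached: list[dict], new: list[dict]) -> dict:
    """Compare the previously published list with the freshly built one
    and return only the operations actually needed."""
    cached_by_id = {d["id"]: d for d in cached}
    new_by_id = {d["id"]: d for d in new}
    upsert = [d for i, d in new_by_id.items()
              if i not in cached_by_id or doc_hash(d) != doc_hash(cached_by_id[i])]
    delete = [i for i in cached_by_id if i not in new_by_id]
    return {"upsert": upsert, "delete": delete}
```

Only the `upsert` list would be posted to the API, which keeps the operation count well under free-tier limits when few pages change.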

Going the JS or Go API way might be better than using Hugo’s build process for this. I think the limits are the limits :wink: Having the index update automatically is nice, but it also makes the process easy to forget for the casual user on a free package who might run into the limits.

I’m lazy, but

  • Do you push the search data as a JSON post?
  • How do you host Meili? Is it any good?

Yep.

  • How do you host Meili? Is it any good?

It is good; updating the index is a very fast operation (tried with 4000 entries of modest weight). They have dozens of SDKs (if you don’t want to use GetRemote :). They maintain an instantsearch fork, which means you can use instantsearch the same way you would with Algolia.

As for hosting, you can set it up yourself on DigitalOcean or elsewhere; they have some minimal setup tutorials. But they also have a beta cloud solution on which you can spin up a Meili host with a single click. I’ve used it for most of my trials. It’s on a waiting list, but I’m confident they would bump you to the top @bep .

There is a major pain point (for some): they don’t have a dashboard from which you can manage apps, indices, settings, etc. Everything is set through their API or whichever SDK you’re comfortable with.
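To make that concrete: with no dashboard, even one-off configuration happens over HTTP. Something along these lines, with a hypothetical host and key (verify the exact paths and auth header against Meili’s API reference for your version):

```sh
# Push (add/update) documents into an index.
curl -X POST 'https://meili.example.com/indexes/pages/documents' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer EXAMPLE_KEY' \
  --data-binary @documents.json

# Adjust index settings, e.g. which fields are searchable.
curl -X POST 'https://meili.example.com/indexes/pages/settings' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer EXAMPLE_KEY' \
  --data '{"searchableAttributes": ["title", "content"]}'
```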


Thanks for your feedback @davidsneighbour !

I’m unfortunately familiar with Algolia’s limitations, and I’ve struggled with them even without `.GetRemote`. In the context of this proposed solution, we could keep publishing an index (your idea of a cached list), have Hugo fetch it, compare it with what it’s about to post, and only include the difference. It’s a bit far-fetched, but to my knowledge none of Algolia’s SDKs let you efficiently update your index to stay clear of their limits. So I think it’s to each their own on that front, GetRemote or otherwise.

add a way to NOT always update the index on build automatically

That’s what I’m achieving by including a hash (built from the documents) in the endpoint, so I can rely on Hugo’s GetRemote cache. The endpoint only changes if the documents change; the rest of the time, Hugo uses its cached response and doesn’t solicit Meili/Algolia.
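The property being relied on here can be demonstrated in a few lines. A Python sketch with a placeholder URL: the fragment, and therefore the cache key, only changes when the documents change.

```python
import hashlib
import json


def index_endpoint(base_url: str, docs: list[dict]) -> str:
    """Append a content hash as a URL fragment: same documents -> same URL,
    so a URL-keyed remote cache is only busted on real changes."""
    payload = json.dumps(docs, sort_keys=True)
    digest = hashlib.sha256(payload.encode()).hexdigest()
    return f"{base_url}#{digest}"
```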

  • add an API call to clear the index before submitting the items (it’s working in Algolia this way for me)

Good idea! That takes care of entries not being deleted! I would only worry about the subsequent call failing and being left with an empty index. But I guess breaking the build would ensure we’re notified and can take action.

But it’s making that process also forgettable for the casual user with a free package who might run into the limits.

We can make sure they don’t forget with warnings and errors.

Well, it seems most of our concerns are about Algolia’s limitations… That’s why I started with Meili :smiley: If you don’t use their cloud service, the limitations are lifted. You inherit another responsibility, though, but that’s for another discussion.

I only realised today that Meili is some form of self-hosted Algolia. Until now, I thought it was “Chinese Algolia” for some reason… I’ll experiment with it.


Up to now, I thought it was “Chinese Algolia”

Actually, it’s the French :fr: Algolia. Self-hosted. (Baguette under your arm.)


How I wish I had a French hat icon or UTF-8 symbol right now.

I’ve applied, but I doubt they recognise my name … But I’m mostly interested in a self-hosted setup. Algolia is great and all, but it gets expensive very fast.

I have this thing with these kinds of services where I think asking for an enterprise package for a popular open-source project would help get more support. Maybe someone (me?) should just write them an email and see what happens?

Netlify, Algolia, they both would profit from supporting Hugo. Just saying.

Sure, but as I said, I’m mostly interested in their self-hosted solution, which I understand is not a “waiting-list thing”.