I’m not certain where the boundaries of this challenge lie (Hugo feature? Theme feature? Custom development?), so this is more of a request for help and pointers on the topic, as my searching for prior examples hasn’t turned up anything suitable yet.
I’ve got a couple of “large” Hugo sites for internal workplace use. Both contain personal or sensitive data, and thus are not internet accessible. By large, I mean that one build is currently ~60k pages, and the other (still in development, because it is very hard to build!) is about 1.5 million pages. They’re both based on Hugo + Docsy (a setup we have running well on our GitLab Pages system).
Free-text searching of all content is a must-have feature for these sites, so fixing search is a priority for me at present.
My initial concern is that, for the 60k-page site, the offline-search-index.nnnnn.json file is currently ~850MB, which is causing slow page loads in the browser. I therefore suspect that a dedicated search platform, rather than an in-browser-memory, JS-based one, would help.
My second goal is that we have multiple Hugo + Docsy builds across a range of projects on the GitLab system, and I’d like an option to search across them all, which means a single search index.
We have an Elasticsearch cluster set up for another project internally, which would seem like an ideal platform to try to utilise for these two goals, since we can’t use an off-site search-as-a-service platform such as Algolia.
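To make the idea concrete, here is a rough sketch of what I imagine the indexing step might look like: take the page records that each site build already produces and turn them into the newline-delimited JSON that the Elasticsearch `_bulk` endpoint accepts. The index name `hugo-docs`, the function names, and the `ref` / `title` / `body` field names are my own assumptions about what the index entries look like, not something I have confirmed against Docsy or our cluster.

```python
import json
from typing import Iterable, Iterator


def to_bulk_lines(site: str, pages: Iterable[dict], index: str = "hugo-docs") -> Iterator[str]:
    """Yield NDJSON lines (action line, then document line) for the _bulk endpoint.

    Assumes each page record is a dict with a 'ref' (URL) key and optional
    'title' and 'body' keys; these field names are an assumption.
    """
    for page in pages:
        # Prefix the ID with the site name so pages from different builds
        # never collide inside the one shared index.
        doc_id = f"{site}:{page['ref']}"
        yield json.dumps({"index": {"_index": index, "_id": doc_id}})
        yield json.dumps(
            {
                "site": site,
                "url": page["ref"],
                "title": page.get("title", ""),
                "body": page.get("body", ""),
            }
        )


def bulk_body(site: str, pages: Iterable[dict], index: str = "hugo-docs") -> str:
    """Join the lines into one request body; _bulk requires a trailing newline."""
    return "".join(line + "\n" for line in to_bulk_lines(site, pages, index))


if __name__ == "__main__":
    sample = [
        {"ref": "/docs/a/", "title": "Page A", "body": "First page text"},
        {"ref": "/docs/b/", "title": "Page B", "body": "Second page text"},
    ]
    print(bulk_body("project-one", sample), end="")
```

The resulting string would be POSTed to the cluster’s `_bulk` endpoint with `Content-Type: application/x-ndjson`, in batches rather than as one 850MB request. Storing the site name as a field on every document is what I’d expect to allow either a cross-site search or a per-site filter from the same index.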
Therefore I have essentially two requests here:
- Sanity check - am I correct in needing to look at a self-hosted search platform?
- If so, then what is recommended, and are there any guides you can point me at to help please?
I’ll note that I had previously seen a Bonsai SaaS guide (which is currently 404-ing), and I’ve seen a five-year-old npm module for Hugo + Elasticsearch, but the latter doesn’t seem to be complete or relevant.
Thanks in advance!