My blog has become a personal knowledge base of sorts. As it’s grown, the usual `Command+F` browser search doesn’t always cut it: I sometimes need to search for keywords within a blog post, yet only titles are shown on the blog list page. So that’s my problem, but how to fix it? Solutions do exist, but they entail connecting to an external search service or pulling in a large JS framework. I wanted to keep things as close to home as possible and use good ole vanilla JS.
After much consideration, I added a simple search bar. In a nutshell, it searches against a JSON index, then re-renders the blog list on each `keyup` event. I’m documenting the relevant bits in case it’s useful for someone else. Let’s dive in.
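A minimal sketch of that approach, not the post’s exact code — the element IDs, the index URL, and the JSON field names (`title`, `body`, `url`) are assumptions:

```javascript
// Filter index entries whose title or body contains the term,
// case-insensitively.
function filterPosts(index, term) {
  const needle = term.toLowerCase();
  return index.filter(
    (post) =>
      post.title.toLowerCase().includes(needle) ||
      post.body.toLowerCase().includes(needle)
  );
}

// Re-render the blog list from the filtered entries.
function renderPosts(listEl, posts) {
  listEl.innerHTML = posts
    .map((post) => `<li><a href="${post.url}">${post.title}</a></li>`)
    .join('');
}

// Browser wiring: fetch the JSON index once, filter on every keyup.
// fetch('/index.json')
//   .then((res) => res.json())
//   .then((index) => {
//     const input = document.getElementById('search-input');
//     const list = document.getElementById('post-list');
//     input.addEventListener('keyup', () =>
//       renderPosts(list, filterPosts(index, input.value))
//     );
//   });
```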
{ Little devil on your left shoulder appears }
This approach has one big, glaring issue (sorry it hits you, but I decided to note it on the next post that creates a single JSON file for a search index, to get this issue some recognition, hehe): your (@zwbetz) resulting JSON file is already about 1 MB. The more you write, the larger the file gets. Even with caching and versioning, your visitor will have to load that file at least once. You could preload or prefetch it, sure, but it still adds to the amount of data your website requires, and I am pretty sure that will affect how the big G sees your site.
With my >2k blog posts, this approach left me with about 8 MB of data to load for the search. So for larger sites it doesn’t really work well: it’s slow, and probably not great for ranking (depending on how the JSON is loaded).
I have a faint idea how to solve this, but currently Hugo does not allow it (IMHO), and I am waiting for whatever the fabled “pages from data” feature may bring.
The idea is this:
Create index files for smaller groups of content, with up to $count items per file, in an indexed deep folder structure:
|-a/a/a/index.json
|-a/a/a/a/index.json
|-a/a/a/b/index.json
|-...
|-a/a/b/a/index.json
|-a/a/b/b/index.json
|-...
|-z/z/z/z/index.json
This index might have to be created multiple (i.e. many) levels deep and contain the results for `aaa` and `aaaa` searches, and so on, starting after three characters to keep the number of files down.
Sure, this will lead to many files, but it’s probably how larger file-based indexes work anyway. The search script would then, instead of `search/?term=something`, look for `search/s/o/m/e/t/h/i/n/g/index.json` (or route it somehow nicely to keep the frontend script easy).
Hugo will probably stop being the fastest website generator with an index-generation process like this… It would be interesting to see if there is a Go component that can already create a nice, readable index of content somehow.
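The lookup side of this idea could be sketched like so — the route shape is an assumption, not an existing Hugo feature:

```javascript
// Map a search term to the nested shard path described above,
// e.g. "something" -> /search/s/o/m/e/t/h/i/n/g/index.json
function shardPath(term) {
  const chars = term
    .toLowerCase()
    .replace(/[^a-z0-9]/g, '') // keep only path-safe characters
    .split('');
  if (chars.length === 0) return null; // nothing searchable
  return '/search/' + chars.join('/') + '/index.json';
}
```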
Other than that, for smaller sites, these searches are way better than setting up a large paid service (even if it has a free tier) and having to muddle with the build scripts.
I have no real solution to this conundrum, so I bow and disappear now.
{ Little devil on your left shoulder disappears }
This is an excellent script—slim, modern and with a vanilla taste. Thank you for making this available.
Would it be possible to implement `AND` logic for multiple search terms?
Your script would be a great addition to the search tools.
Interesting concept!
I, too, was thinking about adding a search functionality to a large-scale Hugo site.
I will probably add `/layouts/index.sql.sql`, import it (manually?) into a database, and use a `/static/search.rb` or `.php` script, with this in `config.yml`:
```yaml
mediaTypes:
  application/x-php:
    suffixes:
      - php

outputFormats:
  PHP:
    mediaType: application/x-php
    isPlainText: true
    baseName: index

outputs:
  home:
    - PHP
```
Another approach:
Edit: In this approach, the resulting index file can be dramatically smaller, since it doesn’t contain duplicates and stop words are filtered out.
Bonus points if you also filter out content related to code fences, KaTeX/MathJax, etc.
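That filtering step could look something like this in vanilla JS — the stop-word list here is a tiny illustrative sample, not a complete one:

```javascript
// A few common English stop words; a real list would be much longer.
const STOP_WORDS = new Set(['a', 'an', 'and', 'the', 'of', 'to', 'is', 'in']);

// Tokenize text, drop stop words, and deduplicate the result.
function indexTokens(text) {
  const tokens = text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ') // strip punctuation and code noise
    .split(/\s+/)
    .filter((t) => t && !STOP_WORDS.has(t));
  return [...new Set(tokens)]; // deduplicate
}
```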
@davidsneighbour - I don’t mind the shoulder devil as long as he drives progress. Allow me some time to think on this. In the meantime, I made the JSON minify configurable.
@Grob - Thanks. For your `AND` / `OR` question: currently the match is done by the `String.prototype.includes()` function. That could potentially be changed to use regex instead.
@Grob - After some experimenting, I added regex mode. It’s cool.
You can `OR` things by searching `dog|insect`. You can `AND` things by searching `.*dog.*insect.*`.
@davidsneighbour - By minifying the JSON index and removing some duplicate code (I was calling `.Plain` twice for my personal site), I decreased the file size from 1 MB to around 500 KB.
Also, the `fetch` API uses caching by default: if there is a match and it is fresh, it will be returned from the cache.
I don’t have an answer to your other question. And I’m willing to take the small performance hit as my blog grows.
You can always add a prefetch tag to your site, so the browser can pre-fetch the search index in its idle time, then cache that file. This should get rid of the waiting time if the browser does the prefetch.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Link_prefetching_FAQ
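For reference, the prefetch hint looks like this (the index path is an assumption):

```html
<!-- In <head>: ask the browser to fetch the search index while idle -->
<link rel="prefetch" href="/index.json">
```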
FWIW, I wrote a small tool to send a JSON index from Hugo to Algolia on build, and a small JS function to display the search results with Alpine JS. You can see the results here, and the pieces at GitHub: spotlightpa/algolia-indexer (send a JSON search index to Algolia) and spotlightpa/sourcesdb (search-people.js).
Just to pile on, I added Azure Cognitive Search to my Hugo sites. It’s very quick, cheap (like, nearly free), and I wrote about it here (you can see it working from the search control on the site). It redirects you to the search results page with the search term on the query string. The page makes an XHR call to Azure Search’s REST API to get the results and renders them client-side with Handlebars.
Basically, configure Hugo to generate all content in a huge JSON file. I then created an instance of Azure Cognitive Search and configured it as explained in the blog post. Then, before each deployment, I send a request to Azure Cog Search to reindex the site… this is done using GitHub Actions:
@sephore / @carlmjohnson / @andrewconnell - Thx for the alternative search implementation links. Glad to see all the ways people skin this cat.
Although I still prefer not using an external service
Yes, I understand why you might not want to rely on a third party like Algolia. Still, it’s a good choice for a lot of people.
I’ve added an “Enable search” checkbox. That way users can decide if they want to fetch the JSON index.
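The opt-in could be wired up something like this — checkbox ID, index URL, and function names are placeholders, not the post’s code:

```javascript
// Only fetch the JSON index after the user opts in, and at most once.
function setupOptInSearch(checkbox, loadIndex) {
  let indexPromise = null;
  checkbox.addEventListener('change', () => {
    if (checkbox.checked && !indexPromise) {
      indexPromise = loadIndex(); // fetch only on first opt-in
    }
  });
  return () => indexPromise; // lets callers await the index later
}

// Browser usage:
// setupOptInSearch(
//   document.getElementById('enable-search'),
//   () => fetch('/index.json').then((r) => r.json())
// );
```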
This is cool, @zwbetz. Thank you for providing this.
I played with your script again and have two suggestions. But feel free to ignore them
- Wouldn’t it make sense to implement `AND` logic as the default, ignoring the order of terms in the JSON file? (The current regex distinguishes between `.*Affenpinscher.*Aidi.*` and `.*Aidi.*Affenpinscher.*`.) Regex addresses technical people only.
- Would it be possible to manipulate the URL so that one can link to a search result, e.g. `https://build-a-search-bar-for-your-hugo-blog.netlify.app/blog/?q=Affenpinscher Aidi`?
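The linkable-search idea could be sketched with `URLSearchParams` — the `q` parameter name and function names are assumptions:

```javascript
// Read a ?q= parameter so a search result can be linked directly.
function queryFromUrl(search) {
  return new URLSearchParams(search).get('q') || '';
}

// Build a query string reflecting the current search term.
function urlWithQuery(baseSearch, term) {
  const params = new URLSearchParams(baseSearch);
  if (term) params.set('q', term);
  else params.delete('q');
  return '?' + params.toString();
}

// In the browser one might do, e.g.:
// input.value = queryFromUrl(location.search);
// history.replaceState(null, '', urlWithQuery(location.search, input.value));
```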
@Grob - Interesting ideas. I’ll leave those as an exercise for the reader.