How are you implementing site search?

Swiftype seems to be a good alternative to Google CSE. Their free plan for individuals should be more than enough for small sites like blogs.

I just released a Node package that can read the Markdown and HTML files in the Hugo content directory and generate a JSON index file for use with Lunr.js. Ideally, Hugo would generate an index on the fly, but for the time being this works well.
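
To make the idea concrete, here is a minimal Python sketch of what such an index generator does. It is not the code of the package mentioned above. It assumes TOML (`+++`) or YAML (`---`) front matter with a single-line `title`, strips HTML tags crudely, and uses a hypothetical `/section/slug/` URL scheme. The `href`/`title`/`content` field names are illustrative choices for a client-side Lunr.js page to load.

```python
import json
import re
from pathlib import Path

# A leading TOML (+++) or YAML (---) front-matter block; group 2 is its body.
FRONT_MATTER = re.compile(r"\A(\+\+\+|---)\s*\n(.*?)\n\1\s*\n", re.DOTALL)
# A single-line `title = "..."` (TOML) or `title: ...` (YAML) entry.
TITLE = re.compile(r"""^title\s*[:=]\s*["']?(.*?)["']?\s*$""", re.MULTILINE)
# Crude HTML tag stripper; good enough for a search index, not a sanitizer.
TAGS = re.compile(r"<[^>]+>")


def page_record(path: Path, content_dir: Path) -> dict:
    """Turn one content file into an index entry with href, title and plain text."""
    text = path.read_text(encoding="utf-8")
    title = path.stem  # fall back to the file name if no title is found
    match = FRONT_MATTER.match(text)
    if match:
        found = TITLE.search(match.group(2))
        if found:
            title = found.group(1)
        text = text[match.end():]  # drop the front matter from the body
    # Hypothetical URL scheme: content/post/hello.md -> /post/hello/
    rel = path.relative_to(content_dir).with_suffix("")
    href = "/" + rel.as_posix() + "/"
    body = " ".join(TAGS.sub(" ", text).split())  # strip tags, collapse whitespace
    return {"href": href, "title": title, "content": body}


def build_index(content_dir: str) -> list:
    """Collect an entry for every .md and .html file under content_dir."""
    root = Path(content_dir)
    files = sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix in (".md", ".html")
    )
    return [page_record(p, root) for p in files]


def write_index(content_dir: str, out_file: str) -> None:
    """Write the index as a JSON array that a Lunr.js page could load."""
    index = build_index(content_dir)
    Path(out_file).write_text(json.dumps(index), encoding="utf-8")
```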

@dgrigg Currently, I’m working on a content index that saves all pages to a JSON file. BTW, I added your node package to the tools section in the docs.

https://github.com/spf13/hugo/pull/1853

Great, I saw the conversation about adding JSON output, which will be very useful. Thanks for adding it to the tools section.

I’m using Isotope (http://isotope.metafizzy.co). Basically, I have a ‘products’ page that shows all our products, and Isotope filters them down by whatever has been typed into the search box. It’s extremely easy to implement, and for our set of 500 products it’s pretty much instantaneous, although I imagine it could get slow with larger data sets.
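
Filtering like this boils down to running a predicate over every item on each keystroke. The sketch below is plain Python, not Isotope’s API. It shows a case-insensitive “every typed word must appear” match. Because it is a linear scan, the work grows with the number of items, which fits with it being instant at 500 products and possibly slower for much larger sets.

```python
def matches(item_text: str, query: str) -> bool:
    """True if every whitespace-separated word of the query appears in the item (case-insensitive)."""
    haystack = item_text.casefold()
    return all(word in haystack for word in query.casefold().split())


def filter_items(items: list, query: str) -> list:
    """Keep the items that match the query; an empty query keeps everything.

    This is a linear scan, so the work grows with the number of items on every keystroke.
    """
    return [item for item in items if matches(item, query)]
```

Substring matching is deliberately simple here: “red” would also match “bored”.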

Can you or anyone else explain to a noob what you mean by “cp or mv to public…”?

Thanks!

It means copy (cp) or move (mv) the file to the public (destination) folder.
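
As a rough sketch of that step in a deploy script (Python here), the helper below copies an already-generated index into the output folder after a build. The `dest` path is a hypothetical placeholder, not necessarily where anyone in this thread put their file, and the upload step is left to whatever you already use.

```python
import shutil
import subprocess
from pathlib import Path


def copy_into_public(index_file: str, public_dir: str = "public",
                     dest: str = "js/lunr/PagesIndex.json") -> Path:
    """Copy a generated index into Hugo's output folder, creating sub-folders as needed.

    `dest` is a hypothetical location; use whatever path your search page loads.
    """
    target = Path(public_dir) / dest
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(index_file, target)  # use shutil.move instead for "mv" semantics
    return target


def deploy(index_file: str) -> None:
    """Build the site, then drop the index into the output before uploading."""
    # By default `hugo` renders the site into ./public.
    subprocess.run(["hugo"], check=True)
    copy_into_public(index_file)
    # ...then run whatever upload step you already use (rsync, an S3 sync, etc.).
```

An alternative is to write the index into Hugo’s `static/` folder before building, since Hugo copies everything under `static/` into the root of `public/` on each build.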

Thanks once again @bep! So, @thraxil basically saved the code from his generated index file to a file named PagesIndex.json (in the directory public/static/js/lunr/)?

How is that done automatically with a deploy script, such as the one @thraxil mentions?

Apropos of nothing really, except this thread keeps floating to the top of my “unread” list here:

I’ve recently switched from using Google to using DuckDuckGo instead. Although Google seems to generate more hits than DDG, I just got pissed off with the ridiculous amounts of JS and CSS needed to embed and configure Google’s search.

Embedding DDG search is as easy as sticking a form containing an input box on the page. There’s no JS to embed, and most of the configuration can be done via [unfortunately cryptically named] parameters in hidden form elements.

Full List of DDG Form Parameters Here

Example:

<form id="searchform" method="get" action="https://duckduckgo.com/">
            <input type="search" name="q" maxlength="255" placeholder="Search..."><!-- placeholder text in search box -->
            <input type="hidden" name="sites" value="stiobhart.net"><!-- your domain here -->
            <input type="hidden" name="kp" value="-1"> <!--safe search on/off -->
            <input type="hidden" name="kh" value="1"> <!--HTTPS on/off -->
            <input type="hidden" name="kl" value="wt-wt"> <!--region wt-wt = no region/worldwide -->
            <input type="hidden" name="kg" value="p"> <!--get [g] vs post [p] -->
            <input type="hidden" name="k7" value="#fffbe6"> <!-- background colour -->
            <input type="hidden" name="kj" value="#cc0000"> <!-- results page header colour -->
            <input type="hidden" name="ky" value="#eee8d5"> <!-- highlight colour -->
            <input type="hidden" name="kx" value="#cc0000"> <!-- URLs colour -->
            <input type="hidden" name="k1" value="-1"> <!-- adverts on/off -->
            <input type="hidden" name="ko" value="-2"> <!-- DDG interface on/off -->
            <input type="hidden" name="k8" value="#515151"> <!-- text colour -->
            <input type="hidden" name="k9" value="#cc0000"> <!-- links colour -->
            <input type="hidden" name="kaa" value="#cc0000"> <!-- visited links colour -->
            <input type="hidden" name="kae" value="#cc0000"> <!-- theme [changes result titles] colour -->
            <input type="hidden" name="ka" value="Roboto"> <!-- link font -->
            <input type="hidden" name="kt" value="Roboto"> <!-- text font -->
            <input type="submit" value="DuckDuckGo Search" style="visibility: hidden;"><!-- hide submit button. Search via hitting RETURN -->
</form>
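
Because the form uses `method="get"`, submitting it just sends the browser to a DuckDuckGo URL with each named field as a query parameter. The unnamed submit button is not included. The short Python sketch below builds the same URL, which can be handy for trying parameter combinations before editing the partial. Only the parameter names and values come from the form above; the helper itself is hypothetical.

```python
from urllib.parse import urlencode

DDG_ENDPOINT = "https://duckduckgo.com/"


def ddg_search_url(query: str, hidden_fields: dict) -> str:
    """Build the URL that a GET form pointing at DuckDuckGo would navigate to.

    Fields are encoded in insertion order, like form controls in document order;
    spaces become '+' and '#' becomes '%23', as in a browser form submission.
    """
    return DDG_ENDPOINT + "?" + urlencode({"q": query, **hidden_fields})


# A few of the hidden fields from the form above (values copied from it).
EXAMPLE_FIELDS = {"sites": "stiobhart.net", "kl": "wt-wt", "k7": "#fffbe6"}
```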

I’ve got that lot saved in a partial, so I can just drop it into any new site I’m building with a simple {{ partial "searchbox" . }} and then change the colours and fonts as appropriate. It’s certainly a lot less hassle than jumping through all the hoops needed to set up Google Search anew each time.

UPDATE ON PREVIOUS POST: I think I spoke too soon!

I’ve been updating one of my sites and, over three weeks ago, stuck up a kind of temporary version as a placeholder. All this time later, DDG is not finding anything at all. I’ve been to Yandex [who powers DDG’s site search] and they’re still only listing hits from the previous version of the site, which hasn’t been online for a month.

That would be bad enough, but it’s compounded by the fact that I’ve got a Webmaster account with Yandex, where my site[s] are already “registered” for crawling. I’ve re-submitted the site AND uploaded a Sitemap and, over a week later, they’ve still not updated any of the results for that domain.

Meanwhile, Google had picked up every single page on the new temporary site less than 24 hours after I uploaded it –and that was without me doing anything to prompt a re-indexing at all.

It looks like, if I want to avoid Google, I’ll have to check out some of the static indexers mentioned above. On this evidence, DuckDuckGo [and, by extension, Yandex] is completely useless on any site where the content is likely to change with any regularity.

Only the first indexing takes Yandex a long time. This is probably spam protection. Once the site is indexed, new pages are added instantly! Yandex also has a special interface for reporting new pages. I used Yandex with Drupal, and I can assure you that pages are added instantly.

I’m guessing from your name that you might be based a lot closer to Russia than me, and I’ve no reason to doubt your claims to have had good results with them. But unfortunately, my experience has been the opposite. I can only speculate that Yandex prioritise Russia [and a couple of other nearby countries] when updating their various data points.

Unfortunately, this isn’t the first time I’ve been let down by Yandex’s services. I’ve had similar experiences in the past when I tried switching my MX mail servers from Google to Yandex, and then again when I switched from using Google DNS servers to Yandex. Each time, I found the Yandex offerings so plagued with delays as to be unusable –and had to switch back to using Google again.

Shame really, as I do try and avoid monopolistic companies like Google as much as possible, and to give ‘new kids on the block’ a chance. But, at the end of the day, I need “stuff” to work reliably and Yandex haven’t convinced me that their “stuff” does yet –at least for users based outside of Russia.

Incidentally, my Yandex Webmaster dashboard shows that their crawler is visiting the site on an almost daily basis, with the last visit only yesterday. However, their search index is still only returning pages which were removed months ago. So I don’t know exactly what their robot is doing when it comes a-calling. It certainly isn’t noticing that anything has changed!

Yandex uses a number of different robots. It really does take Yandex a long time to index sites for its main index, but site search uses a separate robot. Its activity is displayed on the site search page: https://site.yandex.ru/searches/ (Russian version)

The Webmaster page displays the activity of the robot that adds pages to the main Yandex index (the one searched at yandex.ru).

Indexing request

You can raise the indexing priority for results pages from a given search source on the page of the corresponding search.
Select the appropriate search on the My searches page and go to the Indexing item. You can send requests three different ways:
Send HTTP-requests
Send requests manually

https://yandex.ru/support/site/optimizing.xml

Hi @stiobhart, I was trying to look at your implementation and am getting a 403 Forbidden Error?

Wow, thanks for that! I set up SSL on a couple of my sites a few days ago and didn’t notice I had a typo in the redirect from HTTP to HTTPS.

Working again now.

[But I’m no longer using GCSE. I couldn’t stand the bloated code, so I switched to DDG. Unfortunately, that’s utterly crap, as the index hasn’t updated in months. So the search [pun intended!] continues.]

Search no more, my friend, for it is I, Tapir, who might have a solution to your problems.

In all seriousness, the (many) reasons stated in this topic are exactly why we built tapirgo.com, a stupid simple static site search.

It’s a side project we started in 2011 that has managed to stick around for the past five years. It’s basically Elasticsearch with a job that fetches your RSS feed every ~15 minutes and indexes it.
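
For readers curious about the general technique, here is a toy Python sketch of “fetch an RSS feed and index it”. It is not Tapir’s code, and a plain in-memory inverted index stands in for Elasticsearch. It assumes an RSS 2.0 feed with `title`, `link` and `description` elements in each `item`, and the ~15-minute refresh is just a loop around `time.sleep`.

```python
import re
import time
import urllib.request
import xml.etree.ElementTree as ET
from collections import defaultdict

WORD = re.compile(r"\w+")


def parse_rss(xml_text: str) -> list:
    """Extract title, link and description from each <item> of an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        }
        for item in root.iter("item")
    ]


def build_inverted_index(items: list) -> dict:
    """Map each lower-cased word to the set of links whose title/description contain it."""
    index = defaultdict(set)
    for item in items:
        for term in WORD.findall(f"{item['title']} {item['description']}".lower()):
            index[term].add(item["link"])
    return dict(index)


def search(index: dict, query: str) -> set:
    """Return the links containing every word of the query (AND semantics)."""
    terms = WORD.findall(query.lower())
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


def poll_feed(feed_url: str, interval_seconds: int = 15 * 60):
    """Fetch and index the feed, yield the index, then sleep before the next fetch."""
    while True:
        with urllib.request.urlopen(feed_url) as resp:
            xml_text = resp.read().decode("utf-8")
        yield build_inverted_index(parse_rss(xml_text))
        time.sleep(interval_seconds)
```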

We expose an API endpoint and you can use our jQuery plugin, or write something yourself to get the results. You have full control over the styling of the results and there are no Ads.

(We found this topic because we’ve seen a recent uptick in static sites generated by Hugo, and we wanted to see what the fuss is about. Hugo looks great! :))

Dammit! –not another thing to add to my “Things to Investigate” list.

It is supposed to be a Bank Holiday Weekend over here, you know! I should be lying on a beach somewhere, with a large stash of liquid refreshments to hand.

Here is an example of a Yandex site search implementation. Yesterday I added 100 pages to the site, and today they are already indexed. If anyone wants to test queries, try the Russian words Чехов (Chekhov) and Станиславский (Stanislavsky).
The site as a whole isn’t quite finished yet, so there are flaws in the design.

As of a week or so ago, Yandex [via DuckDuckGo search] finally seems to have indexed my site. I’ve no idea why it took so long, especially since I jumped through all their ‘webmaster’ hoops to try and expedite things.

In fact, it was only a combination of:

A. Trying to avoid Google’s monstrous custom search engine code
B. My own inertia
C. The dearth of decent alternatives

…that meant I left the non-functioning search on my site long enough for it finally to crank into some sort of life. That said, the ‘Image Search’ part still never returns anything for any search.

YYMMV*

Swiftype would also have been my preference, but now their cheapest plan starts at $299 per month, with no more free or cheaper plan available (ref. “Swiftype silently drops free plans”).

An alternative that currently does have a free plan is Algolia, but from what I understand you’ll need to provide the “schemaless objects” (the records to be indexed) yourself, which makes search way more complex than I wanted it to be.

What I like about Swiftype is that they crawl your site and create the search index for you, with no hassle. That’s especially helpful for people with static hosting (e.g., S3), since tools like Bleve need a server.

Quality of search engine
Furthermore, Swiftype are “search engine creators” themselves and will probably keep improving their search tool. The quality of the different search engines seems to have been overlooked in this discussion, but I think it’s quite an important point.

If I look at Bleve, I see that searching for “search” returns results that don’t include “searching”, and when using “searching” as the query, none of the pages that match “search” show up. In other words, Bleve doesn’t treat “search” and “searching” as forms of the same word, and I think that’s a problem.
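
The behaviour described here is what stemming addresses: words are reduced to a common stem before indexing and querying (Lunr.js, for example, includes a Porter stemmer in its default pipeline). The Python sketch below uses a deliberately naive suffix-stripping rule as a stand-in for a real stemmer such as Porter’s, just to show the effect. The suffix list and minimum stem length are arbitrary assumptions.

```python
import re

# Checked in order, so "es" is tried before the bare "s".
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip one common English suffix, keeping a stem of at least three letters.

    A toy stand-in for a real stemmer such as Porter's.
    """
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stems(text: str) -> set:
    """Lower-case the text, split it into ASCII words and stem each one."""
    return {naive_stem(w) for w in re.findall(r"[a-z]+", text.lower())}


def matches(query: str, document: str) -> bool:
    """True if every stemmed query word also occurs, stemmed, in the document."""
    return stems(query) <= stems(document)
```

The rule is crude: it turns “running” into “runn”, which is why real engines use a proper stemming algorithm.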

(Case in point, Google lists the current topic as the first result when searching both for “hugo site search” and “hugo site searching”, and I believe that a good Hugo search engine should do the same.)

DuckDuckGo slow to index
The slow indexing with DDG that Stiobhart mentions is also something I experienced with that search engine. So the only reasonably simple way for non-Go developers to implement site search seems to be Google Custom Search.

Swiftype alternatives?
Does perhaps anyone know about a Swiftype alternative (free or affordable) that manages the searching for us?