How are you implementing site search?


#62

Also worth watching for static search is:


#63

@Jura While not nearly as robust as the aforementioned solutions, I’ve successfully added lunr.js to a static site for full-text search. Here is the link to the ajax call I use. I find that running lunr through uglifier as part of my build seems to cause some issues, so I’d recommend calling the lunr.js script first:

Part of this also means adding a simple bash script to convert the html files Hugo generates to a .json, but I want to say that the ability to generate a search index natively is still forthcoming:

https://github.com/spf13/hugo/pull/1853

(The most recent release already added jsonify, etc., which makes this easier.)

If you want to see it in action—although bear in mind that the site is just dummy text and currently used as my playground for Hugo—go to https://ryanwatters.io. I really should put together a tutorial for this. Lunr seems to have very few problems handling sites with even a few thousand pages.


#64

@rdwatters Thanks for the suggestion and information. I already came across your helpful posts when looking into search for Hugo, and how you implemented search on your site certainly looks nice.

But I have two (quite personal) obstacles with this. First, my knowledge isn’t at this level. To give a very simple illustration: I don’t see the search-lunr-ajax-call.js file appearing in the HTTPS requests on your website (perhaps it’s bundled inside another file). So while that code looks helpful, implementing it and tying everything together is a whole other matter. :slightly_smiling:

Second, lunr.js search seems to work well, but doesn’t catch all queries. If, for instance, I search for “appendChild” or “google” (which both come up in this blog post), I get no results even though both words do appear in your json index.

Perhaps lunr.js excludes results from code examples, but since I use code examples often, that’s not something I’m looking for. (I don’t mean this reply to be pessimistic or negative, by the way.)


#65

@Jura No worries. As I said, it’s not nearly as robust. Again, please don’t take ryanwatters.io as indicative of a full-fledged site, seeing as it’s nothing but dummy text, which probably doesn’t help with testing the search either.

W/r/t your question on search-lunr-ajax-call.js, the reason you’re not seeing it is that the file is concatenated and uglified with the rest of my js files using Gulp. I should have clarified that in the previous comment.

[[UPDATE]] I haven’t looked at the lunr docs in a bit, but I think the reason that the code might not be showing up in search has to do with the stop word filter and a few other built-in features of lunr.js. That said, the pipeline can be extended relatively easily. I would check out the issues list on GH for examples if you are still interested in going this route.
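To see the stop-word point in miniature: this is not lunr's actual API or its real stop-word list, just a plain-JS sketch of what a tokenize-then-filter pipeline does to query terms before they ever reach the index:

```javascript
// A toy version of the kind of pipeline lunr runs each token through.
// (Illustration only -- not lunr's real API or its real stop-word list.)
const STOP_WORDS = new Set(["a", "an", "the", "and", "or", "of", "to", "in"]);

function pipeline(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)                       // split on non-alphanumerics
    .filter((token) => token.length > 0)       // drop empty tokens
    .filter((token) => !STOP_WORDS.has(token)); // drop stop words
}
```

Any query term that gets filtered out (or split apart, e.g. `document.appendChild` becoming two tokens) will simply never match, which can make code-heavy content seem invisible to search.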


#66

I’m experimenting with Algolia now; I’m quite impressed by the features they offer in terms of how their search works (like dealing with typos), and I like the idea that they’re a company dedicated to search. And they have quite a lot of funding, so they can probably keep innovating and improving.

Sadly, they don’t have a “Hugo plugin” or similar, so I’ll need to update my index by hand for now (copy-pasting a json file). Your (@rdwatters) write-up of how you use Lunr.js has been a big help for creating a json file, so thank you very much for that.


I’ve also experimented with Searchly, which has a free plan (5 MB index) and offers a crawler (so, in theory, no need for json files), but I ran into a technical issue, emailed their support, and haven’t heard back in 5 days. So they’re not an option for me personally.


#67

After seeing support for multi-lingual sites as well as the ability to use org-mode markup, the only thing keeping me from moving from a Python-powered static site generator to Hugo is a simple setup for search. In Nikola it’s easy to add search via Tipue, so I wonder what a Tipue setup involves in Hugo, and/or whether you moved to lunr.js?

Any pros/cons regarding Tipue vs. lunr.js?

I’d deploy (a few) Hugo sites on Webfaction hosting…


#68

This may help in creating some kind of “search index”:


#69

Thanks. Do you still plan to make it happen for 0.20?


#70

It IS happening for 0.20.


#71

0.20 is released, so I wonder if there are any hints on how to proceed now with deploying custom search for Hugo?


#72

There’s documentation about the new output formats here.
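For the impatient, the gist of the new output formats (from my reading of the docs, so double-check them): tell Hugo to render the home page as JSON in addition to HTML in your site config:

```toml
# config.toml -- render the home page to JSON as well as HTML
[outputs]
  home = ["HTML", "RSS", "JSON"]
```

Then add a `layouts/index.json` template that ranges over your pages and pipes each entry through `jsonify`, and Hugo builds the search index for you at build time, no external script needed.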


#73

I don’t have time to go through the entire thread, but if no one has mentioned this, here it is:

```
{{ if isset .Site.Params.widgets "search" }}
{{ if .Site.Params.widgets.search }}
<div class="panel panel-default sidebar-menu shadow">

    <div class="panel-heading">
        <h3 class="panel-title"><i class="fa fa-search" aria-hidden="true"></i> {{ i18n "searchTitle" }}</h3>
    </div>

    <div class="panel-body">
        <form action="//google.com/search" method="get" accept-charset="UTF-8" role="search">
            <div class="input-group">
                <input type="search" name="q" results="0" class="form-control" placeholder="Search">
                <input type="hidden" name="q" value="site:{{ .Site.BaseURL }}">
                <span class="input-group-btn">
                    <button type="submit" class="btn btn-template-main"><i class="fa fa-search"></i></button>
                </span>
            </div>
        </form>
    </div>
</div>
{{ end }}
{{ end }}
```

That’s the code for my search partial at https://MarijuanaDaily.NET. As you can see, I’m simply posting the form to //google.com/search

For example, I type in “MDMA” in the search box, and this is what it turns into at Google:

https://www.google.com/search?q=mdma&q=site%3Ahttps%3A%2F%2Fpsychedelicsdaily.com%2F

mdma site:https://psychedelicsdaily.com/ - Shows in the google search box.

I didn’t want to bother with any fancy site search stuff. This works.

You can always use AJAX to post to google and display the search results in the search box itself, just a list of articles found. I’m too lazy.

But there are fairly easy solutions to this w/o having to spin around too much.

EDIT: Sorry for the confusion. Both of my sites use that template. No worries. <3

In addition: https://stoned.io (my personal blog) has a CSE by Google. I don’t like it. It just sucks. Sorry. Go there, click on the menu button, and it’s there in the sidebar. Compare and contrast with https://MarijuanaDaily.net or https://PsychedelicsDaily.com

Let me know what you think of the difference between posting to google and GCSE. The results seem to be different.

The searching in CSE seems to return … nonsense results.


#74

I am interested in this too. Can anyone please post a little tutorial on how to make use of this feature? How do you create a .json search index before running Hugo, and how does that tie in with this feature?
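Until someone writes that tutorial, here's roughly how the pieces fit together once you have an index file. This is a sketch with assumed field names, not anyone's actual setup: the build emits an array of `{title, url, content}` objects, the browser fetches it once, and a search is a filter over it (lunr.js replaces this naive linear scan with a proper inverted index):

```javascript
// Naive client-side search over a prebuilt JSON index.
// lunr.js would build a real inverted index from the same data;
// this linear scan just shows how the pieces connect.
function searchIndex(pages, query) {
  const q = query.toLowerCase();
  return pages.filter(
    (page) =>
      page.title.toLowerCase().includes(q) ||
      page.content.toLowerCase().includes(q)
  );
}

// In the browser you would load the index once, e.g.:
//   fetch("/json/search.json")
//     .then((res) => res.json())
//     .then((pages) => render(searchIndex(pages, query)));
```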


#75

Algolia https://www.algolia.com/ is pretty great at indexing content. They have a great interface and a lot of options to build your own search algorithm.


#76

At Aerobatic, we just released our new keyword search plugin. Adding search to a Hugo site hosted with Aerobatic is as easy as adding a couple of lines of YAML.

https://www.aerobatic.com/docs/plugins/keyword-search/


#77

Hi,

Actually, I am trying to use your code with some modifications. I want to produce 5 departures for each language and thus several indexes. The problem is that when I generate the file, it’s generated with null values.

```javascript
grunt.registerTask("lunr-index", function() {

    grunt.log.writeln("Building pages index...");

    var indexPages = function() {
        var pagesIndex = [];
        grunt.file.recurse(CONTENT_PATH_PREFIX, function(abspath, rootdir, subdir, filename) {
            grunt.verbose.writeln("Parse file:", abspath);
            pagesIndex.push(processFile(abspath, filename));
        });

        return pagesIndex;
    };

    var processFile = function(abspath, filename) {
        var pageIndex;

        if (S(filename).endsWith(".html")) {
            pageIndex = processHTMLFile(abspath, filename);
        } else {
            pageIndex = processMDFile(abspath, filename);
        }

        return pageIndex;
    };

    var processHTMLFile = function(abspath, filename) {
        var content = grunt.file.read(abspath);
        var pageName = S(filename).chompRight(".html").s;
        var url = S(abspath)
            .chompLeft(CONTENT_PATH_PREFIX).s;
        return {
            title: pageName,
            url: url,
            content: S(content).trim().stripTags().stripPunctuation().s
        };
    };

    var processMDFile = function(abspath, filename) {
        if (S(filename).contains('.cs')) {
            var content = grunt.file.read(abspath);
            var pageIndex;
            // First separate the Front Matter from the content and parse it
            content = content.split("+++");
            var frontMatter;
            try {
                frontMatter = toml.parse(content[1].trim());
            } catch (e) {
                conzole.failed(e.message);
            }
            if (frontMatter.url) {
                var url = frontMatter.url;
            } else {
                var url = S(abspath).chompLeft(CONTENT_PATH_PREFIX).chompRight(".md").s;
                if (filename === "index.md") {
                    url = S(abspath).chompLeft(CONTENT_PATH_PREFIX).chompRight(filename).s;
                }
            }

            // Build Lunr index for this page
            pageIndex = {
                title: frontMatter.title,
                tags: frontMatter.tags,
                url: url,
                content: S(content[2]).trim().stripTags().stripPunctuation().s
            };
            return pageIndex;
        }
    };

    grunt.file.write("static/json/search.cs.json", JSON.stringify(indexPages()));
    grunt.log.ok("Index built.");
});
```

This is my grunt task, as an example.
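Not confirmed anywhere in the thread, just a guess from reading the code: `processMDFile` returns `undefined` for any file whose name doesn't contain `'.cs'`, `processFile` pushes that `undefined` into `pagesIndex`, and `JSON.stringify` serializes `undefined` array elements as `null`. Filtering the array before writing would avoid that:

```javascript
// undefined array elements become null when stringified:
JSON.stringify([{ title: "ok" }, undefined]); // '[{"title":"ok"},null]'

// So drop the entries that processMDFile skipped before writing the file:
function compactIndex(pages) {
  return pages.filter((page) => page !== undefined && page !== null);
}
```

That is, `grunt.file.write("...", JSON.stringify(compactIndex(indexPages())))` (hypothetical call, matching the task above).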


#78

I picked up @Tapir_Go after reading a great blog post at http://www.lovesmesomecode.com/20140124-adding-search-to-static-html-site/

It’s using Middleman, but the logic of it all is basically the same.

The simplicity is all I needed; I didn’t need anything fancier than that.


#79

Interesting.

I checked out your demo for Hugo and it doesn’t have search. Can you add it to your demo if possible?

Also, how are you different from Netlify?


#81

Hi, this blog post shows the demo app with search enabled as well as source code.

As for Netlify - some differences around pricing and features, but yes, a similar service.

Cheers.


#82

Just dredging up this old thread to give another metaphorical Kick up the Arse to DuckDuckGo / Yandex.

I recently ported another of my old blogs over to Hugo from Tumblr and dropped in my usual search.html partial [which uses DDG]. I then went onto the Yandex Webmaster pages [Yandex powers DDG search], added the site there and uploaded the sitemap.xml generated by Hugo.

Over a month later and the search box on the site is still throwing up results pointing to the old URLs from when the blog was hosted on Tumblr.

Meanwhile if I do a regular Google search with the site:mydomain.net parameter added, Google’s results all accurately reflect the new URLs. So, once again Google has picked up all the new URLs without even being told about them, while DDG/Yandex has still not managed to re-index the site, in spite of having been explicitly told to do so and having had a complete XML sitemap submitted.

If I look at the Web Crawler stats page for the domain on Yandex, it shows that the site is being crawled, but seemingly at the rate of about one page every few days.

Graph shows pages crawled per day. Most ever crawled was 5!

…and also that the 404s and 301- Moved Permanently are being recorded. But presumably also at the same snail’s pace as indicated in the graph above. At this rate it will take months to reindex the site!

Slowly recording the new URLs. But at a snail’s pace

It’s a shame really because the DDG search is really easy to add to your site and, with a bit of styling, it integrates not too badly, albeit the results open on DDG rather than your own site. But I don’t know what the hell Yandex are playing at. How can their web crawler be so slow?