Scaling Pagefind to over 1 million pages

I am looking to configure Pagefind to work across more than 1 million pages. Currently it crashes after indexing around 300,000 pages of a Hugo site. The idea would be to use Pagefind's multisite support.

ChatGPT ideas:

UI strategy: The emitted search-init.js defers merging shards until DOMContentLoaded. Change that logic to attach on the first keystroke if you want an even lighter landing page.
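A minimal sketch of that "attach on first keystroke" idea: a once-guard so the heavy import-and-merge work runs at most one time, and only when the user actually types. The `loadPagefind` name and the shard paths are assumptions to match the `q/{prefix}/` layout; the `mergeIndex` call follows Pagefind's multisite API.

```javascript
// Once-guard: the first call runs init and caches the promise; every later
// call reuses it, so the shard merge can never run twice.
function lazyOnce(init) {
  let pending = null;
  return () => (pending ??= init());
}

// Browser wiring (illustrative; assumes a default Pagefind bundle location
// and hypothetical shard prefixes):
// const loadPagefind = lazyOnce(async () => {
//   const pagefind = await import("/pagefind/pagefind.js");
//   for (const prefix of ["ab", "cc"]) {
//     // Merge each shard's bundle into the searchable index.
//     await pagefind.mergeIndex(`/q/${prefix}/pagefind/`);
//   }
//   return pagefind;
// });
// input.addEventListener("input", async (e) => {
//   const pagefind = await loadPagefind();
//   const { results } = await pagefind.search(e.target.value);
// });
```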

Glob limiting: Each shard call restricts indexing to `q/{prefix}/**/*.html` so your canonical URLs remain correct (we keep `--site` at `public/`).

So the idea would be to treat each prefix for our Q&A pages as an individual site, e.g.:
qeeebo.com/q/ab/
qeeebo.com/q/cc/
etc.

So we'd be sharding the content via the prefixes we are already using. Any ideas from others? Essentially, the problem we are looking to solve is a simple search that works on 1 million+ static pages.

Also, in our context we are only searching the titles of the pages, not the answer content, to keep this scalable.
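For the title-only indexing, one way this could look in the Hugo template, assuming Pagefind's `data-pagefind-body` behavior (once the attribute appears anywhere on the site, only content inside elements carrying it is indexed):

```html
<!-- Sketch: scope data-pagefind-body to the title element so each
     shard's index stays small; the answer markup is never indexed. -->
<article>
  <h1 data-pagefind-body>{{ .Title }}</h1>
  <div class="answer"><!-- answer content, not indexed --></div>
</article>
```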

Have you raised the question here?
https://github.com/Pagefind/pagefind/discussions
