Directory Bear — A 946K-page web directory, fully static with Hugo

URL: dirbear.com


I built Directory Bear, a web directory with nearly a million pre-computed site profile pages, all generated and served as a fully static Hugo site on Bunny CDN.

What it is

Directory Bear aims to be the world’s largest static web directory. Every listed site gets its own profile page with a proprietary “Bear Rank” score (a composite of popularity, authority, longevity, and safety), AI-generated descriptions, favicon, and categorization across 49 family-friendly categories.
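
The actual Bear Rank formula is proprietary, but a composite of normalized signals is typically just a weighted sum. Here's a purely illustrative sketch; the weights and the 0-100 scaling are my assumptions, not the real formula:

```python
def bear_rank(popularity, authority, longevity, safety,
              weights=(0.4, 0.3, 0.2, 0.1)):
    """Illustrative composite only -- the real Bear Rank formula isn't public.
    Inputs are assumed normalized to 0..1; output is scaled to 0..100."""
    signals = (popularity, authority, longevity, safety)
    return round(100 * sum(w * s for w, s in zip(weights, signals)))
```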

There are two listing tiers — free (nofollow link) and verified ($20 one-time, dofollow link + badge) — with submissions handled through a Bunny Edge Script and a static admin panel.

The stack

  • Hugo for the entire site build
  • Python data pipeline merging Tranco, Majestic Million, and OpenPageRank datasets
  • GPT-4o-mini for AI enrichment (category classification, one-line descriptions, overviews, tags, FAQs)
  • Bunny CDN for hosting, favicon storage, form submissions via Edge Scripts — everything

How I made Hugo work at ~1 million pages

This was the real engineering challenge. A few things that made it possible:

Hash bucketing. Every domain is bucketed by MD5(domain)[:2], giving 256 content directories with ~3,900 files each. This keeps Hugo from choking on a single massive directory. URL structure: /w/{hash}/{domain}/.
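
The bucketing described above is a few lines of Python; the exact file layout below (index.md inside a per-domain directory) is my assumption:

```python
import hashlib

def bucket_path(domain: str) -> str:
    """Map a domain to its content path via the first two hex chars of its MD5,
    spreading ~946K pages across 256 directories."""
    bucket = hashlib.md5(domain.encode("utf-8")).hexdigest()[:2]
    return f"content/w/{bucket}/{domain}/index.md"
```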

All data in front matter. Each .md file carries everything — BR score, tier, category, favicon path, AI overview, tags, FAQs — all in YAML front matter. Hugo templates just render it. No large JSON lookups, no filtering at build time.
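
A minimal sketch of emitting such a page, with the body left empty so templates read front matter only. The field names and values are illustrative, not the actual schema, and a real pipeline would use a YAML library to handle quoting:

```python
def render_page(site: dict) -> str:
    """Render a site profile .md whose body is empty: all data lives in
    YAML front matter. Naive emitter -- no quoting/escaping of values."""
    fm_lines = ["---"]
    for key, value in site.items():
        if isinstance(value, list):
            fm_lines.append(f"{key}:")
            fm_lines.extend(f"  - {v}" for v in value)
        else:
            fm_lines.append(f"{key}: {value}")
    fm_lines.append("---")
    return "\n".join(fm_lines) + "\n"

page = render_page({
    "title": "example.com",
    "br_score": 87,
    "tier": "verified",
    "category": "technology",
    "tags": ["search", "reference"],
})
```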

Pre-computed everything. Category pages are pre-paginated (50 per page) by a Python script. Search uses progressive JSON prefix files (type “goo” → fetch goo.json), not a monolithic index. Featured sites, top rankings, new sites — all pre-built as Hugo data files.
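
Generating those prefix files is a straightforward sharding pass. A sketch under my own assumptions about prefix length, per-file cap, and output paths:

```python
import json
import os
from collections import defaultdict

def build_prefix_index(domains, out_dir="search", prefix_len=3, cap=50):
    """Shard domains into tiny per-prefix JSON files so the client can fetch
    e.g. search/goo.json as the user types, instead of one huge index."""
    buckets = defaultdict(list)
    for d in sorted(domains):
        buckets[d[:prefix_len].lower()].append(d)
    os.makedirs(out_dir, exist_ok=True)
    for prefix, hits in buckets.items():
        with open(os.path.join(out_dir, f"{prefix}.json"), "w") as f:
            json.dump(hits[:cap], f)  # the cap keeps every file small
    return len(buckets)
```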

Segment builds. For updating non-site pages (submit form, about, homepage), I move content/w/, content/categories/, and static/favicons/ to /tmp/, run Hugo (takes seconds), then move them back. This avoids rebuilding 946K pages just to update a CSS file. A shell script (hugo_segment.sh) handles this.
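
The same move-build-restore dance in Python, as a sketch of what hugo_segment.sh does (directory names are from the post; everything else is my approximation):

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

HEAVY = ["content/w", "content/categories", "static/favicons"]

def segment_build(site_root=".", build_cmd=("hugo", "--minify")):
    """Move the heavy directories aside, run a fast Hugo build of the light
    pages, then restore them -- even if the build fails."""
    root = Path(site_root)
    stash = Path(tempfile.mkdtemp(prefix="hugo_stash_"))  # the post uses /tmp/
    moved = []
    try:
        for rel in HEAVY:
            src = root / rel
            if src.exists():
                dst = stash / rel.replace("/", "_")
                shutil.move(str(src), str(dst))
                moved.append((src, dst))
        subprocess.run(list(build_cmd), cwd=root, check=True)
    finally:
        for src, dst in moved:  # always restore the heavy directories
            shutil.move(str(dst), str(src))
```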

Favicons out of static/. With 900K+ favicon PNGs in static/, Hugo would try to copy all of them to public/ on every build. Moving them out during builds and uploading them separately to CDN was essential. Client-side SVG letter avatars (deterministic color per domain) handle any missing favicons gracefully.
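
The deterministic-avatar idea is easy to show. The live site does this client-side in JS; here's the same logic sketched in Python, with sizing and styling choices that are mine:

```python
import hashlib

def letter_avatar_svg(domain: str, size: int = 64) -> str:
    """Fallback avatar: hash the domain to a stable hue, draw its first
    letter. Same domain always yields the same SVG."""
    hue = int(hashlib.md5(domain.encode()).hexdigest()[:4], 16) % 360
    letter = domain[0].upper()
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">'
        f'<rect width="100%" height="100%" rx="8" fill="hsl({hue}, 60%, 45%)"/>'
        f'<text x="50%" y="50%" dy=".35em" text-anchor="middle" '
        f'fill="#fff" font-family="sans-serif" font-size="{size // 2}">{letter}</text>'
        f"</svg>"
    )
```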

Individual site additions. New submissions go through add_site.py which creates the page, downloads the favicon, computes the BR score, updates data files, and optionally runs AI enrichment — no need to touch the full pipeline.

Build numbers

  • 946,000+ site profile pages
  • 900,000+ favicon images (Google S2 + DuckDuckGo fallback)
  • 49 pre-paginated category sections
  • Progressive search across nearly a million domains via tiny JSON prefix files
  • Full build requires ulimit -n 65536 on macOS

Lessons learned

  1. Hugo can absolutely handle near-million-page sites, but you need to be deliberate about directory structure and what goes into static/.
  2. Put everything in front matter. The less work Hugo templates do, the faster your builds.
  3. Pre-compute aggressively. If something can be a static JSON file instead of a template computation, make it a static JSON file.
  4. Segment your builds. Don’t rebuild a million pages to fix a typo on your about page.

Happy to answer any questions about the build or the approach!
