SermonIndex.net — 1,080,000+ Static Pages with Hugo (Bible, Commentaries, Hymns, Books & More)

Hey Hugo community!

I wanted to share what we’ve built at SermonIndex.net — a Bible study, sermon, and Christian encyclopedia platform now serving over 1.08 million static pages generated with Hugo. We believe this is one of the largest Hugo sites in production today, so here’s how we pulled it off and the Hugo-specific patterns that made it possible.

About SermonIndex

SermonIndex was founded in 2002 with the mission to preserve classical Biblical preaching and promote revival. Over 20+ years it has grown to reach practically every country in the world, distributing over 100 million sermon resources from over 2,300 speakers — names like A.W. Tozer, Leonard Ravenhill, C.H. Spurgeon, David Wilkerson, and many more. The entire library is free.

In 2025 we began a ground-up rebuild of the platform on Hugo, and the scope grew into something far beyond a typical static site.

Live site: https://www.sermonindex.net

What 1.08 Million Pages Looks Like

Here’s the full breakdown of every content section:

| Section | Pages |
| --- | ---: |
| Bible (1,265 translations) | 573,829 |
| Books (classic Christian library) | 186,875 |
| Commentary (173 commentaries) | 88,539 |
| Encyclopedia | 75,843 |
| Speakers (2,315) + Sermons (59,886) | 57,233 |
| Parallel Bible | 31,086 |
| Devotionals (44 collections) | 15,839 |
| Strong’s Concordance | 14,200 |
| Topics (14,090) | 14,091 |
| Hymns (2,020 authors, 10,461 hymns, 550 categories) | 13,058 |
| Bible Topics (4,807) | 9,666 |
| Interlinear Bible (66 books) | 1,256 |
| **Total** | **~1,080,268** |

The Bible section alone — with 1,265 translations spanning dozens of languages — accounts for over half the pages. Each chapter page includes study notes, key verse analysis, cross-references, keyword highlights, audio playback, and auto-linked scripture references.

Infrastructure

  • Hugo for static site generation

  • Bunny CDN for hosting (storage zone based)

  • DigitalOcean App Platform running a NestJS API for dynamic features (Bible API, search)

  • Cloudflare DNS in front

  • PostgreSQL on DigitalOcean for the Bible verse database (42 translations, 885K verses, 33,741 chapters)

  • Mac Minis as build machines

How We Built It with Hugo

The Core Problem: You Can’t Build 1M Pages in One Shot

Early on we realized that a single Hugo build of 1 million+ pages wasn’t practical. So we developed a content generation + sectional build approach. Each content section (Bible, Commentary, Books, Hymns, etc.) has its own generation pipeline:

  1. Source data lives in generated_* folders — structured JSON produced by Python scripts from various sources (database exports, audio transcriptions via Whisper AI, commentary archives, linguistic databases, etc.)

  2. Python build scripts transform JSON into Hugo-ready Markdown files with YAML frontmatter, placed into content-dev/ which is the source of truth for builds.

  3. Hugo builds are run per-section or in targeted batches, with output uploaded to Bunny CDN.

This means we can rebuild just the commentary section, or just the hymns, without touching the other 900K+ pages.
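The per-section pipeline above can be sketched as a small orchestration script. This is a hypothetical illustration, not our exact tooling: the generator script names, folder paths, and the use of `--renderSegments` (covered below) are assumptions about how the pieces fit together.

```python
import subprocess

def generate_cmd(section: str) -> list[str]:
    # Step 2: transform generated_<section>/ JSON into Hugo-ready
    # Markdown with YAML frontmatter under content-dev/.
    return ["python", f"scripts/generate_{section}.py",
            "--src", f"generated_{section}",
            "--dest", f"content-dev/{section}"]

def hugo_cmd(section: str) -> list[str]:
    # Step 3: build only this section (assumes a matching segment is
    # defined in the site config) and stage the output for CDN upload.
    return ["hugo", "--renderSegments", section,
            "--destination", f"public/{section}"]

def build_section(section: str) -> None:
    subprocess.run(generate_cmd(section), check=True)
    subprocess.run(hugo_cmd(section), check=True)
```

A driver then loops `build_section` over whichever sections need rebuilding, leaving the rest of the published files untouched.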

Hugo Patterns That Made This Possible

1. Frontmatter-heavy content architecture

Our Bible chapter pages are a good example. Rather than cramming everything into the Markdown body, the bulk of the structured data lives in YAML frontmatter — study themes, key verses, Christ-centered notes, preaching outlines, keyword highlights, cross-references, FAQ sections, and more. The Hugo template then renders all of this into a rich study page with sidebar panels, highlight toggles, and interactive features.

This keeps templates in full control of layout and lets us update content structure without touching templates.

2. Auto-linking scripture references with a Hugo partial

We built a linkscripture.html partial that auto-links every Bible reference found in any text on the site. It uses a placeholder strategy to avoid re-matching:

  • Pass 1: Match full references like “1 Corinthians 15:22” → replace with placeholder XSCRIPTREF0001

  • Pass 2: Match book + chapter like “Genesis 3” → placeholder

  • Pass 3: Match short verse refs like “v.3” → placeholder

  • Final: Swap all placeholders with <a class="green-link"> links

This partial is used everywhere — study notes, commentary text, devotionals, topic descriptions — giving the entire site consistent cross-referencing without any manual linking.
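The real partial is a Go template, but the placeholder strategy is easiest to see in a few lines of Python. This is a deliberately simplified sketch: the book-name pattern and the URL scheme are illustrative stand-ins, not the partial's actual logic.

```python
import re

BOOK = r"(?:[123] )?[A-Z][a-z]+"  # very simplified book-name pattern

def link_scripture(text: str) -> str:
    found: list[str] = []

    def stash(m: re.Match) -> str:
        # Swap the match for a placeholder so later, looser passes
        # cannot re-match text inside an earlier match.
        found.append(m.group(0))
        return f"XSCRIPTREF{len(found) - 1:04d}"

    # Pass 1: full references like "1 Corinthians 15:22"
    text = re.sub(BOOK + r" \d+:\d+(?:-\d+)?", stash, text)
    # Pass 2: book + chapter like "Genesis 3"
    text = re.sub(BOOK + r" \d+", stash, text)
    # (Pass 3 for short refs like "v.3" omitted for brevity.)

    # Final pass: swap every placeholder for a link.
    for i, ref in enumerate(found):
        href = "/bible/" + ref.lower().replace(" ", "-").replace(":", "-")
        text = text.replace(f"XSCRIPTREF{i:04d}",
                            f'<a class="green-link" href="{href}">{ref}</a>')
    return text
```

The key property is that a placeholder like `XSCRIPTREF0001` can never match a scripture pattern, so each pass only sees text that earlier passes left alone.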

3. The data/ directory for large lookup tables

For sections that need shared reference data across templates (book code lookups, translation metadata, speaker indexes), Hugo’s data/ directory is invaluable. We load JSON files and use index in templates to pull what we need without creating content pages for reference data.
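Since our generation scripts already produce structured JSON, emitting these lookup tables into `data/` is one small step. A hypothetical example (the file name and codes are illustrative, not our exact data files):

```python
import json
from pathlib import Path

# A template can then read the table with `index`, e.g.:
#   {{ $name := index .Site.Data.book_codes "NAH" }}
BOOK_CODES = {"GEN": "Genesis", "NAH": "Nahum", "SOS": "Song of Solomon"}

def write_lookup(data_dir: str = "data") -> Path:
    """Write a book-code lookup table into Hugo's data/ directory."""
    path = Path(data_dir) / "book_codes.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(BOOK_CODES, indent=2), encoding="utf-8")
    return path
```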

4. renderSegments for development

With over a million pages worth of content sitting in content-dev/, iterating on a single template would be painful without this. Hugo’s --renderSegments flag lets us target just one section:

```bash
hugo server --renderSegments commentary
```

This was essential for the development cycle. We could tweak the commentary chapter template and see results in seconds rather than waiting for a million-page build.
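For completeness, a segment has to be declared in the site config before `--renderSegments` can target it. A sketch of what a `commentary` segment might look like in `hugo.toml`, following the shape of Hugo's segments documentation (the paths are assumptions about our layout):

```toml
[segments]
  [segments.commentary]
    [[segments.commentary.includes]]
      path = "{/commentary,/commentary/**}"
```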

5. Section-specific SEO in baseof.html

With 12+ content sections, each needing different title formats and schema markup, we built section-aware logic in baseof.html:

  • Bible chapters: Genesis 1 (BSB) | SermonIndex

  • Commentary: Zechariah 5 — Matthew Henry's Commentary | SermonIndex

  • Speakers: F.J. Huegel — Sermons | SermonIndex

  • Sermons: Title by Speaker Name | SermonIndex

  • Topics: 83 sermons on Abiding in Christ | SermonIndex

  • Hymn categories: 111 Hymns on Praise & Thanksgiving | SermonIndex

Each section also gets appropriate JSON-LD schema — Person for speakers, Article for Bible/commentary, BreadcrumbList everywhere, and WebSite+Organization on the homepage.
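In baseof.html this is a chain of Go template conditionals; the dispatch is easier to show in Python. The dict keys below are hypothetical stand-ins for the frontmatter fields each section provides:

```python
def page_title(section: str, p: dict) -> str:
    """Sketch of the section-aware <title> logic."""
    if section == "bible":
        return f"{p['book']} {p['chapter']} ({p['translation']}) | SermonIndex"
    if section == "commentary":
        return f"{p['ref']} — {p['commentary']} | SermonIndex"
    if section == "speakers":
        return f"{p['name']} — Sermons | SermonIndex"
    if section == "sermons":
        return f"{p['title']} by {p['speaker']} | SermonIndex"
    if section == "topics":
        return f"{p['count']} sermons on {p['topic']} | SermonIndex"
    # Fallback for everything else.
    return f"{p.get('title', 'Home')} | SermonIndex"
```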

6. CDN-editable config without rebuilds

Some things shouldn’t require a Hugo rebuild to change. Our cookie consent and donate popups are controlled by a popups.json file that lives on the CDN. Edit the JSON, the behavior changes instantly — no build, no deploy. The JS reads the config at runtime. We also use head-inject.js and body-inject.js files for any post-deploy code injection needs.

The Content Pipelines

Each section has its own generation story:

  • Bible (573K pages): 1,265 translations pulled from a PostgreSQL database, each chapter getting its own Markdown file with study notes generated via AI. We had to fix book code mismatches across ALL translations (like NAM→NAH, SNG→SOS) — a global find-and-replace across hundreds of thousands of files.

  • Books (186K pages): 3,000+ classic Christian books (Pilgrim’s Progress, Pursuit of God, etc.) broken into chapter-level pages for reading online.

  • Commentary (88K pages): 173 Bible commentaries from authors like Matthew Henry, Darby, Whedon, Spurgeon, and many more. Each chapter gets a rich page with verse-level commentary entries, author bio sidebar, and quick-jump navigation across Old and New Testament books.

  • Speakers + Sermons (57K pages): 2,315 speaker profiles with 59,886 sermons. We imported 4,994 video sermons from CSV files with transcriptions, resolved 308 title conflicts, and merged 10 speaker name misspellings. Each speaker’s _index.md has a verified sermon_count matching the actual content.

  • Hymns (13K pages): 10,461 hymns across 2,020 authors and 550 categories, with category pages like “111 Hymns on Praise & Thanksgiving.”

  • Strong’s Concordance (14K pages): Hebrew and Greek word entries with definitions and cross-references.

  • Interlinear Bible (1,256 pages): Word-by-word original language analysis for 66 books.

Challenges and What We Learned

Book code consistency is brutal at scale. When you have 1,265 Bible translations, a single book code mismatch (NAM vs NAH for Nahum) breaks cross-references across tens of thousands of pages. We wrote scripts to audit and fix these across both the generated JSON and the content-dev Markdown — touching hundreds of thousands of files.
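The fix-up scripts boil down to a word-boundary find-and-replace that only rewrites files that actually change. A minimal sketch (the mapping and file glob are illustrative, not our full audit tooling):

```python
import re
from pathlib import Path

# Known book-code mismatches, e.g. NAM→NAH (Nahum), SNG→SOS (Song of Solomon).
CODE_FIXES = {"NAM": "NAH", "SNG": "SOS"}
PATTERN = re.compile(r"\b(" + "|".join(CODE_FIXES) + r")\b")

def fix_codes(text: str) -> str:
    # Word-boundary match so a code inside a longer token is left alone.
    return PATTERN.sub(lambda m: CODE_FIXES[m.group(1)], text)

def fix_tree(root: str) -> int:
    """Rewrite only the Markdown files that change; return count touched."""
    touched = 0
    for path in Path(root).rglob("*.md"):
        old = path.read_text(encoding="utf-8")
        new = fix_codes(old)
        if new != old:
            path.write_text(new, encoding="utf-8")
            touched += 1
    return touched
```

Skipping unchanged files matters at this scale: it keeps the run fast and avoids churning mtimes on hundreds of thousands of files that were already correct.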

Frontmatter escaping will haunt you. When content is generated by scripts processing messy source data, special characters in YAML frontmatter cause silent Hugo build failures. We built escape_html_tags() early, and it saved us from thousands of broken pages.
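One robust way to sidestep YAML escaping bugs (not our exact escape_html_tags() implementation, just a safe pattern) is to serialize every scalar through json.dumps, since a JSON string is also a valid YAML string with quotes, backslashes, and control characters already escaped:

```python
import json

def frontmatter(fields: dict) -> str:
    """Emit YAML frontmatter with every value safely quoted.

    json.dumps escapes quotes, backslashes, and control characters that
    would otherwise silently break YAML parsing during the Hugo build.
    """
    lines = ["---"]
    for key, value in fields.items():
        lines.append(f"{key}: {json.dumps(value)}")
    lines.append("---")
    return "\n".join(lines)
```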

Verify everything programmatically. With 57 Bible translations in the dropdown and 97+ commentaries listed in the header, we wrote verification scripts to ensure every entry in the UI actually has corresponding content in content-dev/. At this scale, manual checking is impossible.
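The verification scripts are conceptually simple: walk the list of entries the UI promises and confirm each has content on disk. A minimal sketch, assuming each entry maps to a directory with an `_index.md` under `content-dev/` (the slugs and layout are illustrative):

```python
from pathlib import Path

def missing_content(ui_slugs: list[str], content_root: str) -> list[str]:
    """Return every UI entry with no matching content directory."""
    root = Path(content_root)
    return [slug for slug in ui_slugs
            if not (root / slug / "_index.md").is_file()]
```

Run against the real dropdown and header lists, a non-empty return value fails the build before anything broken ships.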

Thanks to Hugo

Hugo made this feasible. The build speed, Go templates, data/ directory, renderSegments, and the overall architecture gave us the foundation to build something at a scale that most people associate with database-driven platforms. The fact that we can serve over a million pages as pure static files — fast, secure, and cheap to host — is remarkable.

I’d also note that we saw the V&A Explore the Collections post on this forum when we were planning this build, and it gave us confidence that Hugo could handle what we were attempting.

Happy to answer any questions about the architecture, the content pipelines, or any of the Hugo patterns we used.

Site: https://www.sermonindex.net


Just a note on the overview above: the only reason we are using a database at all is to serve our API of Bible translations and sermons. We are actively working on migrating it to a static JSON-based API on Bunny CDN, so eventually everything will be database-free and more economical.

Another note that might be helpful to people:

How We Upload 1 Million+ Static Files to CDN

One of the questions people ask when they hear “1 million static pages” is — how do you actually get them onto the CDN? You can’t just rsync a million files and call it a day.

We upload directly to Bunny CDN’s storage API using HTTP PUT requests with 200 parallel threads. The approach is straightforward:

The basic idea: Python’s ThreadPoolExecutor submits every HTML file as an individual HTTP PUT request to Bunny’s storage endpoint. With 200 workers running simultaneously, we’re pushing 200 files at once. Each file is read into memory, sent with an API key header, and the CDN stores it at the corresponding URL path. At full speed this runs at roughly 200-400 files per second.

Retry with exponential backoff: Network requests fail — timeouts, temporary 5xx errors, connection resets. Every file gets up to 3 attempts with increasing delays between retries (1s, 2s, 4s). Most transient failures resolve on the second try.
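The uploader's core loop can be sketched in a few dozen lines. The storage URL, key handling, and path layout below are hypothetical placeholders; Bunny's storage API takes a PUT per file with an AccessKey header, which is all the sketch relies on:

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

STORAGE_URL = "https://storage.bunnycdn.com/example-zone"  # hypothetical zone
ACCESS_KEY = "..."  # storage API key (elided)

def backoff_delay(attempt: int) -> float:
    """Delay after failed attempt N: 1s, 2s, 4s."""
    return float(2 ** attempt)

def put_file(local: Path, remote_path: str, attempts: int = 3) -> bool:
    """PUT one file to the storage endpoint, retrying transient failures."""
    for attempt in range(attempts):
        try:
            req = urllib.request.Request(
                f"{STORAGE_URL}/{remote_path}",
                data=local.read_bytes(),
                method="PUT",
                headers={"AccessKey": ACCESS_KEY,
                         "Content-Type": "application/octet-stream"})
            with urllib.request.urlopen(req, timeout=30) as resp:
                if resp.status in (200, 201):
                    return True
        except OSError:  # timeouts, resets, and HTTPError all land here
            pass
        if attempt < attempts - 1:
            time.sleep(backoff_delay(attempt))
    return False

def upload_dir(out_dir: str, workers: int = 200) -> list[Path]:
    """Upload every HTML file with a thread pool; return the failures."""
    root = Path(out_dir)
    files = list(root.rglob("*.html"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(
            lambda f: (f, put_file(f, str(f.relative_to(root)))), files))
    return [f for f, ok in results if not ok]
```

The list of failures returned by `upload_dir` is what feeds the retry manifest described next.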

Failure tracking: Any file that still fails after 3 attempts gets logged to a JSON manifest with its file path and the specific error. The actual HTML file is also copied to a retry directory. You can re-run the script with --retry and it picks up exactly where it left off — only re-uploading the files that failed. Once they all succeed, the manifest cleans itself up.

Sectional uploads: We don’t upload all 1 million files at once. Each content section (Bible, Commentary, Books, Hymns, etc.) builds and uploads independently. This means we can rebuild and re-upload just the commentary pages without touching the other 900K+ files already on the CDN. It also means if something goes wrong, only one section is affected.

Resume support: Each completed section produces a compressed backup (.tar.zst). When running a batch of sections, the script checks for existing backups and skips anything already done. So if the process gets interrupted at section 30 of 50, you restart and it picks up at section 31.

The whole system runs on a Mac Mini pushing to Bunny CDN over a standard internet connection. No special infrastructure needed — just Python, urllib, and patience.

Essentially, this should encourage people not to be scared of static building with Hugo at scale: it's possible, and honestly quite satisfying.

A note on disk I/O — this matters more than you think. When Hugo builds a section, it writes tens of thousands of small HTML files to disk. When the upload script runs, it reads all of them back. And before the next build, they all get deleted. Multiply that by dozens of sections and you’re doing millions of file creates, reads, and deletes.

We learned early that the drive format matters as much as the drive speed. Our build machines use NVMe SSDs formatted as APFS (on macOS). APFS handles large volumes of small files significantly better than HFS+ — faster directory enumeration, better space efficiency, and it doesn’t bog down the way HFS+ does when a single directory has tens of thousands of entries.

If you’re attempting a large Hugo build on macOS and your external drive is formatted HFS+ (Mac OS Extended), reformatting to APFS can make a noticeable difference in build and cleanup times. On Linux, ext4 or XFS handle this well. The key thing is that at the million-file scale, your filesystem becomes a real bottleneck — even on fast NVMe hardware, a filesystem that isn’t optimized for many small files will slow everything down.

hope that helps someone! :slight_smile: