Hi,
This is the last technical hurdle in converting my 500+ post blog from Jekyll to Hugo. With a bit of elbow grease, scripting and help from this most excellent community, every other issue has been solved.
I’ve searched everything here and on Google, and I’m getting cold feet about the migration because of this URL issue. Having the URLs on a ~10-year-old domain change, or having a 301 redirect on every request, is less than desirable.
For reference my intent is to host this on a server with nginx serving the content.
URL expectations and Hugo
Here’s a quick rundown of what I’m trying to accomplish and what Hugo does. Notice the trailing slash vs no trailing slash on some of the URL examples.
Desired URLs:
- https://example.com
  - Root page
- https://example.com/about
  - Individual page
- https://example.com/blog/hello-world
  - Individual blog post
- https://example.com/blog/
  - A list of blog posts
- https://example.com/blog/page15/
  - A list of blog posts for a specific page
- https://example.com/blog/tag/hello
  - A list of blog posts for a specific tag
What Hugo generates with uglyURLs set to false:
- https://example.com
  - Root page
- https://example.com/about/
  - Individual page
- https://example.com/blog/hello-world/
  - Individual blog post
- https://example.com/blog/
  - A list of blog posts
- https://example.com/blog/page/15/
  - A list of blog posts for a specific page
- https://example.com/blog/tag/hello/
  - A list of blog posts for a specific tag
What Hugo generates with uglyURLs set to true:
- https://example.com
  - Root page
- https://example.com/about.html
  - Individual page
- https://example.com/blog/hello-world.html
  - Individual blog post
- https://example.com/blog.html
  - A list of blog posts
- https://example.com/blog/page/15.html
  - A list of blog posts for a specific page
- https://example.com/blog/tag/hello.html
  - A list of blog posts for a specific tag
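To make the gap concrete, here is a rough Python sketch of the mapping I’m after, from the paths Hugo produces with uglyURLs set to false to the desired paths above. The rules are only my reading of the lists (the root, the blog index and paginated lists keep a trailing slash, everything else drops it, and page/15 becomes page15); desired_path is a hypothetical helper, not anything Hugo provides.

```python
import re

# Assumed rules, inferred from the "Desired URLs" list above.
KEEP_SLASH = {"/", "/blog/"}                    # index pages keep the slash
PAGINATION = re.compile(r"^/blog/page/(\d+)/$")  # Hugo's pagination format


def desired_path(hugo_path: str) -> str:
    """Map a path generated with uglyURLs = false to the desired public form."""
    match = PAGINATION.match(hugo_path)
    if match:
        # Hugo's /blog/page/15/ becomes the Jekyll-style /blog/page15/
        return f"/blog/page{match.group(1)}/"
    if hugo_path in KEEP_SLASH:
        return hugo_path
    # Single pages, posts and tag pages lose the trailing slash
    return hugo_path.rstrip("/")


if __name__ == "__main__":
    for path in ["/", "/about/", "/blog/hello-world/", "/blog/",
                 "/blog/page/15/", "/blog/tag/hello/"]:
        print(f"{path:22} -> {desired_path(path)}")
```

The point of writing it down is that the rule is not "always" or "never" a trailing slash, which is exactly what neither uglyURLs setting gives me.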
What I can deal with
- Jekyll uses page15 whereas Hugo uses page/15 for paginated resources
  - I prefer Hugo’s format for this, but is there a way to change it to support the other one?
  - In the worst case scenario I can 301 redirect these pages and not lose much sleep over it
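If I do end up redirecting the old pagination URLs, the mapping is a single pattern. The sketch below expresses it in Python so it can be tested; pagination_redirect is a hypothetical name, and the nginx rewrite rule in the comment is my assumption of how the equivalent server rule would look, not something I’ve tried on the real server yet.

```python
import re
from typing import Optional

# Old Jekyll-style pagination path, with or without a trailing slash.
# Roughly equivalent nginx rule (assumption, untested):
#   rewrite ^/blog/page(\d+)/?$ /blog/page/$1/ permanent;
OLD_PAGINATION = re.compile(r"^/blog/page(\d+)/?$")


def pagination_redirect(path: str) -> Optional[str]:
    """Return the Hugo-style 301 target for an old /blog/pageN/ path, else None."""
    match = OLD_PAGINATION.match(path)
    if match is None:
        return None  # not an old pagination URL, so serve it normally
    return f"/blog/page/{match.group(1)}/"
```

Hugo’s own /blog/page/15/ does not match the pattern, so it can never redirect to itself.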
What I can’t deal with
- If I use ugly URLs then the HTML served by nginx will include .html in all of the links, and crawlers will treat those as the source of truth
  - I do not want to 301 redirect all of those to pretty URLs
- If I don’t use ugly URLs then all links produced by Hugo will have a trailing slash
  - I do not want to 301 redirect the ones that don’t have it
  - In my case, that’s 500+ posts and a few pages
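For what it’s worth, serving slashless URLs from Hugo’s output without a redirect looks solvable on the nginx side: as I understand it, try_files $uri $uri.html $uri/index.html =404; checks each candidate file in turn and serves the first one that exists. The Python sketch below models that lookup against a hypothetical public_dir so the order is easy to test. It only covers serving; it does nothing about the links Hugo writes into the HTML, which is the real problem.

```python
from pathlib import Path
from typing import Optional


def resolve(public_dir: Path, request_path: str) -> Optional[Path]:
    """Return the file a try_files-style lookup would serve, or None for a 404.

    Candidates, in order: the exact file, the path plus ".html"
    (uglyURLs = true output), then path/index.html (uglyURLs = false output).
    """
    rel = request_path.strip("/")
    if ".." in Path(rel).parts:
        return None  # refuse path traversal in this toy resolver
    if not rel:
        candidates = [public_dir / "index.html"]
    else:
        base = public_dir / rel
        candidates = [base, Path(f"{base}.html"), base / "index.html"]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```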
In either case above:
- It will result in either my SEO changing or most of my requests being a 301 before the content is served
- I’d very much like to continue using relref and ref because they help prevent broken links
  - I could technically hard-code my preferred URL style for every link, but to be honest that is a big enough hit that I’d abort the migration over it. Link validation is too important to lose; I have hundreds of cross links.
Looking at Let’s Encrypt’s Hugo site
While researching this I found Showcase: Let’s Encrypt.
If you visit https://letsencrypt.org/ they are using pretty URLs and you can see the trailing slash on most pages.
However, if you visit one of their blog posts, such as Intent to End OCSP Service - Let's Encrypt, you can see there’s no trailing slash. There’s also no trailing slash in the HTML.
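Checking a claim like “no trailing slash in the HTML” by eye only goes so far, so here is a small audit sketch using Python’s standard html.parser. It lists internal href values whose path ends in a slash. The host parameter is an assumption you supply (e.g. letsencrypt.org for a saved copy of their page, or example.com for my own build), and exempting the bare root / is my own choice. List pages such as /blog/ will still be reported, so the output needs a human eye.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urlsplit


class TrailingSlashLinks(HTMLParser):
    """Collect internal <a href> values whose path ends in a slash."""

    def __init__(self, host: str) -> None:
        super().__init__()
        self.host = host
        self.found: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        parts = urlsplit(href)
        internal = parts.netloc in ("", self.host)
        # The bare root "/" is expected to keep its slash, so skip it.
        if internal and parts.path.endswith("/") and parts.path != "/":
            self.found.append(href)


def trailing_slash_links(html: str, host: str) -> List[str]:
    parser = TrailingSlashLinks(host)
    parser.feed(html)
    parser.close()
    return parser.found
```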
There is no 301 redirect happening here. You can verify that with curl -I https://letsencrypt.org/2024/07/23/replacing-ocsp-with-crls, which returns a 200. Since curl only follows redirects when you pass -L, a redirect would have shown up as a 3xx status instead.
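The same check can be scripted for a batch of URLs. This is a sketch using only Python’s http.client, which never follows redirects on its own, so it behaves like curl -I without -L; head_status is a hypothetical name and the 10-second timeout is an arbitrary default.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def head_status(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send one HEAD request and return (status, Location header or None).

    Redirects are not followed, so a 301 is reported as 301 rather than
    as the status of whatever it points to.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Calling head_status on the Let’s Encrypt post URL above should report a 200 with no Location header if nothing has changed on their side.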
I don’t see anything configured in a special way at the Hugo level to support this behavior.
Based on website/netlify.toml at main · letsencrypt/website · GitHub it looks like they are using Netlify but I don’t see any special rules defined here either.
What are they doing to get this effect? Their blog is indeed built with Hugo, based on website/content/en/post at main · letsencrypt/website · GitHub, and there are posts from a few days ago.
Content rewriting as a last resort
I know nginx has a non-default module that can do this with sub_filter (Module ngx_http_sub_module). I’d very much like to avoid it, since it requires a custom-compiled version of nginx and it rewrites the content at runtime on every request.
I could likely write a custom Python script that does this on the published directory as a final build step, but that seems like a really brittle solution.
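For completeness, this is roughly the post-build step I had in mind, and it shows why it feels brittle: it rewrites href attributes with a regular expression rather than understanding the HTML. It assumes root-relative links (which, as far as I know, is what relref produces) or absolute ones starting with a hypothetical BASE of https://example.com (what ref would produce with that baseURL), and it keeps the slash on the root, /blog/ and paginated lists. Links with a query string or fragment after the slash, the sitemap, and anything outside an href attribute are left untouched.

```python
import re
from pathlib import Path

BASE = "https://example.com"  # assumption: the site's baseURL

# href="<optional BASE><path>/" where the path has no quote, '#' or '?'
HREF = re.compile(r'href="(' + re.escape(BASE) + r')?(/[^"#?]*?)/"')
# Paths that keep their trailing slash: root, /blog/ and /blog/page/N/
KEEP = re.compile(r"^(/blog|/blog/page/\d+)?$")


def strip_slashes(html: str) -> str:
    """Drop the trailing slash from internal hrefs to single pages and posts."""
    def repl(match: re.Match) -> str:
        prefix = match.group(1) or ""
        path = match.group(2)
        if KEEP.match(path):
            return match.group(0)  # list pages keep their trailing slash
        return f'href="{prefix}{path}"'
    return HREF.sub(repl, html)


def rewrite_site(public_dir: Path) -> int:
    """Rewrite every .html file under public_dir in place; return files changed."""
    changed = 0
    for page in public_dir.rglob("*.html"):
        original = page.read_text(encoding="utf-8")
        updated = strip_slashes(original)
        if updated != original:
            page.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

Running rewrite_site(Path("public")) after hugo would still need nginx to answer /blog/hello-world from blog/hello-world/index.html without a 301, so on its own this only moves the problem around.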
Is there any way to accomplish what I’m trying to do with Hugo itself?
Thanks.