Situationally removing trailing slashes in URLs (yes, I read every thread)

Hi,

This is the last technical hurdle for converting my 500+ post blog from Jekyll to Hugo. With a bit of elbow grease, scripting and help from this most excellent community, every other issue was solved.

I’ve searched everything here and on Google, and I’m getting cold feet about the migration because of this URL issue. Having a ~10 year old domain’s URLs change, or having a 301 redirect on every request, is less than desirable.

For reference my intent is to host this on a server with nginx serving the content.

URL expectations and Hugo

Here’s a quick rundown of what I’m trying to accomplish and what Hugo does. Notice the trailing slash vs no trailing slash on some of the URL examples.

Desired URLs:

What Hugo generates with uglyURLs: false:

What Hugo generates with uglyURLs: true:

What I can deal with

  • Jekyll uses page15 whereas Hugo uses page/15 for paginated resources
    • I prefer Hugo’s format for this but is there a way to change it to support the other one?
    • In the worst case scenario I can 301 redirect these pages and not lose much sleep over it
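
For the pagination fallback, the old-to-new mapping is mechanical. Here is a minimal Python sketch of it, assuming the Jekyll pages end in a segment like page15 and the Hugo pages live at page/15/ (the /blog/ prefix in the usage note is hypothetical, adjust to your real paths). It only computes the redirect target; wiring it into the web server is left out.

```python
import re
from typing import Optional

# Assumption: the Jekyll-style pagination segment is the last path
# segment and looks like "page15" (optionally followed by a slash).
_JEKYLL_PAGE = re.compile(r"^(?P<prefix>.*/)page(?P<num>\d+)/?$")


def jekyll_to_hugo_page(path: str) -> Optional[str]:
    """Return the Hugo-style pagination path, or None if path is not
    a Jekyll-style pagination path."""
    match = _JEKYLL_PAGE.match(path)
    if match is None:
        return None
    # Assumption: the Hugo target keeps its trailing slash.
    return f"{match.group('prefix')}page/{match.group('num')}/"
```

With these assumptions, jekyll_to_hugo_page("/blog/page15") gives "/blog/page/15/", and a path that is already in Hugo's format gives None, so it would not be redirected again.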

What I can’t deal with

  • If I use ugly URLs then the HTML response returned by nginx will include .html in all of the URLs and crawlers will think that’s the source of truth
    • I do not want to 301 redirect all of those to pretty URLs
  • If I don’t use ugly URLs then all links produced by Hugo will have a trailing slash
    • I do not want to 301 redirect the ones that don’t have it
      • In my case, that’s 500+ posts and a few pages

In either case above:

  • It will result in either my SEO changing or most of my requests being a 301 before the content is served
  • I’d very much like to continue using relref and ref because it helps prevent broken links
    • I could technically hard-code my preferred URL style for every link, but to be honest that is a big enough hit that I’d abort the migration over it. Link validation is too important to lose; I have hundreds of cross links

Looking at Let’s Encrypt’s Hugo site

While researching this I found Showcase: Let’s Encrypt.

If you visit https://letsencrypt.org/ they are using pretty URLs and you can see the trailing slash on most pages.

However, if you visit a post on their blog, such as Intent to End OCSP Service - Let's Encrypt, you can see there’s no trailing slash. There’s also no trailing slash in the HTML.

There is no 301 redirect happening here. You can verify that with curl -I https://letsencrypt.org/2024/07/23/replacing-ocsp-with-crls which returns a 200. Curl only follows redirects when you pass -L, so a 301 would have shown up in that output.
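
The same check can be scripted for a whole list of URLs. This is a rough Python sketch using only the standard library; head_status is a name made up for this example. http.client never follows redirects on its own, so a 301 and its Location header come back as-is.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def head_status(url: str) -> Tuple[int, Optional[str]]:
    """Send one HEAD request and return (status, Location header).

    Redirects are not followed, so a 301 is reported rather than hidden.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Calling head_status on a URL with and without the trailing slash shows which form the server treats as the real one and which form it redirects.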

I don’t see anything configured in a special way at the Hugo level to support this behavior.

Based on website/netlify.toml at main · letsencrypt/website · GitHub it looks like they are using Netlify but I don’t see any special rules defined here either.

What are they doing to get this effect? Their blog is indeed using Hugo based on website/content/en/post at main · letsencrypt/website · GitHub, and there are posts from a few days ago.

Content rewriting as a last resort

I know nginx has a non-default module that supports this with sub_filter (Module ngx_http_sub_module). I’d very much like to avoid it because it requires a custom-compiled version of nginx and it rewrites the content at runtime on every request.

I could likely write a custom Python script to do this on the published directory as a final build step but this seems like a really brittle solution.
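
For reference, a post-build rewrite along those lines might look like the Python sketch below. Everything in it is an assumption for illustration: KEEP_SLASH is a hypothetical allow-list, and the regex only touches double-quoted, root-relative href values. Absolute URLs (for example in canonical tags), unquoted attributes from a minifier, and sitemap or feed files are not handled, which is exactly the brittleness mentioned above.

```python
import re
from pathlib import Path

# Hypothetical allow-list of paths that should keep their trailing slash.
KEEP_SLASH = {"/blog/"}

# Matches href="/some/path/" with an optional ?query or #fragment.
_HREF = re.compile(r'href="(/[^"#?]*/)((?:[?#][^"]*)?)"')


def strip_trailing_slashes(html: str, keep=KEEP_SLASH) -> str:
    """Drop the trailing slash from root-relative hrefs not in keep."""

    def replace(match):
        path, suffix = match.group(1), match.group(2)
        # Leave allow-listed paths and protocol-relative URLs alone.
        if path in keep or path.startswith("//"):
            return match.group(0)
        return f'href="{path.rstrip("/")}{suffix}"'

    return _HREF.sub(replace, html)


def rewrite_site(public_dir: str, keep=KEEP_SLASH) -> int:
    """Rewrite every .html file under public_dir; return files changed."""
    changed = 0
    for file in Path(public_dir).rglob("*.html"):
        original = file.read_text(encoding="utf-8")
        updated = strip_trailing_slashes(original, keep)
        if updated != original:
            file.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

It would run as the last build step, e.g. rewrite_site("public") after hugo finishes. A bare href="/" is never matched, so the home page link stays as it is.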


Is there any way to accomplish what I’m trying to do with Hugo itself?

Thanks.

no

regarding trailing slashes I would say:

  • one style or another, don’t mix
  • SEO will recover over time unless you keep duplicates (maybe some SOA tuning stuff can help)

UGLY: so consistent behavior - and yes the html file is the truth

NOUGLY: a consistent behavior - just bad for servers that handle it with redirects…
/ means in fact the webserver will decide what to do. Usually they scan for .html, .htm, a server page …

I would recap on your “desire”
IMHO your last desired URL does not match the layout: it’s a list page without a trailing /

That said, I would go with the one or other - make a choice - don’t look back - you’re not going that way

Even if this were the case, the problem exists since Hugo will not let you use pretty urls without the trailing slash. I won’t be changing my canonical URLs to always have a trailing slash and having ugly URLs with .html being crawled as the definitive source isn’t something I want to commit to.

Fortunately I did come up with a solution in the end by overriding ref and relref to trim the trailing slash for all URLs except the ones I wanted to keep the / on. The only pain point is that you can’t use shortcodes everywhere (layouts, etc.), so it requires duplicating this logic in a few spots and remembering to use it outside of the content directory.
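
Since the logic lives in several places, a small check over the built HTML can catch a spot where it was forgotten. This is only a sketch in Python, not part of the Hugo setup itself; KEEP_SLASH and leftover_slash_links are hypothetical names, and the allow-list would need to match whatever paths are meant to keep their /.

```python
from html.parser import HTMLParser

# Hypothetical allow-list of paths that are meant to keep the slash.
KEEP_SLASH = {"/", "/blog/"}


class _HrefCollector(HTMLParser):
    """Collect every href attribute value seen in start tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def leftover_slash_links(html: str, keep=KEEP_SLASH) -> list:
    """Return root-relative hrefs that still end in / and are not in keep."""
    parser = _HrefCollector()
    parser.feed(html)
    leftovers = []
    for href in parser.hrefs:
        # Compare on the path only, ignoring any fragment or query.
        path = href.split("#", 1)[0].split("?", 1)[0]
        is_internal = path.startswith("/") and not path.startswith("//")
        if is_internal and path.endswith("/") and path not in keep:
            leftovers.append(href)
    return leftovers
```

Running this over each file in the published directory and failing the build on a non-empty result would flag any template that still emits the default trailing slash.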