About problem with canonifyurls

@id027102 Thank you. This is an excellent question, and you are not the first to have struggled with it.

I think there are a couple of topics here:

  1. What does the canonifyURLs site configuration setting do?
  2. How do I configure a site to be served from a subdirectory?

What does the canonifyURLs site configuration setting do?

When your markdown is initially converted to HTML, the rendering engine (Goldmark) converts your links to <a> elements, and images to <img> elements, following the CommonMark specification.

The rendering engine has no context for link conversion. It just follows a set of rules. It neither knows nor cares about:

  • The URL to your server root (https://example.org/)
  • The URL from which your Hugo-generated files will be served (https://example.org/blog/)
  • Whether or not a <base> element exists within the <head> element of a Hugo template
  • The baseURL Hugo configuration setting
  • The canonifyURLs Hugo configuration setting
  • The relativeURLs Hugo configuration setting

Regardless of these settings, in the absence of a markdown render hook, the markdown renderer will always convert this:

[text](destination)
[text](/destination)
[text](./destination)
[text](../destination)

to:

<a href="destination">text</a>
<a href="/destination">text</a>
<a href="./destination">text</a>
<a href="../destination">text</a>

Try it.

When you ask Hugo to canonify URLs, it performs the operation after the markdown has been converted to HTML. Conceptually…

The canonify process performs a simple search and replace within the HTML generated by the renderer. It looks for things like href=/a or src=/b, and replaces the leading / (if there is one) with the baseURL in your site configuration.

Try it. Set canonifyURLs = true in your site configuration, then create a markdown page that looks like this:

+++
title = 'Test'
date = 2021-11-09T20:12:32-08:00
draft = false
+++
href=a

href=/a 

src=b 

src=/b

This is the published page:

[image: screenshot of the published page]

Notice that:

  • The markdown did not contain any links.
  • The markdown renderer did not create any anchor elements.
  • The canonify process found some text in the rendered HTML that looked like a URL with a leading /, so it replaced the leading / with the baseURL from the site configuration.

That’s what canonifyURLs does. It’s a lot like sed, or Ctrl+H in your favorite editor.
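
To make the search-and-replace idea concrete, here is a minimal sketch in Python. It is not Hugo's actual implementation. The function name canonify, the regular expression, and the choice to leave protocol-relative URLs (//host/...) alone are all assumptions made for illustration.

```python
import re


def canonify(html: str, base_url: str) -> str:
    """Hypothetical stand-in for canonifyURLs: a plain text substitution."""
    base = base_url.rstrip("/") + "/"
    # Match href= or src=, an optional quote, then exactly one leading slash.
    # Assumption: a second slash (protocol-relative URL) is left untouched.
    pattern = re.compile(r"\b(href|src)=([\"']?)/(?!/)")
    return pattern.sub(lambda m: f"{m.group(1)}={m.group(2)}{base}", html)


# The rendered test page contains plain text, not anchor elements.
rendered = "<p>href=a</p>\n<p>href=/a</p>\n<p>src=b</p>\n<p>src=/b</p>"
result = canonify(rendered, "https://example.org/hugo/")
print(result)
```

Only the two strings with a leading / change, whether or not they are real attributes, which is the sed-like behavior described above.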

How do I configure a site to be served from a subdirectory?

If you are using a theme developed by a third party, it may be difficult or even impossible without modifying the theme. Why? Because some themes were not designed or coded with this configuration in mind.

An ideal, high quality theme would allow you to:

  • Serve the site from the server root or from a subdirectory
  • Enable or disable canonifyURLs
  • Enable or disable relativeURLs
  • Enable or disable uglyURLs
  • Navigate the published site without a web server (open /index.html from the file system)
  • Generate valid RSS feeds
  • Localize your site for multiple languages
  • See the images you have included in a page while using your markdown editor
  • Navigate the links you have included in a page while using your markdown editor

Additionally, the theme author must provide documentation explaining how to:

  • Configure your site for the supported scenarios
  • Construct your links and images in markdown

Are there any themes that meet all of these criteria? I don’t know. It’s not a trivial goal.

Method 1: Base Element

One approach to allowing a site to be served from the server root or a subdirectory is to include a <base> element within the <head> element of each template (or in baseof.html if used). That would look like:

<base href="{{ site.BaseURL }}">

From Mozilla

The <base> HTML element specifies the base URL to use for all relative URLs in a document. There can be only one <base> element in a document.

Also from Mozilla

Example: /en-US/docs/Learn

This is the most common use case for an absolute URL within an HTML document.

So what does this mean?

  1. In the context of using a <base> element to resolve a URL, the browser treats links beginning with / as absolute URLs. Yes, I think most of us would consider links beginning with / to be relative URLs, so the terminology is inconsistent.
  2. You have to be careful when constructing your markdown links. With the <base> element set as described above:

baseURL = 'https://example.org/'

Markdown                              Resolves To
[root](/)                             https://example.org/
[site](.)                             https://example.org/
[about](about/)                       https://example.org/about/
[book-2](books/book-2/)               https://example.org/books/book-2/
[video-2](videos/video-2/)            https://example.org/videos/video-2/
[fragment](books/book-1#heading-4)    https://example.org/books/book-1#heading-4

baseURL = 'https://example.org/hugo/'

Markdown                              Resolves To
[root](/)                             https://example.org/
[site](.)                             https://example.org/hugo/
[about](about/)                       https://example.org/hugo/about/
[book-2](books/book-2/)               https://example.org/hugo/books/book-2/
[video-2](videos/video-2/)            https://example.org/hugo/videos/video-2/
[fragment](books/book-1#heading-4)    https://example.org/hugo/books/book-1#heading-4
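
If you want to check these resolutions without a browser, Python's urllib.parse.urljoin applies the same kind of relative-reference resolution. The sketch below is only an approximation of what a browser does with a <base> element. The helper name resolve_all is made up for this example.

```python
from urllib.parse import urljoin

# Link destinations as they would appear in the markdown.
destinations = ["/", ".", "about/", "books/book-2/", "books/book-1#heading-4"]


def resolve_all(base_url: str) -> dict:
    """Resolve each destination against a <base href> value."""
    return {dest: urljoin(base_url, dest) for dest in destinations}


root = resolve_all("https://example.org/")
subdir = resolve_all("https://example.org/hugo/")

for dest in destinations:
    print(f"{dest:26} {root[dest]:46} {subdir[dest]}")
```

Note that / is the only destination that escapes the subdirectory: it resolves to https://example.org/ for both base URLs.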

Pros:

  • One line of template code
  • Works when serving from server root or subdirectory

Cons:

  • Same-page fragment URLs must specify entire path
  • Auto-generated table of contents will be broken due to same-page fragment URLs
  • canonifyURLs = true will have no effect because none of the URLs are root-relative (except for root)
  • relativeURLs = true will have no effect because none of the URLs are root-relative (except for root)
  • uglyURLs = true will break the site
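
The same-page fragment problem in the first two cons can be sketched with urllib.parse.urljoin, again as an approximation of browser behavior. The page URL below is a hypothetical example.

```python
from urllib.parse import urljoin

base_href = "https://example.org/hugo/"               # value of <base href>
page_url = "https://example.org/hugo/books/book-1/"   # hypothetical page being viewed

# Without a <base> element, a bare fragment resolves against the current page.
without_base = urljoin(page_url, "#heading-4")

# With a <base> element, the same fragment resolves against the base URL.
with_base = urljoin(base_href, "#heading-4")

print(without_base)  # https://example.org/hugo/books/book-1/#heading-4
print(with_base)     # https://example.org/hugo/#heading-4
```

With the <base> element in place, the link points at the site's home page rather than the heading on the current page. That is why table of contents entries such as #heading-4 stop working.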

While adding a <base> element and carefully constructing your links may solve your markdown problems, other theme elements (images, icons, CSS, JS, fonts, menus, etc.) may still be broken when trying to serve your site from a subdirectory. It depends on how the theme was built.

Method 2: RelRef Shortcode and Leaf Bundles

Hugo has a built-in relref shortcode to help you construct links in your markdown:

[about]({{< relref "/about" >}})

baseURL                     Resolves To
http://example.org/         http://example.org/about/
http://example.org/hugo/    http://example.org/hugo/about/

Pros:

  • No template changes required
  • Works when serving from server root or subdirectory
  • Same-page fragment URLs do not require entire path (i.e., use #heading)
  • Auto-generated table of contents will be correct
  • canonifyURLs = true works as designed
  • relativeURLs = true works as designed
  • uglyURLs = true works as designed

Cons:

  • The relref shortcode works for links in your markdown, not for images.
    This does not work:

    ![My cat]({{< relref "/images/cat.jpg" >}})
    

For a site that you can serve from the server root or from a subdirectory, use Leaf Bundles:

content/blog/
└── post-1/
    ├── cat.jpg
    └── index.md

And use this markdown:

![My cat](cat.jpg)
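
A rough sketch of why the page-relative image path is portable, again using urllib.parse.urljoin to approximate the browser. It assumes Hugo's default URL layout (the bundle above published at blog/post-1/ under the baseURL) and no <base> element in the template. The helper name image_url is made up for this example.

```python
from urllib.parse import urljoin


def image_url(base_url: str) -> str:
    """Resolve cat.jpg the way a browser would from the post's own URL."""
    # Assumption: the leaf bundle is served at <baseURL>blog/post-1/.
    page_url = urljoin(base_url, "blog/post-1/")
    return urljoin(page_url, "cat.jpg")


print(image_url("https://example.org/"))       # https://example.org/blog/post-1/cat.jpg
print(image_url("https://example.org/hugo/"))  # https://example.org/hugo/blog/post-1/cat.jpg
```

Because the image is resolved relative to the page, the markdown does not need to change when the baseURL changes.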

While using the relref shortcode and leaf bundles may solve your markdown problems, other theme elements (images, icons, CSS, JS, fonts, menus, etc.) may still be broken when trying to serve your site from a subdirectory. It depends on how the theme was built.

Method 3: Render Hooks (Advanced)

You can create markdown render hooks for links, images, and headings. This allows you to intercept the markdown for these elements and render them as you wish.

@bep created an example of the portability benefits here:
https://github.com/bep/portable-hugo-links

But, just as with the other two methods…

While using markdown render hooks may solve your markdown problems, other theme elements (images, icons, CSS, JS, fonts, menus, etc.) may still be broken when trying to serve your site from a subdirectory. It depends on how the theme was built.
