GetPage, ref and relref renovations

There are a number of issues related to how one looks up or links pages in Hugo. They are worth looking at together because there is a lot of interaction:

  • Problems:
    • #4727: GetPage, ref and relref accept ambiguous refs without warning.
    • #4147: ref and relref need to support all kinds of pages.
      Changes introduced in v0.33 make this a must.
    • #4726: GetPage neither respects nor needs its kind arg. #4652 points out it isn’t needed.
  • Nice-to-haves:
    • #4728: Option to support unix-style path semantics
    • #3606: Support wiki-style internal page links

I’ve already implemented a holistic, coherent solution to the three problems:

  1. Define the absolute path of each node within the content directory (e.g. column one in the table here) as its guaranteed unique ref. (The leading “/” distinguishes it from shorthand refs with the same filename.)
  2. Eliminate the kind arg from GetPage, and the implicit kind=page arg from the ref and relref shortcodes, allowing all three to reference any page or node in the content directory with the same calling arg and semantics. This consistency improves user-friendliness.
  3. Use a single backing index to support GetPage, ref and relref. Detect any ambiguous refs when building this index and report any uses of them as errors with user-friendly instructions to use the guaranteed unique ref.
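
To make step 3 concrete, here is a minimal sketch of a single index with ambiguity detection. It is not the actual implementation: `refIndex`, `buildIndex` and `lookup` are hypothetical names, and it assumes the shorthand ref is simply the file's base name.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// refIndex maps a ref to every content path it could resolve to.
// All names here are hypothetical; this is not Hugo's internal API.
type refIndex map[string][]string

// buildIndex registers each content path under its absolute ref
// (leading "/") and under its shorthand ref (assumed: the base name).
func buildIndex(contentPaths []string) refIndex {
	idx := refIndex{}
	for _, p := range contentPaths {
		abs := "/" + strings.TrimPrefix(p, "/")
		idx[abs] = append(idx[abs], abs)
		short := path.Base(abs)
		idx[short] = append(idx[short], abs)
	}
	return idx
}

// lookup returns the unique target for ref, or an error that lists the
// guaranteed unique refs the user can switch to.
func (idx refIndex) lookup(ref string) (string, error) {
	matches := idx[ref]
	switch len(matches) {
	case 0:
		return "", fmt.Errorf("ref %q not found", ref)
	case 1:
		return matches[0], nil
	default:
		return "", fmt.Errorf("ref %q is ambiguous; use one of the unique refs: %s",
			ref, strings.Join(matches, ", "))
	}
}

func main() {
	idx := buildIndex([]string{"blog/help.md", "docs/help.md", "docs/install.md"})
	for _, ref := range []string{"install.md", "help.md", "/docs/help.md"} {
		target, err := idx.lookup(ref)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(ref, "->", target)
	}
}
```

In this sketch the ambiguity is recorded while the index is built, but the error is raised only when an ambiguous ref is actually used, which matches the intent of reporting uses rather than mere existence.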

The solution is clean, simple, and I believe backward compatible. The implementation leaves Hugo cleaner, simpler, safer, and maybe even a little faster. It also paves the way for the two nice-to-haves if they are deemed worth doing.

Some questions before I make a pull request

  • Can the kind arg from .Site.GetPage be removed without breaking existing themes or user templates?

  • I’ve already assumed it can’t, and my implementation:

    • Deprecates .Site.GetPage
    • Introduces .GetPage (method on Page) without the kind arg. This perfectly dovetails with later adding support for relative refs (#4728) since the call would have a Page context to be relative to. The ref/relref shortcodes already execute from a Page context, so this lines up with the goal of consistent semantics. If there are contexts without a page, .Site.Home.GetPage will do the trick, and in fact deprecated .Site.GetPage will just call the new GetPage this way.

    Does anybody like or not like this approach?
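
    For discussion, the forwarding could look roughly like the sketch below. `Page`, `Site` and `newSite` are simplified, hypothetical stand-ins rather than Hugo's real types, and joining the variadic parts into one absolute ref is an assumption about how the old arguments would map onto the new call.

```go
package main

import (
	"fmt"
	"path"
)

// Page and Site are hypothetical stand-ins, not Hugo's real types.
type Page struct {
	Path string
	site *Site
}

type Site struct {
	Home  *Page
	pages map[string]*Page // keyed by absolute ref, e.g. "/blog/post.md"
}

// GetPage is the new single-argument lookup with a Page receiver.
func (p *Page) GetPage(ref string) (*Page, error) {
	if found, ok := p.site.pages[ref]; ok {
		return found, nil
	}
	return nil, fmt.Errorf("page %q not found", ref)
}

// GetPage keeps the old variadic signature alive. The kind argument is
// accepted for compatibility and ignored; the path parts are joined into
// one absolute ref and forwarded to the home page's GetPage.
func (s *Site) GetPage(kind string, parts ...string) (*Page, error) {
	ref := path.Join(append([]string{"/"}, parts...)...)
	return s.Home.GetPage(ref)
}

// newSite builds a tiny in-memory site for the example.
func newSite(paths ...string) *Site {
	s := &Site{pages: map[string]*Page{}}
	s.Home = &Page{Path: "/", site: s}
	s.pages["/"] = s.Home
	for _, p := range paths {
		s.pages[p] = &Page{Path: p, site: s}
	}
	return s
}

func main() {
	site := newSite("/blog", "/blog/post.md")
	oldStyle, _ := site.GetPage("page", "blog", "post.md")
	newStyle, _ := site.Home.GetPage("/blog/post.md")
	fmt.Println("same page:", oldStyle == newStyle)
}
```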

  • Does the new .GetPage need to be variadic like the old one? Can it simply take a single string arg like ref and relref do? Is the ability to make calls such as .GetPage "section" "subsection" "page" necessary or useful?

  • The docs say relref produces relative links in the output. But my tests show it does not; the resulting links in the output HTML are identical. Am I missing something? This has confused at least one other person, and @bep’s reply to him seems to indicate that it exists for a reason that would not be a reason anymore given my solution above.

I would love to spend more time on this. The first thing that comes to mind: instead of creating a GetPage method on Page with fewer arguments, why not make it a function? Does it have to be a Page or Site method?

Hello,

First of all, that’s a very impressive, well-thought-out and well-researched write-up.

Whatever is decided, I sincerely hope that the “shorthand ref” is not discarded.

The ambiguity has never been a problem for me as real post names are not like “help” but something like “how-do-i-write-org-mode”. And if something is short like “help”, that would be the one and only post to have earned that.

One big benefit of the “shorthand ref” like {{< relref "my-post" >}} is that it does not matter if:

  • the page is a bundle (my-post/index.md)
  • or, a regular page (my-post.md)
  • or, with a different extension (my-post/index.org or my-post.org)
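
As a rough illustration of why one shorthand ref can cover all three layouts, the sketch below reduces each content path to the same key. `shorthandKey` is a hypothetical helper, and the rule it applies (drop the extension, then use the parent directory name for `index` files) is an assumption for illustration, not a description of Hugo's matching code.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// shorthandKey is a hypothetical helper: it drops the extension and, for
// bundle files named "index", uses the bundle directory name instead.
func shorthandKey(contentPath string) string {
	p := strings.TrimSuffix(contentPath, path.Ext(contentPath))
	if path.Base(p) == "index" {
		p = path.Dir(p)
	}
	return path.Base(p)
}

func main() {
	for _, p := range []string{
		"posts/my-post/index.md",
		"posts/my-post.md",
		"posts/my-post/index.org",
		"posts/my-post.org",
	} {
		fmt.Printf("%-26s -> %s\n", p, shorthandKey(p))
	}
}
```

All four paths reduce to the key my-post, so {{< relref "my-post" >}} could keep working when a regular page is later converted to a bundle or to another format. A top-level index file is not handled by this sketch.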

I agree that upgrading that ambiguity warning to an error would be useful. I cannot think of a case where a user would see the ambiguity warning and still would want to publish the site.

+1. I don’t have a use of this feature, but it definitely looks useful.

I would be fine with this approach. If this plan falls through, I believe that .Site.GetPage would result in a warning… and maybe after N months that function gets removed? What’s a good value of N here?

I personally never liked the variadic approach… I would prefer the one and only unix-path style.

There isn’t even a warning today. An arbitrary choice is made silently, and as I describe in the issue, the choice can change silently when the site is updated or even Hugo is updated.

And thank you. I try. :cowboy_hat_face:

  • We don’t deprecate lightly anymore, at least not for the parts that are commonly used. Too much work. If we create a new .Page.GetPage it should be easy to forward calls from the old to the new. Then old can be hidden in the doc, and maybe removed some time in the future.
  • The Kind argument was put there mostly for performance reasons. Hugo is filled with features the average site does not use. .Site.GetPage is one. And if it’s used, it is used to fetch a section or something. It would hurt performance to build a 10K page index to find 1 page. I’m not sure how putting everything into one index can be faster than the current partitioned lazy approach.

This is just my 50 cents.

That said, I might have implemented this when I was a little bit overly concerned about speed.

On the other hand, a 10K page site is far more vulnerable to the danger of link ambiguity.

In the benchmarks I ran (not sure I’m running the best ones), at least for smaller sites it seemed to run a little faster, perhaps because of the simplifications I made in the code in many places.

Hugo is famous for speed. I wouldn’t want to bet against your 50 cents!

The only way to do ref ambiguity detection in the presence of shorthand refs, whether we have a single or a partitioned index, is by indexing all pages. If we want to avoid the speed hit, here are the other options I’ve thought of:

  • add a setting to disable (or enable) shorthand refs, so that the only supported refs are guaranteed to be unique.
  • add a setting to enable (or disable) ambiguity detection. One advantage of this is that it can be enabled only on the final build of the site, rather than each build during dev.
  • add a setting to enable unix-style refs (#4728) which would be guaranteed to be unique.

If we keep the partitioned index, we keep the Kind argument, but we also would need to add the arg to ref and relref per #4147.

We don’t have to keep the partitioned index, but we need to keep the lazy init (i.e. people who do not use .GetPage should not have to pay). Whoever implements this may toy around with a radix tree (we have a library in use for that); finding the page matching the “shortest path” may work well here. Or maybe use the existing cache but partition it by root folder?
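
A minimal sketch of the lazy init idea, using the standard library's `sync.Once` so the index is built on the first lookup only. `lazyIndex` and its fields are hypothetical names, and a plain map stands in for whatever structure (radix tree or otherwise) ends up being used.

```go
package main

import (
	"fmt"
	"sync"
)

// lazyIndex is a hypothetical wrapper: the index is built the first time
// a lookup happens, so sites that never look up a page pay nothing.
type lazyIndex struct {
	once  sync.Once
	build func() map[string]string // expensive: walks all pages
	refs  map[string]string
}

// lookup triggers the one-time build, then serves from the built map.
func (l *lazyIndex) lookup(ref string) (string, bool) {
	l.once.Do(func() { l.refs = l.build() })
	target, ok := l.refs[ref]
	return target, ok
}

func main() {
	builds := 0
	idx := &lazyIndex{build: func() map[string]string {
		builds++
		return map[string]string{"help.md": "/docs/help.md"}
	}}
	fmt.Println("builds before first lookup:", builds)
	idx.lookup("help.md")
	idx.lookup("help.md")
	fmt.Println("builds after two lookups:", builds)
}
```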

Note that we need to keep this as backwards compatible as possible. I agree about most of the stuff you write. I would not be so dramatic about “ambiguous refs”; that sounds a little bit like adding arguments just to make your case better. It is good enough as is.

If you currently say GetPage “mypage.md” and have more than one “mypage.md”, the current behaviour is probably that you get the last page in the default sort, so it should at least be stable in most cases.

The below is basically a summary of what you describe. Arrest me if I’m wrong:

  • .Page.GetPage
  • One param: A Unix path
  • If the path does not start with a leading slash, we search relative to the current page
  • Else it is a search from the content root
  • .Page.GetPage without params would get the home page, I assume

Note that:

  • .Site.GetPage should continue to work as-is, but should be able to delegate that implementation to .Site.Home.GetPage.
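
The lookup rules summarized above could be sketched as follows. `resolveRef` is a hypothetical helper, and treating “relative to the current page” as “relative to the directory that contains the current page” is an assumption.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// resolveRef is a hypothetical helper that turns a ref into an absolute
// content path, given the absolute path of the page making the call.
func resolveRef(currentPagePath, ref string) string {
	switch {
	case ref == "":
		// No param: the home page.
		return "/"
	case strings.HasPrefix(ref, "/"):
		// Leading slash: search from the content root.
		return path.Clean(ref)
	default:
		// Otherwise: relative to the current page's directory (assumed).
		return path.Join(path.Dir(currentPagePath), ref)
	}
}

func main() {
	current := "/blog/2018/post.md"
	for _, ref := range []string{"other.md", "../intro.md", "/docs/help.md", ""} {
		fmt.Printf("%q -> %s\n", ref, resolveRef(current, ref))
	}
}
```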

@regis making it a method on Page makes sense here, I think. To get the “relative path behaviour” you kind of want a page reference, so you may as well make it explicit. I think.

Handling of ambiguous lookups would possibly be breaking a little, and may be hard to implement in a speedy way, so only if it fits nicely and easily into the implementation.