How to use findRESubmatch

Hugo 0.110 brought us findRESubmatch, which would be quite useful if it were documented correctly and understandably. Firstly, the function should be called findRESubmatches, since it finds all matches, not only one – there’s no flag like in other RE implementations to ask for a single or all matches. Secondly, it does not return “a slice of strings”, but rather a “slice of slice of strings”. Here’s what I found out (which may be wrong, incomplete, etc.):

  • findRESubmatch is Go’s FindAllSubmatch (again this weird singular, but so be it).
  • If the RE doesn’t match at all, the function returns nil.
  • If it matches, the function returns a slice of slice of strings.
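
A minimal Go sketch of the behaviour listed above, assuming findRESubmatch really is a thin wrapper over Go's regexp package. The helper name submatches is made up for illustration, and the string variant FindAllStringSubmatch is used because the input is a string:

```go
package main

import (
	"fmt"
	"regexp"
)

// submatches returns every match together with its capturing groups.
// Hypothetical helper for this sketch, not part of Hugo or Go.
func submatches(pattern, input string) [][]string {
	return regexp.MustCompile(pattern).FindAllStringSubmatch(input, -1)
}

func main() {
	fmt.Println(submatches(`x`, "ab") == nil)      // true: no match at all
	fmt.Printf("%q\n", submatches(`b`, "ab"))      // [["b"]]
	fmt.Printf("%q\n", submatches(`a(.)`, "abac")) // [["ab" "b"] ["ac" "c"]]
}
```

In Go, the first capturing group of the first match is then simply result[0][1].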

Example 1, no capturing groups

findRESubmatch(`b`, "ab") returns [["b"]]

Example 2, one capturing group, one occurrence

findRESubmatch(`a(.)`, "ab") returns [["ab" "b"]]. Access the content of the capturing group with index (index $result 0) 1: the inner index gives you ["ab" "b"], and the outer one retrieves the content of the first capturing group from that, which is b in this case.

Example 3, one capturing group, two occurrences

findRESubmatch(`a(.)`, "abac") returns [["ab" "b"] ["ac" "c"]]. You’d use index (index $result 0) 1 to access the first capturing group of the first match, index (index $result 1) 1 for the first capturing group of the second match, etc.

Example 4, named capturing group

findRESubmatch(`a(?P<letter>.)`, "abac") behaves exactly like the version with an unnamed capturing group, i.e. the name is not available in the match. Interestingly, the Go documentation keeps mum about that, too. Using a dict in that case would’ve been nice.
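
On the Go side, the group names can be recovered separately through Regexp.SubexpNames, so a dict-like result could be built by hand. A rough sketch, where namedGroups is a hypothetical helper and not something Hugo exposes as far as this thread establishes:

```go
package main

import (
	"fmt"
	"regexp"
)

// namedGroups builds one map per match, keyed by the capturing group's name.
// Hypothetical helper for illustration only.
func namedGroups(pattern, input string) []map[string]string {
	re := regexp.MustCompile(pattern)
	names := re.SubexpNames() // names[0] is "", and so is any unnamed group
	var out []map[string]string
	for _, m := range re.FindAllStringSubmatch(input, -1) {
		groups := map[string]string{}
		for i, name := range names {
			if name != "" {
				groups[name] = m[i]
			}
		}
		out = append(out, groups)
	}
	return out
}

func main() {
	for _, g := range namedGroups(`a(?P<letter>.)`, "abac") {
		fmt.Println(g["letter"]) // prints b, then c
	}
}
```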

x matches, y capturing groups

That results in a slice of x slices, each containing y+1 strings. The first string is always the current match, i.e. what the whole RE matches. The rest are the subgroup matches.

Nested capturing groups

findRESubmatch(`a(.(.)(d))`, "abcd") returns [["abcd" "bcd" "c" "d"]], thus the nested capturing groups from the outside to the inside. That’s consistent with their numbering.
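
To double-check the nesting order, the same pattern can be run through Go's regexp package directly. This is only a sketch under the assumption that Hugo passes the pattern through unchanged; nested is a made-up helper name:

```go
package main

import (
	"fmt"
	"regexp"
)

// nested runs the nested-group pattern from the example against input.
// Groups are numbered by the position of their opening parenthesis.
func nested(input string) [][]string {
	re := regexp.MustCompile(`a(.(.)(d))`)
	return re.FindAllStringSubmatch(input, -1)
}

func main() {
	fmt.Printf("%q\n", nested("abcd")) // [["abcd" "bcd" "c" "d"]]
}
```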

Left to right – really?

Hugo’s as well as Go’s documentation use the terms “leftmost” (for the first match) and “left to right” (for the order of matches) in their description of the RE functions. In my opinion, this is misleading, as “leftmost match” makes sense only for left-to-right writing systems. In a right-to-left writing system (Arabic, Hebrew, at least), the first match should be the _right_most, and the order in which matches are returned should be right to left.

Either the wording in the documentation is correct, in which case the RE matching behaves strangely in certain locales, or the RE matching works fine, in which case the wording is wrong.

Feel free to use that in the documentation. I raised an issue about its current state here

I appreciate that someone is laying out how impractical REs are at the moment.
At this point, allowing calls to external commands (sed) within Hugo should be considered, as it would produce much more readable code, with much more straightforward expressions than index (index $result 0) 1!
We rarely need all the matches and submatches at once; usually people want to extract subgroups or rearrange them, and in that case the number one useful feature, named groups, is missing.
Thank you chrillek, I’ll find findRESubmatch very useful from now on.

This maps directly to Go’s FindReSubMatch, which is in line with other template funcs that are just very shallow wrappers.

You’re probably referring to FindAllSubmatch. And yes, I noticed that Hugo uses only a very shallow wrapper.
What about the other points:

  • FindRESubmatch returning a slice of slice of strings, not a slice of strings?
  • It working from left to right only in left-to-right locales, which makes the usage of “left-most” and “from left to right” incorrect?

From a logical standpoint it makes perfect sense: a slice of all matches, each represented by a slice of its submatches.
But the end user simply needs more than just a shallow wrapper.
We need something like function FindRe (source: string; regexp: string; MatchNumber: Positive; MatchingGroupNamed: string; MatchingGroupNumbered: Positive) return Table_of_result
with a few wrappers in case we need to return a single string, or input a slice of groups instead of a string or number, etc.
It’s Ada/pseudocode, but understandable enough. All of that would be wrappers around the current FindRE or FindAllSubmatch.
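
For what it's worth, a cut-down version of that idea might look like this in Go. The name findGroup, its signature, and the 1-based match number are all assumptions made for this sketch; nothing like it exists in Hugo as far as this thread shows:

```go
package main

import (
	"fmt"
	"regexp"
)

// findGroup returns the text of one named group from the n-th match (1-based).
// ok is false if the pattern has no such group or there are fewer than n matches.
func findGroup(pattern, input string, n int, group string) (text string, ok bool) {
	re := regexp.MustCompile(pattern)
	idx := re.SubexpIndex(group) // -1 if no group has that name
	matches := re.FindAllStringSubmatch(input, -1)
	if idx < 0 || n < 1 || n > len(matches) {
		return "", false
	}
	return matches[n-1][idx], true
}

func main() {
	text, ok := findGroup(`a(?P<letter>.)`, "abac", 2, "letter")
	fmt.Println(text, ok) // c true
}
```

It is still only a wrapper around FindAllStringSubmatch, but the caller no longer has to nest two index calls.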

Hugo is open source. You can add what you’re missing.

Btw: Hugo’s approach to find regular expressions is very similar to JavaScript’s. Which seems to work for a lot of people.

Well not really… A surface level understanding of one language is the limit of my capability :wink:
I didn’t know about JavaScript. So a lot of people don’t think the way I do, no surprise. I just gave some user feedback; if most people are pleased, fine by me :wink:

I will revisit this in an attempt to combine descriptions from both FindStringSubmatch and FindAllStringSubmatch.

The frame of reference is the string (slice of bytes), not how the string is displayed (LTR or RTL). This is true for all[1] programming languages. For example, Go’s strings.TrimLeft operates on the left side of the slice of bytes, regardless of LTR/RTL display mode. The same is true with Python’s str.lstrip function.
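
A small Go sketch of that point, using a three-letter Hebrew string (alef, bet, alef) written with escapes so that bidirectional rendering doesn't get in the way. The helper name offsets is made up for illustration:

```go
package main

import (
	"fmt"
	"regexp"
)

// offsets returns the byte ranges of all matches, in the order Go reports them.
func offsets(pattern, input string) [][]int {
	return regexp.MustCompile(pattern).FindAllStringIndex(input, -1)
}

func main() {
	input := "\u05d0\u05d1\u05d0" // alef, bet, alef; 2 bytes each in UTF-8
	fmt.Println(offsets("\u05d0", input)) // [[0 2] [4 6]]
}
```

The first match reported is the one at byte 0, i.e. the first letter in storage order, which an RTL display would draw on the right.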

You could petition the Go team to rename their functions (e.g., strings.TrimStart instead of strings.TrimLeft [2]) and descriptions (e.g., “leading” instead of “left”), but I wouldn’t hold my breath.

From a documentation standpoint, unless there’s a gross error, we follow Go’s lead when wrapping their functions.

Ned Flander's Leftorium


  1. Maybe there’s an oddball exception out there, something akin to Brainf___. ↩︎

  2. Precedent: JS aliases trimLeft() to trimStart() ↩︎

See https://gohugo.io/functions/findresubmatch/.

Thanks a lot! That’s really a huge improvement.