Hugo 0.110 brought us findRESubmatch, which would be quite useful if it were documented correctly and clearly. First, the function should be called findRESubmatches, since it finds all matches, not only one; there’s no flag, as in other RE implementations, to ask for a single match or for all matches. Second, it does not return “a slice of strings” but rather a “slice of slices of strings”. Here’s what I found out (which may be wrong, incomplete, etc.):
Under the hood, it seems to rely on Go’s FindAllSubmatch (again this weird singular, but so be it).
- If the RE doesn’t match at all, the function returns nil.
- If it matches, the function returns a slice of slices of strings.
findRESubmatch(`b`, "ab") returns [["b"]]: even without a capturing group, the match is wrapped in an inner slice.
findRESubmatch(`a(.)`, "ab") returns [["ab" "b"]]. Access the content of the capturing group with index (index $result 0) 1: the inner index gives you ["ab" "b"], and the outer one retrieves the content of the first capturing group from that, which is b in this case.
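In Go, the nested template index corresponds to plain double indexing, result[0][1]: the first index selects the match, the second selects the group within it. A minimal sketch, again assuming findRESubmatch behaves like FindAllStringSubmatch; firstGroupOfFirstMatch is a hypothetical name, and it adds the bounds checks a template would also need to avoid an index-out-of-range error.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstGroupOfFirstMatch mirrors the template expression index (index $result 0) 1.
// result[0] is the first match, [1] is its first capturing group.
// It returns "" when nothing matches or the pattern has no capturing group.
func firstGroupOfFirstMatch(pattern, input string) string {
	result := regexp.MustCompile(pattern).FindAllStringSubmatch(input, -1)
	if len(result) == 0 || len(result[0]) < 2 {
		return ""
	}
	return result[0][1]
}

func main() {
	fmt.Println(firstGroupOfFirstMatch(`a(.)`, "ab")) // b
}
```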
findRESubmatch(`a(.)`, "abac") returns [["ab" "b"] ["ac" "c"]]. You’d use index (index $result 0) 1 to access the first capturing group of the first match, index (index $result 1) 1 for the first capturing group of the second match, etc.
findRESubmatch(`a(?P<letter>.)`, "abac") behaves exactly as if the group were unnamed, i.e. the name is not available in the match. Interestingly, the Go documentation keeps mum about that, too. Using a dict in that case would’ve been nice.
In general, if the RE matches x times and contains y capturing groups, the result is a slice of x slices, each containing y+1 strings. The first string is always the current match, i.e. what the whole RE matches; the rest are the capturing group matches, in order.
findRESubmatch(`a(.(.)(d))`, "abcd") returns [["abcd" "bcd" "c" "d"]], thus the nested capturing groups from the outside to the inside. That’s consistent with their numbering.
Hugo’s as well as Go’s documentation use the terms “leftmost” (for the first match) and “left to right” (for the order of matches) in their description of the RE functions. In my opinion, this is misleading, as “leftmost match” makes sense only for left-to-right writing systems. In a right-to-left writing system (at least Arabic and Hebrew), the first match should be the *right*most one, and matches should be returned in right-to-left order.
Either the wording in the documentation is correct, in which case RE matching behaves strangely in certain locales, or the RE matching works correctly, in which case the wording is wrong.
Feel free to use that in the documentation. I raised an issue about its current state here