Expression to extract urls with findRE

Hi, what Go regular expression should I use to extract the URL from all of these expressions?

[aol](www.aol.fr)
[aol](http://www.aol.fr)
[aol](https://www.aol.fr)

I thought to use (?:\[[^]]*]\()([^]]*)(?:\)) but neither non-capturing groups nor look-ahead expressions work.

Go (and consequently Hugo) supports only a limited set of RE expressions:

This \[.*?\]\((.*?)[\) ] does the trick: Non-capturing group (and those are supposed to work!), a non-greedy match for [.*](, followed by a capturing group matching .* non-greedily, followed by a closing parenthesis or a space (for image URLs). Beware though, that this expression will also catch image URLs, you might want to take care of that.

With REs, as with everything else, simpler is better. In your example, there’s no need whatsoever for non-capturing groups. Nor for Look-ahead/behind expressions. And yes, non-capturing groups do work. It might be worth it to double-check before stating that a product is at fault.
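To check which constructs Go's regexp package accepts, here is a minimal Go sketch. The helper name `compiles` is made up for illustration and is not part of Hugo; it only reports whether `regexp.Compile` accepts a pattern.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package accepts the pattern.
// (Hypothetical helper for illustration, not a Hugo function.)
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// Pattern from the question: non-capturing groups are accepted.
	fmt.Println(compiles(`(?:\[[^]]*]\()([^]]*)(?:\))`))
	// A look-ahead is rejected by Go's RE2-style syntax.
	fmt.Println(compiles(`\[[^]]*](?=\()`))
}
```

So the pattern from the question is syntactically fine for Go; only look-around constructs are refused at compile time.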

I know the expression is valid, I used https://regex101.com/. The question is how do I use it with findRE ?
Currently this:

{{ with $cite }}
{{ $inter := index (findRE `(?:\[[^]]*]\()([^]]*)(?:\))` .) 1 }}
{{with $inter}}{{ printf "cite=\"%s\"" . | safeHTMLAttr }}
{{ end }}{{ end }}

does not work with this

```quote{cite=“[2021, Sociosexual behaviour in wild chimpanzees occurs in variable contexts and is frequent between same-sex partners](Sociosexual behaviour in wild chimpanzees occurs in variable contexts and is frequent between same-sex partners in: Behaviour Volume 158 Issue 3-4 (2021))”}
```

because $inter is empty, while the regexp is correct.
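A likely reason `$inter` ends up empty, sketched in Go. This assumes that `findRE` behaves like `regexp.FindAllString`, i.e. that each element of the returned slice is a whole match and not a capture group. With a single link in the input the slice then has one element, so index 1 points past the end. `wholeMatches` is a name invented for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// wholeMatches returns every full match of pattern in s. This is how
// findRE is assumed to behave here: capture groups are not returned.
func wholeMatches(pattern, s string) []string {
	return regexp.MustCompile(pattern).FindAllString(s, -1)
}

func main() {
	m := wholeMatches(`(?:\[[^]]*]\()([^]]*)(?:\))`, "[aol](www.aol.fr)")
	fmt.Println(len(m)) // a single match, so there is no element at index 1
	fmt.Println(m[0])   // the whole link, brackets and parentheses included
}
```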

Well, if everything is ok … I guess you’ll have to check again.

What I meant is that I don’t see the point of your advice: why tell me to use expressions in Hugo that do not work with Hugo? Non-capturing groups are recognized (even though I can’t make use of them with findRE), but look-aheads aren’t.

I didn’t say to use look-ahead/behind. I said you don’t need it here.

Beyond findRE giving a slice of all matches and this precise regexp working, the code below still produces empty cite expressions. To be precise, it doesn’t enter the {{with $inter}}{{end}} part. I would appreciate it if someone could correct this, since I didn’t find the documentation of findRE very helpful, so this is beyond my understanding.

{{ $author := .Attributes.author  }}
{{ $class   := .Attributes.class }}
{{ $complex   := or (or $class (or $author (or .Attributes.cite .Attributes.id))) (not (in $class "simple")) }}
{{ if $complex }}<figure {{with .Attributes.id}}id={{.}} {{end}}class="non-picture {{with $class}}{{.}}{{end}}">{{end}}
<blockquote

{{ with .Attributes.cite }}
{{ $inter := index (findRE `\[.*?\]\((.*?)[\) ]` .) 0 }}
{{with $inter}}{{ printf "cite=\"%s\"" . | safeHTMLAttr }}
{{ end }}{{ end }}

>{{ .Inner|$.Page.RenderString }}</blockquote>
{{if $complex }}{{ if or $author .Attributes.cite }}<figcaption class=cite_class>{{with $author}}{{.|$.Page.RenderString }}{{end}}{{with .Attributes.cite }}<cite> in {{.|$.Page.RenderString}}</cite>{{end}}</figcaption>{{end}}
</figure>{{end}}

How should it be changed to successfully extract URLs and put them in cite="..."? An example:

```quote{cite="[moteur](www.aol.fr)"}
random junk
```

(Ignore the "".) This should produce a cite="www.aol.fr" HTML attribute.

Have a look at the Hugo documentation around findRE. I’m fairly confident that you’ll find an answer to your “royally ignored” question.
For me, your attitude makes trying to help you not very fun. Therefore, I’m out here.

Best I could come up with:

{{ with .Attributes.cite }}
{{ $inter := replaceRE `(?:\()([^]]*)(?:\))` "$1" (index (findRE `(?:\[[^]]*]\()([^]]*)(?:\))` .) 0) }}
{{with $inter}}{{ printf "cite=\"%s\"" . | safeHTMLAttr }}
{{ end }}{{ end }}

This still gives .Attributes.cite without modification if it contains a link.
In my opinion there should be more and simpler functions. Something akin to findRESubmatch, which would produce a slice of the matching groups. Use of regexps should be as straightforward as possible, because as it is, it’s not for normal folks, even courageous ones.

What do you want as the result in your example?
You want www.aol.fr with the scheme trimmed?

If that is it I’d probably just use trim

I haven’t tested to see if split can split on more than one character, but you could try a split on “//”.

trim does not make sense when the prefix is not fixed (no pun intended). The OP does need an RE if their code is supposed to work with varying strings.

findRE will not help them there, findRESubmatch would.
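To illustrate what a submatch-returning function gives, here is a small Go sketch using the standard library's `FindStringSubmatch` with the expression suggested earlier in the thread. The names `linkRE` and `linkTarget` are invented for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// linkRE is the expression proposed earlier in the thread; group 1 is the URL.
var linkRE = regexp.MustCompile(`\[.*?\]\((.*?)[\) ]`)

// linkTarget returns the first capture group, or "" when nothing matches.
// (Hypothetical helper for illustration.)
func linkTarget(s string) string {
	m := linkRE.FindStringSubmatch(s)
	if m == nil {
		return ""
	}
	return m[1] // m[0] is the whole match, m[1] the first group
}

func main() {
	for _, s := range []string{
		"[aol](www.aol.fr)",
		"[aol](http://www.aol.fr)",
		"[aol](https://www.aol.fr)",
	} {
		fmt.Println(linkTarget(s))
	}
}
```

The point is only that the capture group becomes directly addressable, which a slice of whole matches does not allow.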

Given the examples of needing to trim http:// and https:// it seems simple enough.
Unless the examples are not complete I don’t see the issue?

the prefix is fixed, thanks to my links hook. Always starting by either http:// or https://.
I had not thought of those functions ! It’s as the saying goes: if your only tool is a hammer, every problem becomes a nail.
How about that?

{{ $cite := strings.TrimPrefix (index (findRE “!?[.*]((https?://)?(www.)?” .Attributes.cite) 0) .Attributes.cite }}
{{ $cite = strings.TrimSuffix “)” $cite }}

That way I could even put an image or a detail shortcode call, and extract the url, like I always wanted. Pretty funky, but as long as the cite attribute is valid, who cares?
But for now it reports a “syntax error” for the first line, even after rearranging. I don’t see what’s wrong.

Error: add site dependencies: load resources: loading templates: “/home/drm/WEBSITE/themes/hugo-book/layouts/_default/_markup/render-codeblock-quote.html:3:1”: parse failed: template: _default/_markup/render-codeblock-quote.html:3: invalid syntax

Usually they’re more verbose. So I can assume it’s not a matter of missing parenthesis ?
And yes, in blabka I want to extract URI, so the link (or image or whatever, no matter) can appear in but the cite attribute of the blockquote element is still an URL.
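One possible cause of the bare “invalid syntax” message, offered as a guess since any backslashes in the original pattern are not visible in the post: a double-quoted string in a Go template follows Go's escape rules, so a sequence such as `\[` is rejected, while a backtick-quoted raw string accepts it. The Go sketch below uses `strconv.Unquote` to show the difference; whether this is what happened here is an assumption, and `unquotes` is a made-up helper name.

```go
package main

import (
	"fmt"
	"strconv"
)

// unquotes reports whether lit (quotes included) is a valid Go string
// literal. (Hypothetical helper; the link to the template error is a guess.)
func unquotes(lit string) bool {
	_, err := strconv.Unquote(lit)
	return err == nil
}

func main() {
	doubleQuoted := `"\[.*\]"` // pattern written between double quotes
	rawQuoted := "`\\[.*\\]`"  // same pattern written between backticks

	fmt.Println(unquotes(doubleQuoted)) // \[ is not a valid escape sequence
	fmt.Println(unquotes(rawQuoted))    // raw strings take backslashes literally
}
```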

I don’t see the need for a regexp at all if it is just those two:
if the string contains the prefix, then trim it.

This also works as far as I can tell. Maybe there is some weird URL that could break it, but it works for the examples.

{{ last 1 (split "https://www.test.com" "://") }}
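The same idea in plain Go, as a sketch: splitting on `://` and keeping the last piece drops the scheme when there is one and leaves the string untouched when there is none. `stripScheme` is a name chosen for this example; unlike `last 1`, which yields a one-element slice, it returns the element itself.

```go
package main

import (
	"fmt"
	"strings"
)

// stripScheme keeps whatever follows the last "://", or the whole
// string if no "://" is present. (Hypothetical helper for illustration.)
func stripScheme(s string) string {
	parts := strings.Split(s, "://")
	return parts[len(parts)-1]
}

func main() {
	fmt.Println(stripScheme("https://www.test.com"))
	fmt.Println(stripScheme("http://www.aol.fr"))
	fmt.Println(stripScheme("www.aol.fr")) // no scheme: returned unchanged
}
```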

Thanks, it works perfectly:

<blockquote {{if in .Attributes.cite "]("}}{{safeHTMLAttr (print "cite=\"" (strings.TrimSuffix ")" (index (last 1 (split .Attributes.cite "](")) 0)) "\"")}}{{end}}>

I wonder though, isn’t there a simpler way than index (last 1 (split .Attributes.cite "](")) 0)? This looks needlessly convoluted. The function string can fuse all strings of an array into a string as what merge does for map.

Sorry, I assumed the separation of text and link, as in [text](link), was handled by Markdown and you just wanted to trim the scheme.
You should still be able to do it this way, but you need to account for the case with no scheme present.

Here you go:

<blockquote {{if in .Attributes.cite "]("}}{{safeHTMLAttr (print "cite=\"" (strings.TrimSuffix ")" (index (last 1 (split .Attributes.cite "](")) 0)) "\"")}}{{else if or (in .Attributes.cite "www") (in .Attributes.cite "http")}}{{safeHTMLAttr (print "cite=\"" .Attributes.cite "\"")}}{{end}}>

if scheme → … else url but no scheme → … else no cite attribute.
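For readers who find the one-liner hard to follow, the branching of the template above can be sketched in Go. The function name `citeURL` and its boolean result are invented for this illustration; the substring checks are copied from the template and share its limitations (any text containing "www" or "http" counts as a URL).

```go
package main

import (
	"fmt"
	"strings"
)

// citeURL follows the three branches of the template: a Markdown link,
// a bare URL, or nothing usable. The boolean is false when no cite
// attribute should be emitted. (Hypothetical helper for illustration.)
func citeURL(cite string) (string, bool) {
	switch {
	case strings.Contains(cite, "]("):
		// Keep what follows the last "](" and drop the closing parenthesis.
		parts := strings.Split(cite, "](")
		return strings.TrimSuffix(parts[len(parts)-1], ")"), true
	case strings.Contains(cite, "www"), strings.Contains(cite, "http"):
		return cite, true
	default:
		return "", false
	}
}

func main() {
	for _, c := range []string{"[moteur](www.aol.fr)", "https://www.aol.fr", "some book"} {
		url, ok := citeURL(c)
		fmt.Printf("%q %v\n", url, ok)
	}
}
```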
