Golang pcre, extracting substrings from matches

Tom_Durand · January 3, 2023, 7:42pm

Hi,
I want to build a list of all the internet references (external links) on a page, excluding the pictures.
I intended to run a regexp on .Content with findRE, but it looks like the Golang flavor of regular expressions has neither lookahead nor lookbehind?
How can I extract a substring from a match without them?
I need to build an array of all strings that start with http or www, follow either : or ](, and end at the first ) or at a newline/carriage return.
I could usually do that, but with these limitations I can’t.
I know about capturing groups, but how do I say “extract that group”?
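
For readers with the same question, here is a minimal sketch in plain Go of how a capture group can stand in for a lookbehind. Hugo's findRE is built on Go's regexp package, which uses RE2 syntax and has no lookaround, so the usual workaround is to match the surrounding `](` and then read only group 1 from each match. The names `linkTarget` and `targets` are made up for this illustration, and the pattern is a simplification that only handles inline links.

```go
package main

import (
	"fmt"
	"regexp"
)

// linkTarget matches "](" followed by a target and captures the target only.
// The "](" is consumed by the match but sits outside the capture group,
// which is what a lookbehind would otherwise be used for.
var linkTarget = regexp.MustCompile(`\]\(([^)\s]+)`)

// targets returns capture group 1 of every match in s.
func targets(s string) []string {
	var out []string
	for _, m := range linkTarget.FindAllStringSubmatch(s, -1) {
		// m[0] is the whole match, m[1] is the first capture group.
		out = append(out, m[1])
	}
	return out
}

func main() {
	fmt.Println(targets("[AAAAAAA](http://www.aol.fr) and [local_link](philosophy)"))
}
```

The point of the sketch is only the `m[1]` indexing: "extract that group" means asking for the submatches rather than the whole match.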

jmooring · January 3, 2023, 7:59pm

An example would help us to help you.

Tom_Durand · January 3, 2023, 8:21pm

These (in bold characters) would match:

[AAAAAAA](http://www.aol.fr)
[BBBBBBB](www.google.fr)
[CCCCCCC](www.google.fr " sdf
sdfds")
[XXXXXX](www.youtube.fr " sdf\

While none of these would qualify: no images, remote or local, and no local links:

![local image](Images/images_article_instincto/illustration_instincto.webp “The answer is to observe wild animals, to discover our original food range.”)
[local_link](philosophy)
![remote link](https://yyz2.discourse-cdn.com/flex036/user_avatar/discourse.gohugo.io/tom_durand/32/14460_2.png)

All of that would produce this list (after removing duplicates and prefixes):
["aol.fr", "google.fr", "youtube.fr"]
which, when looped over, would produce a text file consisting of:

aol.fr
google.fr
youtube.fr
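
The extraction described above could be sketched in plain Go roughly as follows. This is an illustration under assumptions, not a Hugo template: `extLink`, `externalHosts` and `sample` are hypothetical names, the image paths in `sample` are abbreviated from the examples above, and only inline links are handled (the reference-style `: ` form is left out). Images are excluded without lookbehind by capturing an optional leading `!` and skipping any match where it is present.

```go
package main

import (
	"fmt"
	"regexp"
)

// sample is abbreviated from the examples in the post above.
const sample = `[AAAAAAA](http://www.aol.fr)
[BBBBBBB](www.google.fr)
[CCCCCCC](www.google.fr " sdf
sdfds")
[XXXXXX](www.youtube.fr " sdf\
![local image](Images/illustration.webp "title")
[local_link](philosophy)
![remote link](https://yyz2.discourse-cdn.com/avatar.png)`

// extLink: group 1 is "!" for images (empty for links),
// group 2 is the host with the http(s):// and www. prefixes left out.
var extLink = regexp.MustCompile(`(!?)\[[^\]]*\]\((?:https?://(?:www\.)?|www\.)([^\s)"/]+)`)

// externalHosts returns the de-duplicated hosts of non-image links,
// in order of first appearance.
func externalHosts(content string) []string {
	seen := map[string]bool{}
	var hosts []string
	for _, m := range extLink.FindAllStringSubmatch(content, -1) {
		if m[1] == "!" {
			continue // an image, remote or local
		}
		if !seen[m[2]] {
			seen[m[2]] = true
			hosts = append(hosts, m[2])
		}
	}
	return hosts
}

func main() {
	for _, h := range externalHosts(sample) {
		fmt.Println(h)
	}
}
```

Local links such as `philosophy` never match because the pattern requires an `http://`, `https://` or `www.` prefix right after `](`.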

Tom_Durand · January 3, 2023, 8:41pm

Ah, I’m sorry, I only just saw the thread below; it popped up at the top of the list.
It seems to say that, when it comes to capturing groups, the only option is to proceed in two passes.
Then I make a new array with only the entries that include // or www and do not include !.
This should do it:
{{ $array1 := union (findRE(?s)((![.](.(//)|(www).)|(/s+:/s+![.](.(//)|(www).)).Content) (findRE(?s)cite=“.“.Content) (findRE(?s)quote=”.”.Content)}}
then I loop over this and apply:
findRE (?s)((![.](.(//)|(www).)|(/s+:/s+![.](.(//)|(www).)) .Content
What do you think? Are there cases in Markdown that it would miss?
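
The two-pass idea (collect whole matches first, then filter them) might look like this in plain Go. It is a sketch of the filtering step only, with hypothetical names (`anyLink`, `filterExternal`, `twoPass`) and a simplified pattern in place of the ones posted above. It covers inline links only, so reference-style definitions (`[id]: http://...`) and autolinks (`<http://...>`) would be missed unless separate patterns are added for them.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// anyLink matches an inline Markdown link or image, from the optional "!"
// up to the end of the target (first ")" or whitespace).
var anyLink = regexp.MustCompile(`!?\[[^\]]*\]\([^)\s]+`)

// filterExternal is the second pass: it drops images (leading "!") and
// keeps only matches that contain "//" or "www".
func filterExternal(matches []string) []string {
	var kept []string
	for _, m := range matches {
		if strings.HasPrefix(m, "!") {
			continue
		}
		if strings.Contains(m, "//") || strings.Contains(m, "www") {
			kept = append(kept, m)
		}
	}
	return kept
}

// twoPass runs the first pass (whole matches only) and then the filter.
func twoPass(content string) []string {
	return filterExternal(anyLink.FindAllString(content, -1))
}

func main() {
	content := "[a](http://www.aol.fr) ![img](https://x.example/p.png) [l](philosophy) [b](www.google.fr)"
	for _, m := range twoPass(content) {
		fmt.Println(m)
	}
}
```

The kept strings still carry the `[text](` part, so a further replace or capture step is needed to reduce them to bare hosts.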

system · January 5, 2023, 8:41pm

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.

Related topics

Topic · Category · Replies · Views · Activity
Expression to extract urls with findRE · support · 17 · 527 · April 21, 2023
Regular expressions: (sub)matching groups not supported? · support · 5 · 2183 · January 1, 2023
replaceRE lookbehind · support · 2 · 481 · August 13, 2019
How to use findRESubmatch · tips & tricks · 9 · 932 · April 20, 2023
Performing functions on a captured group in replaceRE · support · 4 · 1620 · January 4, 2019