Making replaceRE substitutions lowercase? For an implementation of wikilinks

natehn · November 28, 2021, 3:25am

Hi there - I’m trying to find some way to make my replaceRE substitutions lowercase.

Here’s what I’ve put in my wiki single.html template, for context:

{{ .Content | replaceRE `\[\[(\S*)\]\]` "<a href='/wiki/$1/'>$1</a>" | replaceRE `\[\[(\S*)\s(\S*)\]\]` "<a href='/wiki/$1-$2/'>$1 $2</a>" | replaceRE `\[\[(\S*)\s(\S*)\s(\S*)\]\]` "<a href='/wiki/$1-$2-$3/'>$1 $2 $3</a>" | replaceRE `\[\[(\S*)\s(\S*)\s(\S*)\s(\S*)\]\]` "<a href='/wiki/$1-$2-$3-$4/'>$1 $2 $3 $4</a>" | replaceRE `\[\[(\S*)\s(\S*)\s(\S*)\s(\S*)\s(\S*)\]\]` "<a href='/wiki/$1-$2-$3-$4-$5/'>$1 $2 $3 $4 $5</a>" | safeHTML }}

This recognizes text within double-brackets in my wiki notes and translates it to a link.

It is mostly working like a charm, except for capitalized wiki pages. The links go to the upper case URL. Here’s the live page where I’m testing it out: Berlin

The Go/RE2 RegEx does not support \L, in case you all have used it on RegEx 101.
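
For anyone who wants to see the limitation outside of Hugo, here is a minimal Go sketch. It assumes that replaceRE behaves like Go's `regexp.ReplaceAllString`; the `expand` helper and the sample strings are made up for illustration. It shows that `$1` is expanded, while `\L` is just copied through as literal text.

```go
package main

import (
	"fmt"
	"regexp"
)

// singleWord is the first pattern from the template above.
var singleWord = regexp.MustCompile(`\[\[(\S*)\]\]`)

// expand is a hypothetical helper: it applies a replacement
// template to every match of singleWord in s.
func expand(s, repl string) string {
	return singleWord.ReplaceAllString(s, repl)
}

func main() {
	// $1 is expanded to the captured text, with its case unchanged.
	fmt.Println(expand("[[Berlin]]", "/wiki/$1/"))
	// \L is not a case operator in Go's replacement syntax;
	// the backslash and the L end up in the output as-is.
	fmt.Println(expand("[[Berlin]]", `/wiki/\L$1/`))
}
```

So any case conversion has to happen in code around the match, not inside the replacement string.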

I tried putting a lower function in my Goldmark render-links.html (i.e., | lower) but that precedes the replaceRE operation, as far as I can tell. I also tried setting disablePathToLower to false with no change in behavior (since that is the default) and then to true. I thought the latter would solve my problems - titlecase links going to titlecase pages - but somehow it has not.

If there’s one thing I might have done wrong, it could be that I need to do better in render-links.html. Continuing to look into it.

Any ideas? (Suggestions for my duct-taped RegEx code also welcome ;)) Worst case scenario, I guess all my note titles will have to be in lower case?

(See all code here: github.com/natehn/blog, as well as /natehn/blog-theme and /wiki.)

pamubay · November 28, 2021, 4:19am

Not quite sure about the wikilink syntax format, but looking at your regex pattern it’s delimited by space?

Try this:

{{- $Content := .Content -}}

{{- /* 
//   Strict match: one or more characters (except line terminators)
//   inside the double square brackets `[[<any-character>]]`;
//   (?U:...) makes the match ungreedy
*/ -}}
{{- $linkRE := `\[{2}(?U:.+)\]{2}` -}}
{{- $linkFound := findRE $linkRE $Content -}}

{{- /* Replace every wikilink with a placeholder */ -}}
{{- $Content = $Content | replaceRE $linkRE `<!--wikilink-->` -}}

{{- range $wiki_link := $linkFound -}}
  {{- $text := trim $wiki_link "[ ]" -}}
  {{- $href := delimit (split ($text | lower ) " ") "-" -}}
  
  {{- $linkFmt := `<a href="/wiki/%s/">%s</a>` -}}
  {{- $link := printf $linkFmt $href $text -}}

  {{- /* Replace the next placeholder in $Content with this link (limit 1) */ -}}
  {{- $Content = replace $Content `<!--wikilink-->` $link 1 -}}

{{- end -}}

{{- /* Output */ -}}
{{- $Content | safeHTML -}}

EDIT: using (?U:.+) instead of (?U:[\s\S]*)
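
The difference between the two patterns in the EDIT note can be checked with a small Go sketch. It assumes that findRE and trim behave like Go's `regexp` matching and `strings.Trim`; the names `dotPlus`, `anyStar`, `findLinks` and `linkText` are invented here. The `.+` version needs at least one character and will not cross a line break, whereas `[\s\S]*` also accepts an empty `[[]]` and a link split over two lines.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// dotPlus is the pattern used in the template above.
	dotPlus = regexp.MustCompile(`\[{2}(?U:.+)\]{2}`)
	// anyStar is the earlier variant mentioned in the EDIT note.
	anyStar = regexp.MustCompile(`\[{2}(?U:[\s\S]*)\]{2}`)
)

// findLinks returns every match of re in s, brackets included.
func findLinks(re *regexp.Regexp, s string) []string {
	return re.FindAllString(s, -1)
}

// linkText strips leading and trailing '[', ']' and spaces,
// which is what a trim with the cutset "[ ]" does.
func linkText(match string) string {
	return strings.Trim(match, "[ ]")
}

func main() {
	s := "See [[Berlin]] and [[New York]], but not [[]]."
	fmt.Printf("%q\n", findLinks(dotPlus, s)) // two links, [[]] skipped
	fmt.Printf("%q\n", findLinks(anyStar, s)) // three matches, [[]] included
	fmt.Printf("%q\n", linkText("[[New York]]"))
}
```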

davidsneighbour · November 28, 2021, 7:50am

That looks like a severe allergy against multiple lines but here is an idea. Put the lower after the last replaceRE. Like this:

{{ .Content | replaceRE `\[\[(\S*)\]\]` "<a href='/wiki/$1/'>$1</a>" | replaceRE `\[\[(\S*)\s(\S*)\]\]` "<a href='/wiki/$1-$2/'>$1 $2</a>" | replaceRE `\[\[(\S*)\s(\S*)\s(\S*)\]\]` "<a href='/wiki/$1-$2-$3/'>$1 $2 $3</a>" | replaceRE `\[\[(\S*)\s(\S*)\s(\S*)\s(\S*)\]\]` "<a href='/wiki/$1-$2-$3-$4/'>$1 $2 $3 $4</a>" | replaceRE `\[\[(\S*)\s(\S*)\s(\S*)\s(\S*)\s(\S*)\]\]` "<a href='/wiki/$1-$2-$3-$4-$5/'>$1 $2 $3 $4 $5</a>" | lower | safeHTML }}

chrillek · November 28, 2021, 7:51am

That’s a character class consisting of space and non-space characters. Why not use a dot to match every character?

chrillek · November 28, 2021, 9:45am

I suggest that you do not use a greedy match here. If you have several links on the same line, your RE will gobble them all up in one big match, which is probably not what you want. Instead, use a lazy match like \S*? or \[\[([^[]+)\]\] (probably rather the first one because of legibility).
Also, your pattern \S*\s\S*\s etc. looks a bit … like an anti-pattern. Especially since your attempt to replace blanks with dashes will not work with more than five words. I’d suggest a two-step approach:

find all the wiki links, ignoring spaces in them, with \[\[([^[]+)\]\]
for each of them, replace the space(s) with dashes and possibly lowercase the characters (not in .Content, just build a second list)
for each of the wiki links, replace it with its cleaned-up <a href=... version from the list you just built.

That should make for easier to read (and maintain) code, and it will also work for 10 words in a wiki link…
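
The steps above can be sketched in plain Go roughly as follows. This is only an illustration under some assumptions: `linkify` and `slug` are hypothetical names, the `/wiki/` prefix is taken from the original template, and the character class excludes both `[` and `]`, which is a small variation on the pattern suggested above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// wikiLink matches [[...]] where the inner text has no square brackets.
var wikiLink = regexp.MustCompile(`\[\[([^\[\]]+)\]\]`)

// slug lowercases the link text and joins its words with dashes.
func slug(text string) string {
	return strings.ToLower(strings.Join(strings.Fields(text), "-"))
}

// linkify is a hypothetical helper that follows the steps described above.
func linkify(content string) string {
	// Find every wikilink together with its inner text.
	matches := wikiLink.FindAllStringSubmatch(content, -1)

	// Build a second list holding the cleaned-up anchor for each link.
	anchors := make([]string, len(matches))
	for i, m := range matches {
		anchors[i] = fmt.Sprintf(`<a href="/wiki/%s/">%s</a>`, slug(m[1]), m[1])
	}

	// Replace each wikilink with its anchor, one occurrence at a time.
	for i, m := range matches {
		content = strings.Replace(content, m[0], anchors[i], 1)
	}
	return content
}

func main() {
	fmt.Println(linkify("Compare [[Berlin]]/[[New York City]]."))
	fmt.Println(slug("a b c d e f g h i j")) // ten words work as well
}
```

Because the link text is split on whitespace instead of being captured word by word, the number of words no longer matters.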

pamubay · November 28, 2021, 11:23am

Ah, you are right. I always forget that . also matches spaces.

@davidsneighbour putting lower here will lowercase the entire .Content value, right?

{{ .Content 
    | replaceRE `\[\[(\S*)\]\]` "<a href='/wiki/$1/'>$1</a>" 
    | replaceRE `\[\[(\S*)\s(\S*)\]\]` "<a href='/wiki/$1-$2/'>$1 $2</a>" 
    | replaceRE `\[\[(\S*)\s(\S*)\s(\S*)\]\]` "<a href='/wiki/$1-$2-$3/'>$1 $2 $3</a>" 
    | replaceRE `\[\[(\S*)\s(\S*)\s(\S*)\s(\S*)\]\]` "<a href='/wiki/$1-$2-$3-$4/'>$1 $2 $3 $4</a>" 
    | replaceRE `\[\[(\S*)\s(\S*)\s(\S*)\s(\S*)\s(\S*)\]\]` "<a href='/wiki/$1-$2-$3-$4-$5/'>$1 $2 $3 $4 $5</a>" 
    | lower 
    | safeHTML }}

davidsneighbour · November 28, 2021, 12:28pm

Yes, that’s why I criticised the one-line approach. With multiple lines and printf it would be easily doable. Put the regexps into a slice, range through it and collect what comes out, then range over that and use lower for the first %s and pass the second through as is: (printf `<a href="%s">%s</a>` ($bla | lower) $bla)
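
To make the difference concrete, here is a rough Go equivalent of the printf idea, assuming Hugo's lower corresponds to `strings.ToLower`. The names `oneWord`, `anchor` and `render` are made up for this sketch, and it only handles single-word links. Only the first `%s` (the href) is lowercased; the visible text keeps its case.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// oneWord is the single-word pattern from the thread, made lazy.
var oneWord = regexp.MustCompile(`\[\[(\S*?)\]\]`)

// anchor formats one link: lowercased target, display text as written.
func anchor(text string) string {
	return fmt.Sprintf(`<a href="/wiki/%s/">%s</a>`, strings.ToLower(text), text)
}

// render is a hypothetical helper that rewrites each match via anchor.
func render(content string) string {
	return oneWord.ReplaceAllStringFunc(content, func(m string) string {
		return anchor(oneWord.FindStringSubmatch(m)[1])
	})
}

func main() {
	out := render("Trip to [[Berlin]]")
	fmt.Println(out)
	// Lowercasing the whole result, as in the one-line pipeline,
	// also changes the visible text and the rest of the content.
	fmt.Println(strings.ToLower(out))
}
```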

davidsneighbour · November 28, 2021, 12:49pm

By the way, I am not pretending to fully understand what’s going on in those 5 regexps, but I am pretty sure you can solve that with some form of {1,5} rule to match one to 5 of these groups (a group containing characters that are not whitespace, ended by one whitespace or full stop).

chrillek · November 28, 2021, 1:28pm

Yes, and there’s no reason to set an arbitrary limit of 5 either. It’s something like \S+(?:\s+\S+)*. One has to make sure that the RE is not too greedy, too. So probably using a negated character class like [^[] would be better than \S.
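
A quick Go check of the greediness concern, as a sketch only: the variable names, the `captures` helper and the sample sentence are invented, and the negated class here excludes both brackets. Since `\S` also matches `[` and `]`, the word-based pattern runs from the first `[[` to the last `]]` on the line, while the negated class stops at each link.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// words wraps the \S+(?:\s+\S+)* shape in double brackets.
	words = regexp.MustCompile(`\[\[(\S+(?:\s+\S+)*)\]\]`)
	// negated keeps square brackets out of the link text.
	negated = regexp.MustCompile(`\[\[([^\[\]]+)\]\]`)
)

// captures returns the first capture group of every match of re in s.
func captures(re *regexp.Regexp, s string) []string {
	var out []string
	for _, m := range re.FindAllStringSubmatch(s, -1) {
		out = append(out, m[1])
	}
	return out
}

func main() {
	s := "From [[Berlin]] to [[New York City]]"
	fmt.Printf("%q\n", captures(words, s))   // one over-long capture
	fmt.Printf("%q\n", captures(negated, s)) // one capture per link
}
```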

natehn · November 28, 2021, 5:31pm

First of all, thank you so much everyone!!! Learning so much from this thread.

(Yes, I suppose I could add a few line breaks in there @davidsneighbour XD)

I used your suggestion @pamubay and I’m thrilled! Works like a charm. So grateful for your help. Hopefully this thread will be useful for others looking to hop on the wikilink (e.g., Obsidian, Roam, Logseq, etc.) train, because I could not find much about it when perusing the internet.

