Possible to exclude content in .Plain?

I’m generating a search index using the following code:

{{- .Scratch.Add "pagesIndex" slice -}}
{{- range $index, $page := where .Site.RegularPages "Type" "in" .Site.Params.mainSections -}}
	{{- $pageData := (dict "title" $page.Title "href" $page.RelPermalink "content" $page.Plain) -}}
	{{- $.Scratch.Add "pagesIndex" $pageData -}}
{{- end -}}
{{- .Scratch.Get "pagesIndex" | jsonify -}}

Is there any way to exclude certain tags (especially <code> and <pre>) from .Plain, even a hacky workaround? A rework of the above code that generates similar output would also do. I’m asking because the code that shows up in the search results is not formatted or highlighted, which makes the results page look messy.

The only option I have thought of so far is to exclude the entire page from the search index, but I would rather not do that.

It seems strange that .Plain contains HTML tags.

In any case, the plainify function is meant for stripping HTML and returning plain text.

.Plain doesn’t include the HTML generated from Markdown, so that part is fine, but from what I’m seeing, it does include the text of code written in the Markdown file. The same happened with | plainify in my test.
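
To illustrate the point: plainify removes the tags but keeps the text inside them, so code samples still end up in the output. A minimal sketch, where the $html string is a made-up example rather than anything from my site:

```go-html-template
{{- /* Made-up input standing in for rendered page content. */ -}}
{{- $html := "<p>Run the server:</p><pre><code>hugo server -D</code></pre>" -}}
{{- /* plainify strips the tags; the text between them is kept. */ -}}
{{ $html | plainify }}
```

The rendered text still contains hugo server -D; only the <p>, <pre> and <code> tags are gone.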

Check here, with “Flexsearch” as the search query. Thankfully, this particular query returns just one sentence, so it’s still fine, but another query might return a lot of characters, and that looks really bad. Take “Search” as the search query, for example.

Here’s my repo if it helps: GitHub - Hrishikesh-K/Portfolio at v2

I just tried using RegEx on .Content, and it worked. This is the code I used to strip all unwanted HTML content:

{{- .Scratch.Set "pagesIndex" slice -}}
{{- range where .Site.RegularPages "Type" "in" .Site.Params.mainSections -}}
	{{- $contentRE := replaceRE "(<pre[^>]*>(.|\n)*?</pre>)|(<code[^>]*>(.|\n)*?</code>)|(<h[2-4][^>]*>(.|\n)*?</h[2-4]>)|(<[^>]*>)|(\\n)|(\\t)" "" (.Content | htmlUnescape) -}}
	{{- $pageData := (dict "title" .Title "href" .RelPermalink "content" $contentRE) -}}
	{{- $.Scratch.Add "pagesIndex" $pageData -}}
{{- end -}}
{{- .Scratch.Get "pagesIndex" | jsonify -}}

There might be other problems I haven’t noticed yet, but so far, all seems good.

EDIT: Found a small problem with <ol> lists. Here’s how the list looks on the page:

  1. Logo × 1
  2. Envelope × 1
  3. Visiting card × 1
  4. Letterhead × 1
  5. Newspaper advertisements × 3
  6. Magazine advertisements × 3
  7. Out Of Home advertisemets × 2

But in the JSON it looks like this: Logo × 1Envelope × 1Visiting card × 1Letterhead × 1Newspaper advertisements × 3Magazine advertisements × 3Out Of Home advertisemets × 2. It’s just a missing space between the number at the end of one item and the start of the next. I can live with that for now, or I might try to fix it later.
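
One possible fix, as a rough sketch I have not run against the full site: replace each match with a single space instead of an empty string, then collapse the repeated spaces. The $html string below is a made-up stand-in for .Content, and $text is just an illustrative variable name.

```go-html-template
{{- /* Made-up sample standing in for a page's .Content. */ -}}
{{- $html := "<ol>\n<li>Logo × 1</li>\n<li>Envelope × 1</li>\n</ol>" -}}
{{- /* Replace tags, newlines and tabs with a space rather than nothing. */ -}}
{{- $text := replaceRE "(<[^>]*>)|(\\n)|(\\t)" " " $html -}}
{{- /* Collapse runs of spaces, then trim the ends. */ -}}
{{- $text = trim (replaceRE " +" " " $text) " " -}}
{{ $text }}
```

This should give Logo × 1 Envelope × 1. In the loop above, that would mean changing the replacement argument of replaceRE from "" to " " and adding the second replaceRE call before building $pageData.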

Hope this helps someone.
