Extract external links from a page

My Hugo site contains external links that change on a regular basis, and I need to be notified as soon as they do. All of the links are defined in Markdown files in the content/ directory, not in the theme.

I would like to extract all these external links in JSON (or any other structured format) when building the site, so that I can track them. I could parse all the HTML files in the public directory after rendering, but a solution that works while Hugo renders the site would be more convenient.
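
For comparison, here is a minimal sketch of that post-processing fallback. It assumes Python is available and that the site is built into public/. The names ExternalLinkParser, extract_external_links and scan_site are made up for illustration. Note that this approach cannot tell content links from theme links, and absolute links back to the site itself would also be reported.

```python
from html.parser import HTMLParser
from pathlib import Path


class ExternalLinkParser(HTMLParser):
    """Collect href values of <a> tags that look like external URLs."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Absolute (http/https) or protocol-relative (//) URLs only.
        if href.startswith(("http://", "https://", "//")):
            self.links.append(href)


def extract_external_links(html):
    """Return the external links found in one HTML document, in order."""
    parser = ExternalLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


def scan_site(public_dir):
    """Map each rendered HTML file (relative path) to its external links."""
    root = Path(public_dir)
    return {
        path.relative_to(root).as_posix(): extract_external_links(
            path.read_text(encoding="utf-8")
        )
        for path in sorted(root.rglob("*.html"))
    }
```

Calling scan_site("public") after a build would give one entry per rendered page, which could then be dumped with json.dumps.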

Any ideas?

After a while, I found a solution that suits me, so here it is.

/layouts/_default/_markup/render-link.html
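Store all external links in a page-level Scratch. The hook replaces Hugo's default link rendering, so it also has to output the link itself, otherwise links disappear from the rendered page.

{{- /* Record external links (absolute or protocol-relative URLs) on the page. */ -}}
{{- if or (strings.HasPrefix .Destination "http") (strings.HasPrefix .Destination "//") -}}
{{- .Page.Scratch.Add "links" (slice .Destination) -}}
{{- end -}}
{{- /* Still render the link itself. */ -}}
<a href="{{ .Destination | safeURL }}"{{ with .Title }} title="{{ . }}"{{ end }}>{{ .Text | safeHTML }}</a>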

/layouts/home.links.json
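Create a template for a custom output format that loops through the pages and collects their links. Building the list with dict and jsonify keeps the output valid JSON even when pages are skipped or have no links.

{{- $entries := slice -}}
{{- range .Site.Pages -}}
  {{- if ne .Type "json" -}}
    {{- /* Render the content first so the render hook has run for this page. */ -}}
    {{- $noop := .Content -}}
    {{- /* Pages without external links get an empty list instead of null. */ -}}
    {{- $links := .Scratch.Get "links" | default (slice) -}}
    {{- $entries = $entries | append (dict "uri" .Permalink "links" $links) -}}
  {{- end -}}
{{- end -}}
{{- $entries | jsonify -}}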

config.toml
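Define the custom output format and enable it for the home page. With this configuration the result is written as links.json next to the home page.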

[outputs]
home = ["HTML", "RSS", "JSON", "Links"]

[outputFormats]
  [outputFormats.links]
    baseName = 'links'
    isPlainText = true
    mediaType = 'application/json'
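
To actually get notified, the generated file still has to be compared between builds. Here is a minimal sketch of that step, assuming Python is available, that the output is a list of objects with "uri" and "links" keys as produced by the template above, and that a copy of the previous build's file is kept somewhere. The function names and file paths are hypothetical.

```python
import json
from pathlib import Path


def load_links(path):
    """Map each page URI to the set of external links recorded for it."""
    entries = json.loads(Path(path).read_text(encoding="utf-8"))
    # "links" may be missing or null for pages without external links.
    return {entry["uri"]: set(entry.get("links") or []) for entry in entries}


def diff_links(old, new):
    """Return (added, removed) as sorted lists of (uri, link) pairs."""
    old_pairs = {(uri, link) for uri, links in old.items() for link in links}
    new_pairs = {(uri, link) for uri, links in new.items() for link in links}
    return sorted(new_pairs - old_pairs), sorted(old_pairs - new_pairs)


def report(previous_path, current_path):
    """Build one human-readable line per added or removed link."""
    previous = Path(previous_path)
    # No earlier snapshot means every link counts as newly added.
    old = load_links(previous) if previous.exists() else {}
    added, removed = diff_links(old, load_links(current_path))
    lines = [f"+ {link} (on {uri})" for uri, link in added]
    lines += [f"- {link} (on {uri})" for uri, link in removed]
    return lines
```

For example, report("links.previous.json", "public/links.json") could run after each build, with the returned lines sent through whatever notification channel is in use.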
