Index and list URLs from content

This is related to my line width query. I am seeing if I can generate plain text files with Hugo, in a format similar to newsbeuter's.

Here’s a long example, to demonstrate how the latest Mozilla blog post is saved:

Title: Improving the Firefox Privacy Notice
Author: 
Date: Fri, 29 Sep 2017 04:57:33 -0700
Link: https://blog.mozilla.org/blog/2017/09/28/improving-firefox-privacy-notice/
 
Back in 2014, we reorganized our privacy policies to make them simple, clear, 
and usable[1]. That effort was based on simplifying the then 14-page privacy 
policy around a framework that retained some detail but helped users find 
information more quickly. We did this because of our Data Privacy Principles[2] 
that offer us guardrails as we develop our products and services.
 
Today I’m happy to announce another revision of our Firefox Privacy Notice[3], 
which follows our initial [4]announcement on the topic.  We continue to build 
our products focusing on user control and fulfilling our “no surprises” rule 
when it comes to privacy.  We believe that in context notices with the user 
experience in mind make notices more understandable and actionable for users. 
Our updated notice includes:
 
 * A layered design to show what we collect, why we collect it, where you can 
learn more, and what your choices are.
 * Language that is more specific and transparent when describing the types of 
data.  We have used the same terms as our internal teams, including: “technical”
data, “interaction” data, “webpage” data and “location” data.
 * A more holistic explanation of how a feature interacts with data.  For 
example, we previously had a separate privacy notice for cloud features like 
Sync.  This technical distinction was confusing, so we removed that separate 
privacy notice and have made it a part of the new Firefox Privacy Notice where 
context is more understandable.
 * On desktop platforms that support it, we have begun adding the ability to 
link the user directly into the appropriate user preferences so they can easily 
and quickly access privacy controls.
 
[image 5]
 
We’ve also changed our Firefox onboarding experience so that the Privacy Notice 
now displays on the second tab of a newly installed browser.
 
Take a look and tell us if we met the standards we set by going to Governance 
mailing list[6].
 
We hope all of this offers a more meaningful opportunity for users to learn 
about how we design privacy into Firefox, and make choices about the data they 
wish to share.
 
The post Improving the Firefox Privacy Notice[7] appeared first on The Mozilla 
Blog[8].
 
Links: 
[1]: https://blog.mozilla.org/netpolicy/2014/04/08/clearer-mozilla-privacy-website-policies/ (link)
[2]: https://www.mozilla.org/privacy/principles/ (link)
[3]: https://www.mozilla.org/privacy/firefox (link)
[4]: https://blog.mozilla.org/netpolicy/2017/09/06/making-privacy-transparent/ (link)
[5]: https://blog.mozilla.org/wp-content/uploads/2017/09/Firefox_Privacy-300x249.png (image)
[6]: https://groups.google.com/forum/#!topic/mozilla.governance/9txB7etE7E4 (link)
[7]: https://blog.mozilla.org/blog/2017/09/28/improving-firefox-privacy-notice/ (link)
[8]: https://blog.mozilla.org (link)

The final part, where the URLs are listed at the end, is the part that interests me. I’d like to grab all the URLs in the content and list them. Footnote-like numbering is cool, but not necessary.

I don’t know Go, but I’ve been able to stumble my way through Hugo templates okay. I haven’t used Scratch, don’t really grok it.

So I am wondering if this is possible with just Hugo (sans an external tool), and would love if someone could give me a primer on this. I suspect it has to do with matching each word with a URL regex, and creating an index, but I am honestly not sure. :slight_smile:

Hadn’t read this for a while, so I think this is what I need to use to pull out links. Will test it out, see how far I get. :slight_smile:

I actually just implemented something similar in my theme to tell the browser to preload any images in the content:

{{ if .Content -}}
    {{- /* Match each <img> tag whose first attribute is src, up to the closing quote */ -}}
    {{ $imgs := findRE "<img src=\"[^\"]*\"" .Content -}}
    {{ range $img := $imgs -}}
      {{- /* Strip the surrounding markup so only the URL remains */ -}}
      {{ $url := $img | strings.TrimPrefix "<img src=\"" | strings.TrimSuffix "\"" -}}
      <link rel="preload" href="{{ $url | htmlUnescape | safeHTML }}" as="image" />
    {{ end -}}
{{ end -}}

If you’re looking for all linked URLs (i.e. all <a> tags), you should be able to use something like:

{{/* This isn't tested */}}
{{ if .Content -}}
    {{ $urls := findRE "<a href=\"[^\"]*\"" .Content -}}
    {{- /* $i is the zero-based range index, so no Scratch counter is needed */ -}}
    {{ range $i, $match := $urls -}}
      {{ $url := $match | strings.TrimPrefix "<a href=\"" | strings.TrimSuffix "\"" -}}
      [{{ add $i 1 }}]: {{ $url }}
    {{ end -}}
{{ end -}}

Alternatively, you can manually create footnotes in markdown with the following syntax:

This is a footnote.[^1]

[^1]: the footnote text.

Thanks! That is a great start for me to hack on. Where is that live? I’d love to see the output.

Markdown footnotes are dope, but I am a fan of full links in hypertext. I am not quite sure how I want to list the links, for example whether to indicate where they appeared in the text. I wouldn’t mind the footnote notation style in the plain text version, but I don’t want it in the web doc. :slight_smile:

> Where is that live? I’d love to see the output.

The preloading example is live on my programming blog. A page with some images on it would be this one. If you view the source code, near the end of <head> will be these lines:

<link rel="preload" href="/images/no-gapps/guide/davdroid-add-google-account.png" as="image" />
<link rel="preload" href="/images/no-gapps/guide/davdroid-name-google-account.png" as="image" />
<link rel="preload" href="/images/no-gapps/guide/davdroid-google-account-overview.png" as="image" />