Best way to extract link graph

Hi there,

I am trying to implement internal back-linking, as I have seen it is not supported. To do so, I plan on building a JSON file in the data/ folder containing the graph of all internal links, and using it to extract reverse links (i.e. for page1, listing all the pages that mention it). If you have a better idea, please share! :slight_smile:

Therefore I need a robust way to explore all my pages, extract the internal links they mention (as relref, ref or `[]()`), and generate the JSON. Maybe through a custom section output format? Ideally, I would need the permalink as well as the title.

Do you have any idea how to implement that?

Thanks!

You can create a custom JSON feed or a custom sitemap.

You can use my sample as a starting point.

So you get a two-step implementation, feeding the results back into the content:

1. Create a new config dir that generates only the sitemaps, and feed the result back into the content. Specify it on the command line: `hugo --config createmap.toml` (see the config sketch below).
2. Build your site.
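
For reference, here is a minimal sketch of what such a separate config for the first pass could look like. This is an assumption on my part, not the actual sample: the file name `createmap.toml` comes from the command above, while the `linkmap` output format name and its settings are made up for illustration.

```toml
# createmap.toml -- hypothetical extra config used only for the link-map pass
baseURL = "https://example.org/"

[outputs]
  # On this pass, render only the custom link-map format on the home page.
  home = ["linkmap"]

[outputFormats.linkmap]
  mediaType   = "application/json"
  baseName    = "linkmap"
  isPlainText = true
```

Run it with `hugo --config createmap.toml`, then feed the generated JSON back in as a data file for the normal build.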

Hi @ju52,

thanks for the input.
I see how I can create a custom JSON feed or sitemap (I have done a couple for my website already), but I fail to see how your example helps me extract internal links from each page: i.e. for page1, if it links to page2.md and page3.md (either via relref or some other means), I want this information in my output so I can generate a graph of all the internal navigation provided by the content. This probably implies parsing each Markdown file, unless there is a field on .Page that provides this information.
Can you point me in the right direction, please?

If you limit the scope to Markdown links, you may have success with collecting them in a render hook and storing them in site.Home.Scratch or something.
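
In case it helps, here is a minimal sketch of such a render hook, assuming you only care about plain Markdown `[]()` links (ref/relref shortcodes do not go through this hook). The file path is the standard hook location; the `backlinks` Scratch key is an arbitrary name made up for this sketch:

```go-html-template
{{/* layouts/_default/_markup/render-link.html */}}
{{/* Record the edge (source page -> link destination) on the home page's Scratch. */}}
{{ site.Home.Scratch.Add "backlinks" (slice (dict "from" .Page.RelPermalink "to" .Destination)) }}
{{/* Render the link itself as usual. */}}
<a href="{{ .Destination | safeURL }}">{{ .Text | safeHTML }}</a>
```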

OK @bep, I’ll try something along this path. Thank you.

I would need my custom page (the one that renders this Scratch) to be built last in order to be complete, though. Is there any way to make that happen (being built last)?

Output formats get rendered in the order they are defined, so if you, say, define a JSON output format and define it to run last, that should (in theory) work.
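
If I read that correctly, that would be something along these lines in the site config; this is only a sketch, assuming the order of the `outputs` list is what drives the render order here:

```toml
[outputs]
  # JSON listed last, so it should (in theory) be rendered last.
  home = ["HTML", "RSS", "JSON"]
```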

One way to try:

Use the render-link hook to collect the links with the help of .Scratch.

@bep can we get a list of links under the .Page object?

In isolation, I would say no. But I have discussed a “page map” feature where you could navigate down the tree and render what you want, but that is a bigger feature … and I’m short on time as it is.


I managed to make something work with your idea, @bep: I now have a generated file of internal links (i.e. the graph edges and nodes).

Is there a way to exploit it again in the same pages, i.e. by re-rendering the pages? Or do I really need to do a build + copy of the generated file to /data/ + rebuild sourcing the data file?

I would prefer to be able to keep the live reload…

So, in my last piece of advice I told you to put the JSON output last. If you put it first and do:

  1. Call .Content on all pages in the JSON template to make sure the render hooks get triggered and the links collected (see the sketch below).
  2. Use the result of step 1 in your HTML templates.
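
A rough sketch of what the JSON template for step 1 could look like, assuming the JSON output format is defined on the home page and reusing the hypothetical `backlinks` Scratch key from the render hook above:

```go-html-template
{{/* layouts/index.json -- rendered first, so the render hooks run before the HTML pass */}}
{{/* Touching .Content forces every regular page through the Markdown render hooks. */}}
{{ range site.RegularPages }}
  {{ $noop := .Content }}
{{ end }}
{{/* Dump whatever the render-link hook collected. */}}
{{ site.Home.Scratch.Get "backlinks" | jsonify }}
```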

I could not make it work that way (probably because the backlinks export is in the section _index whereas the backlinks crawling happens in the individual posts, and I could not manage to get those rendered first)…

In the end I did it in two passes: build + copy to data/ + build.
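
For anyone finding this later, that boils down to something like the following; the output path of the generated link map is an assumption on my part, adjust it to wherever your template actually writes it:

```sh
# Pass 1: build the site so the link-map template runs and the graph gets generated.
hugo
# Copy the generated graph into data/ so templates can read it via site.Data.
cp public/linkmap.json data/linkmap.json
# Pass 2: rebuild, this time with the backlinks available from the data file.
hugo
```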

Thanks for all your help!
