Best way to extract link graph

Hi there,

I am trying to implement internal back-linking, as I have seen it is not supported. To do so, I plan on building a JSON file in the data/ folder containing the graph of all internal links, and using it to extract reverse links (i.e. for page1, listing all the pages that mention it). If you have a better idea, please share! :slight_smile:

Therefore I need a robust way to explore all my pages, extract the internal links they mention (as relref, ref or `[]()`), and generate the JSON. Maybe through a custom section output format? Ideally, I would need the permalink as well as the title.

Do you have any idea how to implement that?

Thanks!

You can create a custom JSON feed or a custom sitemap.

You can use my sample as a starting point.

So you get a two-step implementation, feeding the results back into the content:

1. Create a new config dir that generates only the sitemaps, and feed the result back into the content. Specify it on the command line: `hugo --config createmap.toml` (see the config sketch below).
2. Build your site.
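
For reference, here is a minimal sketch of what such a separate config for the first pass could look like. This is an assumption on my part, not the actual sample: the file name `createmap.toml` comes from the command above, while the `linkmap` output format name and its settings are made up for illustration.

```toml
# createmap.toml -- hypothetical extra config used only for the link-map pass
baseURL = "https://example.org/"

[outputs]
  # On this pass, render only the custom link-map format on the home page.
  home = ["linkmap"]

[outputFormats.linkmap]
  mediaType   = "application/json"
  baseName    = "linkmap"
  isPlainText = true
```

Run it with `hugo --config createmap.toml`, then feed the generated JSON back in as a data file for the normal build.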

Hi @ju52,

thanks for the input.
I see how I can create a custom JSON feed or sitemap (I have done a couple for my website already), but I fail to see how your example helps me extract internal links from each page: i.e. for page1, if it links to page2.md and page3.md (either via relref or some other means), I want this information in my output so I can generate a graph of all the internal navigation provided by the content. This probably implies parsing each Markdown file, unless there is a field on .Page that provides this information.
Can you point me in the right direction, please?

If you limit the scope to Markdown links, you may have success with collecting them in a render hook and storing them in site.Home.Scratch or something.
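
In case it helps, here is a minimal sketch of such a render hook, assuming you only care about plain Markdown `[]()` links (ref/relref shortcodes do not go through this hook). The file path is the standard hook location; the `backlinks` Scratch key is an arbitrary name made up for this sketch:

```go-html-template
{{/* layouts/_default/_markup/render-link.html */}}
{{/* Record the edge (source page -> link destination) on the home page's Scratch. */}}
{{ site.Home.Scratch.Add "backlinks" (slice (dict "from" .Page.RelPermalink "to" .Destination)) }}
{{/* Render the link itself as usual. */}}
<a href="{{ .Destination | safeURL }}">{{ .Text | safeHTML }}</a>
```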

OK @bep, I’ll try something along this path. Thank you.

I would need my custom page (the one that renders this Scratch) to be built last in order to be complete, though. Is there any way to make that happen (being built last)?

Output formats get rendered in the order they are defined, so if you, say, define a JSON output format and define it to run last, that should (in theory) work.
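
If I read that correctly, that would be something along these lines in the site config; this is only a sketch, assuming the order of the `outputs` list is what drives the render order here:

```toml
[outputs]
  # JSON listed last, so it should (in theory) be rendered last.
  home = ["HTML", "RSS", "JSON"]
```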

One way to try:

Use the render-link hook to collect the links with the help of .Scratch.

@bep can we get a list of links under the .Page object?

In isolation, I would say no. But I have discussed a “page map” feature where you could navigate down the tree and render what you want, but that is a bigger feature … and I’m short on time as it is.


I managed to make something work with your idea, @bep: I now have a generated file of internal links (i.e. the graph edges and nodes).

Is there a way to exploit it again in the same pages, i.e. by re-rendering the pages? Or do I really need to do a build + copy of the generated file to /data/ + rebuild sourcing the data file?

I would prefer to be able to keep the live reload…

So, in my last piece of advice I told you to put the JSON output last. If you put it first and do:

  1. Call .Content on all pages in the JSON template to make sure the render hooks get triggered and the links collected (see the sketch below).
  2. Use the result of step 1 in your HTML templates.
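
A rough sketch of what the JSON template for step 1 could look like, assuming the JSON output format is defined on the home page and reusing the hypothetical `backlinks` Scratch key from the render hook above:

```go-html-template
{{/* layouts/index.json -- rendered first, so the render hooks run before the HTML pass */}}
{{/* Touching .Content forces every regular page through the Markdown render hooks. */}}
{{ range site.RegularPages }}
  {{ $noop := .Content }}
{{ end }}
{{/* Dump whatever the render-link hook collected. */}}
{{ site.Home.Scratch.Get "backlinks" | jsonify }}
```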

I could not make it work that way (probably because the backlinks export is in the section _index whereas the backlinks crawling happens in the individual posts, and I could not manage to get those rendered first)…

In the end I did it in two passes: build + copy to data/ + build.
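
For anyone finding this later, that boils down to something like the following; the output path of the generated link map is an assumption on my part, adjust it to wherever your template actually writes it:

```sh
# Pass 1: build the site so the link-map template runs and the graph gets generated.
hugo
# Copy the generated graph into data/ so templates can read it via site.Data.
cp public/linkmap.json data/linkmap.json
# Pass 2: rebuild, this time with the backlinks available from the data file.
hugo
```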

Thanks for all your help!
