Link Checker for Go/Hugo?


#1

This isn’t so much a feature request (it seems a little out of place for something like this to be built into Hugo), but does anyone have suggestions for link checking when creating large static sites with Hugo? I have Chrome extensions for individual pages, and there are obviously services that can take care of this, but paying for anything seems like a waste of $$. Thanks!


#2

Wget can act as a link checker. See here.


#3

Golang does have this:

https://golang.org/misc/linkcheck/linkcheck.go

It’s used on the docs, apparently. I guess something like this could be incorporated by someone who can code in golang and do a PR for adding it.

But I think @moorereason’s suggestion is easy to incorporate into a deploy or pre-deploy script.


#4

There was an issue about that on GitHub:

But I agree that checking every link during generation would slow Hugo down a lot.


#5

@RickCogley This is great. Unfortunately, I haven’t gotten it to work yet, haha. Awesome stuff.

I agree with everyone that it wouldn’t be part of every build, but maybe something to the effect of running hugo server --linkcheck?

The script that @RickCogley provided here takes http://localhost:6060 as an example argument, so it would be great to have it default to whichever localhost port the developer is currently using. I will continue to work on this, but @spf13 had mentioned compiling a list of features that the community would want before a 1.0 release. IMHO, this would be a worthwhile feature for any site with more than a couple hundred pages. Dead links can be a serious PITA for content owners and editors.


#6

It would make sense that external links are kept out of scope simply because (a) external links can respond very slowly and (b) there are numerous reasons why an external link can be (or seem to be) broken.

But I think it would be a good idea if Hugo checked for broken internal links; that would make it easier to catch errors in the content and themes.


#7

@digitalcraftsman @Jura I took @RickCogley’s advice and checked out linkcheck.go. I would love to learn what I’m doing wrong (@bep @moorereason @spf13 @DerekPerkins, etc.) and will also post this to the Gitter channel momentarily.

My linkcheck.go runs, but it doesn’t return any errors after I’ve intentionally created bad links (e.g., [my bad link](/lkijdlkjsfd/)) in a content file. I run hugo server in one terminal tab and then go run linkcheck.go in another. As I’m not a Go developer, I’m missing something w/r/t the tweaks I’ve made on line 24 of the original script:

// original linkcheck.go
var (
    root    = flag.String("root", "http://localhost:6060", "Root to crawl")
    verbose = flag.Bool("verbose", false, "verbose")
)

// my attempt: https://github.com/rdwatters/hugo-starter/blob/master/linkcheck.go
var (
    root    = flag.String("root", "http://localhost:1313", "")
    verbose = flag.Bool("verbose", false, "verbose")
)

Any tips out there? If it’s not something worth incorporating into Hugo itself and instead can be run easily locally as a utility (makes sense, since I don’t think you need to check for broken links with every content update), I think the real power in a script like this would be the ability to check for both internal links and external links locally whenever needed/scheduled.


#8

In case people find this in the future: I’ve released a tool, written in Go, for testing generated HTML (for broken links, among other things). See https://github.com/wjdp/htmltest.


#9

Very cool @wjdp. Link checking can become such a hassle after you hit even a few dozen pages, and a lot of Hugo sites are hundreds and thousands of pages in size. I would love a link checker to be part of Hugo core; not necessarily something that would be run with every build—this would really slow down local development—but definitely something that could be run with a flag like --ignoreCache. …


#10

Cool tool for Windows http://home.snafu.de/tilman/xenulink.html


#11

I’ve run that tool in Linux using Wine before. Worked well as far as I can remember.


#12

Despite its age and not-so-modern UI, Xenu is still IMHO the best link checker out there. +1


#13

Just a note: Xenu’s tool looks good for manual testing. htmltest is designed to be put into a CI/CD pipeline, for example a site being built in Travis or similar. I built it because the current standard (html-proofer, in Ruby) has long run times on large sites; Go helps a lot with this. I now use it in the pipeline of a site with 2000+ pages. In ~10 seconds we check that all internal links are valid, scripts are loading, &c. External links are checked periodically and the results are stored in a cache shared between builds.

Regarding Windows, I’ve currently got OSX and Linux builds but nothing for Windows. Anyone who fancies giving this a go, drop me a line here: https://github.com/wjdp/htmltest/issues/18


#14

Sounds pretty awesome, will check it out!


#15

Sounds great, do you have a guide on how to set it up for a Hugo site?

Xenu fans who now prefer Macs may want to check out Screaming Frog, free up to 500 links. No affiliation.


#16

Yeah, so for a Hugo site you should be able to use it with minimal configuration.

You can just run htmltest public to run over your built source, but if you want to configure the app further you’ll need a config file.

A .htmltest.yml in the root of your project with the following will get you going.

DirectoryPath: public

I’ve started a wiki page on the GH repo here: https://github.com/wjdp/htmltest/wiki/Using-With-Hugo. If you find anything that would be useful for others to know please either comment here or add on there.


#17

Thanks mate, will have a look!


#18

Why not use the Chrome extension?


#19

Do you know of any extensions that check links on pages other than the current one?

Went all in with Screaming Frog. It also shows missing meta descriptions etc.


#20

No, I got you wrong. I thought you wanted to check only the visible page.