What exactly is SafeHTML doing?

I’m confused by the doc on SafeHTML. It seems to be conflating what Hugo does with how a browser renders the result.

It says that applying SafeHTML to this example:

copyright = "© 2015 Jane Doe. <a href=\"http://creativecommons.org/licenses/by/4.0/\">Some rights reserved</a>."

Hugo will output:

© 2015 Jane Doe. Some rights reserved.

But that can’t be what Hugo will output – there’s no HTML there. That’s how the output might be rendered by a browser, but the doc should tell us exactly what Hugo will output.

Relatedly, why are the double-quotes in the href URL backslash-escaped? As far as I know, Markdown does not require or support backslash-escaping double-quotes, and I don’t see anything about it in the golang html/template docs, or in the Hugo docs. Is this a TOML thing? If so, that’s something else that should be clarified in the doc. Would we need them for a YAML config?

The way the SafeHTML string is assumed to be rendered by browsers implies that SafeHTML actually does do some filtering – it would have to remove the backslashes from the href. Is this correct?

(P.S. There’s a small pattern of the Hugo docs conflating what Hugo does with what a browser, web server, or user does. For example, the docs on URLs, pretty/ugly URLs, etc. imply that Hugo creates URLs, but Hugo doesn’t and can’t create URLs (except in the sitemap). It creates folders and files in those folders. URLs don’t exist in the sense the Hugo docs imply – the only thing that exists is the server’s behavior in response to a requested URL. I’ll write that up separately, with suggestions.)

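// safeHTML converts its argument to a string and wraps it in template.HTML,
// which tells html/template to emit the value without escaping it.
func safeHTML(a interface{}) (template.HTML, error) {
    s, err := cast.ToStringE(a)
    return template.HTML(s), err
}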

That is exactly what it does. What template.HTML does should be covered in Go’s docs here: https://golang.org/pkg/html/template/ - if it isn’t, create an issue here: https://github.com/golang/go/issues/

I don’t follow, so I guess I’ll have to wait for your write-up. Yes, your server responds to requests, but Hugo is a static site generator, meaning that the directory structure is the URL structure. The “pretty” (aka “clean” in Pelican) vs “ugly” nomenclature is pretty standard among SSGs at this point:

http://jekyllrb.com/docs/permalinks/
https://middlemanapp.com/advanced/pretty_urls
https://github.com/getpelican/pelican/blob/master/docs/settings.rst#url-settings
https://hexo.io/docs/configuration.html#URL


Bjørn, what’s the intended behavior with the backslashes? I don’t think html/template does anything with them. The only mention I see is that “Package html/template does not support actions following a backslash.” I guess it must be in that cast.ToStringE method. I found it in caste.go but I don’t see anything in the code that would handle backslashes, unless that’s an implicit golang thing.
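On the backslash question, one plausible reading is that \" is simply the escape for a double quote inside a TOML basic (double-quoted) string, so the parser drops the backslash while reading the config, before cast.ToStringE or html/template ever see the value. The sketch below only illustrates that idea: Go’s standard library has no TOML parser, so strconv.Unquote, which applies Go’s own string-literal rules, is used as a stand-in, and the unescape helper is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// unescape resolves backslash escapes in a double-quoted literal.
// Stand-in only: this follows Go's string-literal rules, not TOML's,
// although both treat \" as an escaped double quote.
func unescape(quoted string) string {
	s, err := strconv.Unquote(quoted)
	if err != nil {
		panic(err)
	}
	return s
}

func main() {
	// The right-hand side of the config line, including its outer quotes.
	raw := `"<a href=\"http://creativecommons.org/licenses/by/4.0/\">Some rights reserved</a>"`

	// After parsing, the value contains ordinary quotes and no backslashes.
	fmt.Println(unescape(raw))
}
```

If that reading is right, SafeHTML does no filtering at all. The same would apply to a double-quoted YAML string, while a single-quoted YAML string or a TOML literal (single-quoted) string would not need the backslashes in the first place.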

RD, I’m fine with the prevailing usage of pretty and ugly URLs.

“The directory structure is the URL structure.”

I agree that this is mostly true with the default settings of popular web servers on Linux/BSD. But it seems to be less true by the day, and I’m not sure if it’s quite as true with IIS and other platforms. Prescriptively, I think it’s an incredibly bad idea to expose one’s directory structure to the web. A URL is, or should be, just a string – a hopeful string – with no real connection to one’s file system or directory structure.

I think this becomes easier to achieve with in-memory key/value stores like Redis and memcached. I like Julien Schmidt’s golang httprouter, and I wonder about implementing it in the reverse direction, starting with the file extension like png or jpg. I think it might be performant to have all pngs in one folder, jpgs in another, etc. instead of images or something. And I’m guessing that httprouter assumes that URLs map to on-disk folder and file paths, but I wonder if it would be possible to adapt it to some kind of synthetic repository.

I think the Hugo docs would benefit from explaining that URLs mostly exist in the context of a request. People are coming to your site based on a URL in a link or by typing a URL in their browsers. Hugo isn’t going to do anything to circulate correct URLs to the world, except via the sitemap. That’s mostly web server settings, 301s, canonicals, etc. Hugo creates pretty URLs by creating a directory for every post and throwing an index.html in them, which web servers handle predictably, but what really makes those pretty URLs authoritative will be how missing trailing slashes are handled, permanent vs. non-permanent redirects, etc. – I’m not sure if those details are handled the same way by default in all popular web servers.


Sometimes I have a feeling that what Hugo does the most is “circulating URLs to the world” … Getting the URL right in the different situations has proven to be extremely hard, so any mumbo jumbo about the URLs only existing in the “context of a request” is bullshit.