getRemote: Handle cases when media type can't be determined from content

Currently, Hugo uses Go's native http.DetectContentType function to determine the media type of a remote resource fetched with resources.GetRemote. However, the algorithm sometimes cannot determine the exact content type and returns application/octet-stream instead. This causes getRemote to fail, because media.FromContent returns a zero value in this case:

// from resource_factories/remote.go
...
		// Now resolve the media type primarily using the content.
		mediaType = media.FromContent(c.rs.MediaTypes, extensionHints, body)

	}

	if mediaType.IsZero() {
		return nil, fmt.Errorf("failed to resolve media type for remote resource %q", uri)
	}
...

This is problematic because it limits the usefulness of getRemote. Could we fall back to the content type from the response header when the media type can’t be determined from the content? E.g., doing this:

	if mediaType.IsZero() {
		mediaType, _ = media.FromString(contentType)
	}
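
To make the proposed fallback concrete outside of Hugo's code base, here is a minimal standalone sketch using only the Go standard library. It is not Hugo's implementation: resolveMediaType and the sample bytes are hypothetical names and data chosen for illustration. It sniffs the body first and only consults the Content-Type header when sniffing yields the generic application/octet-stream.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
)

// octetStream is what http.DetectContentType returns when it cannot
// identify the content.
const octetStream = "application/octet-stream"

// resolveMediaType is a hypothetical helper (not part of Hugo). It sniffs
// the body first and falls back to the Content-Type response header only
// when sniffing is inconclusive.
func resolveMediaType(body []byte, headerContentType string) (string, error) {
	sniffed, _, err := mime.ParseMediaType(http.DetectContentType(body))
	if err == nil && sniffed != octetStream {
		return sniffed, nil
	}

	// Sniffing was inconclusive: trust the header, if it parses.
	fromHeader, _, err := mime.ParseMediaType(headerContentType)
	if err != nil {
		return "", fmt.Errorf("failed to resolve media type: %w", err)
	}
	return fromHeader, nil
}

func main() {
	// Arbitrary binary bytes standing in for a protobuf payload.
	tile := []byte{0x1a, 0x02, 0x08, 0x01}
	mt, err := resolveMediaType(tile, "application/x-protobuf")
	fmt.Println(mt, err)
}
```

Note that this sketch trusts whatever the header says once sniffing fails, which is exactly the trade-off under discussion in this thread.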

I am aware of this, but would appreciate a specific example.

Google Protobuf should be “application/x-protobuf”. The format is (among other things) used for vector map tiles. Admittedly, it’s a bit obscure but shouldn’t getRemote be format agnostic?

Example URL?

Here you go: https://api.maptiler.com/tiles/v3-openmaptiles/0/0/0.pbf?key=143k20pEdcyOeBepEEnR

There are three approaches, either separate or in combination, to determine content type:

  • Sniff the content
  • Use the response header
  • Use the file extension (if present in the URL)

Which of these can you trust?

  • The payload can be altered to pass a sniff test
  • The response header can be wrong, either intentionally or unintentionally
  • The file extension can be wrong, either intentionally or unintentionally

The current implementation is, for a Very Good Reason, cautiously restrictive. And there have not been many (any?) reported cases where this behavior has been a show-stopper.

Perhaps we could add a content-type key to the security policy in site configuration. Something like:

[security.http]
methods = ['(?i)GET|POST']
urls = ['.*']
contentTypes = ['^application/x-protobuf$','^image/jpeg$']

With this configuration, if the content type in the response header is in the array of contentTypes, get the file instead of throwing an error.
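
A rough sketch of how such an allowlist check could work, assuming the patterns are regular expressions as in the example above. The function name allowedByHeader is hypothetical and this is not how Hugo implements its security policy; it only illustrates matching the header's media type against the configured patterns.

```go
package main

import (
	"fmt"
	"mime"
	"regexp"
)

// allowedByHeader is a hypothetical helper. It reports whether the media
// type in a Content-Type header matches any of the allowlist patterns.
func allowedByHeader(patterns []string, headerContentType string) (bool, error) {
	// Drop parameters such as "; charset=utf-8" before matching.
	mediaType, _, err := mime.ParseMediaType(headerContentType)
	if err != nil {
		return false, err
	}
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return false, fmt.Errorf("invalid pattern %q: %w", p, err)
		}
		if re.MatchString(mediaType) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	patterns := []string{`^application/x-protobuf$`, `^image/jpeg$`}
	ok, err := allowedByHeader(patterns, "application/x-protobuf")
	fmt.Println(ok, err)
}
```

Because the patterns are anchored with ^ and $, a header such as application/x-protobuf-foo would not match; unanchored patterns would be more permissive.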

I’m not criticising the current approach, and being cautious is definitely not a bad thing. However, I feel like there should be a way to override this behavior. Maybe a flag telling getRemote to skip the content check would be enough?

I think this has been discussed before …

I think the conclusion then, which I agree with, was that we could add a mediaType option to resources.GetRemote. Template authors are trusted to do good things, so this should be safe. If someone points me to a GitHub issue, I will get it into the next Hugo release (scheduled for the middle of next week).

This is resolved in the next release, presumably v0.112.0, with:
https://github.com/gohugoio/hugo/pull/10973

Site configuration:

[security.http]
mediaTypes = ['^application/x-protobuf$']

Thank you bep.
