Handling of unknown file extensions?

pgr · September 9, 2018, 4:14pm

Hi.

I noticed my site is generating a bunch of amp directories, which aren’t intended.

I went to check things and I found out these are the culprits:

Opportunities.es.adoc.save
Metadata.es.adoc---
Table of Contents.adoc.NOT

So these are basically leftover files that should have been deleted but, for some reason (intentional or not), were left behind.

These get rendered in

./public/amp/user/core-modules/Opportunities.es.adoc.save
./public/amp/developer/Metadata.es.adoc---
./public/amp/developer/Table of Contents.adoc.NOT

Two questions:

What’s the logic behind assuming these are amp format files? They aren’t.
Apart from the ignoreFiles entry in config.toml, which lets us disable a specific list of file extensions, is there a way of asking Hugo to ignore all unknown extensions? That would be my preferred handling of these cases.
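
To make the ignoreFiles workaround concrete, here is a minimal sketch in Go. It assumes, as the Hugo configuration docs describe, that ignoreFiles entries are regular expressions matched against the file path. The three patterns and the content/ paths are hypothetical, chosen only to cover the leftover files listed above; this is not Hugo's own matching code.

```go
package main

import (
	"fmt"
	"regexp"
)

// ignorePatterns are hypothetical ignoreFiles entries, written as Go regular
// expressions. In config.toml each backslash would be doubled, e.g. "\\.save$".
var ignorePatterns = []*regexp.Regexp{
	regexp.MustCompile(`\.save$`),
	regexp.MustCompile(`\.NOT$`),
	regexp.MustCompile(`---$`),
}

// ignored reports whether any of the patterns matches the given path.
func ignored(path string) bool {
	for _, re := range ignorePatterns {
		if re.MatchString(path) {
			return true
		}
	}
	return false
}

func main() {
	// The content/ prefixes are assumptions made for illustration.
	paths := []string{
		"content/user/core-modules/Opportunities.es.adoc.save",
		"content/developer/Metadata.es.adoc---",
		"content/developer/Table of Contents.adoc.NOT",
		"content/developer/Metadata.es.adoc",
	}
	for _, p := range paths {
		fmt.Printf("%-55s ignored=%v\n", p, ignored(p))
	}
}
```

This only enumerates known bad endings; it does not answer the second question, which is whether Hugo can be told to skip every extension it does not recognize.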

Link to my source:

Thanks!

alexandros · September 9, 2018, 4:23pm

Please do read the Requesting Help Topic.

It’s the big pink banner at the top of the Forum.

After you read that topic and the Hugo Docs (you may want to look into Output Formats to disable AMP), feel free to open another topic, but this time with a more concrete question and a link to the source code of your Hugo project.

I’m closing this thread because the questions posted are too broad.

Thank you.

alexandros · September 9, 2018, 4:37pm

Ok. Since you edited your original post to include a link to your repo I re-opened this topic.

However I still think that your questions are too broad.

In any case, maybe someone will look into your repo. As I said above, make sure to read the documentation about Output Formats; it will show you how to disable AMP (since you don’t want it).

pgr · September 9, 2018, 4:37pm

I just did a second edit to improve the questions. Thanks!

pgr · September 9, 2018, 4:38pm

Note that I am not asking about disabling the amp format, I am asking why these files are getting recognized as amp format when they aren’t. I don’t think I will find an answer to that in the Documentation. I also read previous posts here about this (there isn’t much).

alexandros · September 9, 2018, 4:40pm

An ADOC file is an AsciiDoc file.

It is not an unknown file format. It seems that the theme you use is configured to output these files.

However I need to go now and can’t look into this further.

Thanks.

pgr · September 9, 2018, 4:51pm

I don’t mention any .adoc file in my post. I mention files ending in .NOT, .save, ---.

My config.toml has this (not done intentionally by me, I got it from the theme):

[outputs]
home = [ "HTML", "AMP", "RSS", "JSON"]
section = [ "HTML", "AMP", "RSS", "JSON"]

So I’m guessing the logic is something like this: “can’t render this file as HTML, let’s fall back to AMP”.

For the normal .adoc files that do render properly, no amp output is generated.

But does this logic make any sense? Maybe it does and it’s just me, I am not seeing it.

alexandros · September 9, 2018, 6:44pm

Well the following do seem like .adoc temp files to me.

Also all content files in the Repo you linked to are AsciiDoc files.

No that is not how Output Formats work. They exist side by side.

Anyway I am afraid that I cannot be of any further assistance.

Hugo does support Additional Formats Through External Helpers like Asciidoctor, but providing support for these external helpers is outside the scope of this Forum.

I recommend that you raise your questions directly to the Hugo Learn Theme repository.

pgr · September 9, 2018, 7:14pm

I must respectfully disagree with you here.

Where the front matter omits a markup definition, Hugo guesses the file type from the extension:

github.com

gohugoio/hugo/blob/master/hugolib/page.go#L1509-L1514


	// Try markup explicitly set in the frontmatter
	p.Markup = helpers.GuessType(p.Markup)
	if p.Markup == "unknown" {
		// Fall back to file extension (might also return "unknown")
		p.Markup = helpers.GuessType(p.Source.Ext())
	}

Now, the extension is obtained with the Ext function which, as you can confirm here, returns only the last extension when a name contains more than one dot (press “Run” to see the results):

https://golang.org/pkg/path/filepath/#example_Ext

So for the three files I mention, adoc is not what Ext returns, and GuessType will return "unknown" as the markup type:

github.com

gohugoio/hugo/blob/master/helpers/general.go#L75-L94


func GuessType(in string) string {
	switch strings.ToLower(in) {
	case "md", "markdown", "mdown":
		return "markdown"
	case "asciidoc", "adoc", "ad":
		return "asciidoc"
	case "mmark":
		return "mmark"
	case "rst":
		return "rst"
	case "pandoc", "pdc":
		return "pandoc"
	case "html", "htm":
		return "html"
	case "org":
		return "org"
	}

	return "unknown"
}

(It doesn’t matter that I have tons of valid .adoc files elsewhere in my site; this post is not about those, which are handled just fine.)
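
The reasoning above can be checked with a small standalone program. This is only a sketch: guessType is a trimmed copy of the GuessType switch quoted above, and lastExt is a hypothetical helper that assumes Hugo's p.Source.Ext() yields the final extension without the leading dot, which is what filepath.Ext would give after trimming.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// guessType is a reduced copy of the GuessType switch quoted above,
// keeping only the cases relevant to this thread.
func guessType(in string) string {
	switch strings.ToLower(in) {
	case "md", "markdown", "mdown":
		return "markdown"
	case "asciidoc", "adoc", "ad":
		return "asciidoc"
	}
	return "unknown"
}

// lastExt returns everything after the final dot of name.
// Assumption: this approximates what p.Source.Ext() hands to GuessType.
func lastExt(name string) string {
	return strings.TrimPrefix(filepath.Ext(name), ".")
}

func main() {
	names := []string{
		"Opportunities.es.adoc.save",
		"Metadata.es.adoc---",
		"Table of Contents.adoc.NOT",
		"Metadata.es.adoc", // a regular file, for comparison
	}
	for _, name := range names {
		ext := lastExt(name)
		fmt.Printf("%q -> ext %q -> %s\n", name, ext, guessType(ext))
	}
}
```

The three leftover files yield the extensions save, adoc--- and NOT, all of which fall through to "unknown", while the regular Metadata.es.adoc is guessed as asciidoc.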

So, we’re back to my Topic title which is asking about “Handling of unknown file extensions”.

I would like to understand better how this connects to Output formats, since the behavior I’m seeing now doesn’t make sense to me. Note that my main concern here is for Hugo’s consistency; I can just delete these old files and the problem goes away for me. However, if I’m uncovering something fishy here then it’s a good idea to improve Hugo in this regard. Or we could improve the Documentation.

Thanks.

alexandros · September 9, 2018, 7:43pm

First of all I want to point out that I am not a Go Developer.

Second, it seems that an Unknown Output Format can be published by design. For example, in lines 181-191 of outputFormat_test.go there is a test for an Unknown Output Format:

github.com

gohugoio/hugo/blob/ebb56e8bdbfaf4f955326017e40b2805850871e9/output/outputFormat_test.go#L181-L191


			"Add format unknown mediatype",
			[]map[string]interface{}{
				{
					"MYINVALID": map[string]interface{}{
						"baseName":  "mymy",
						"mediaType": "application/hugo",
					}}},
			true,
			func(t *testing.T, name string, f Formats) {


			}},

In the same file there are other tests for HTML, XML, JSON and AMP.

Why is the Unknown Output Format assumed to be AMP in your case?
I am afraid that I cannot answer your question.

Maybe you should open a bug report on GitHub.

But to be honest this is the first time I’m seeing this type of report in the Forum and I’ve been quite involved here for a while.

Why don’t you simply remove those temp files? You can also use .gitignore to exclude those files.

We all like to optimize our sites, why should the content folder be an exception to this rule?

Also, on a friendly note, if you do open the GitHub issue please tone down the language a bit. Phrases like “does this logic make any sense?” and “uncovering something fishy” are a bit counterproductive.

Thanks!

bep · September 9, 2018, 7:53pm

No…

pgr · September 10, 2018, 9:21am

@alexandros thanks for your good advice, always welcome.

I do moderation myself for a big Community and I recognize you’re doing a good job here, you are always polite even when in disagreement. Keep up the good work (and your excellent drawings!).

I also do a lot of Support and bug reporting work for a very large open-source project, so let me explain my hesitation in taking this to GitHub: I feel like I need some Forum discussion before opening a bug, because I don’t understand (yet) what the normal behavior is supposed to be.

I need to understand what I’m telling Hugo to do when I put this in my config.toml:

[outputs]
home = [ "HTML", "AMP", "RSS", "JSON"]
section = [ "HTML", "AMP", "RSS", "JSON"]

Am I telling Hugo to generate all these outputs for every file (unless I override in the front-matter)? Then why am I not getting any AMP generated for my normal .adoc files?

But if Hugo is doing things right, and AMP is not generated because I would need to add some extra stuff to get that, then why is it being generated for unknown file extensions?

Once I have this answer I feel I can be much more specific with my (eventual) bug report.

Note that some of my bogus files are generated “by accident”, without users being aware of it, like the .save file which probably comes from a text-editor’s auto-save. So people might be thinking they have removed some content from the site, without realizing the .save file is there. They won’t find it as a normal file when they browse the site, but it will be there under some amp directory… this sounds like the sort of behavior that could be improved.

alexandros · September 10, 2018, 9:38am

Thanks for the comment. I do try.

BTW I also noticed that you are the author of the theme you use.

Well @bep has already replied in this topic. He is probably the only one who can answer your questions provided he has the time.

pgr · September 19, 2018, 8:59am

I’m going to drop this, since there seems to be no interest or time to pursue it. I will just delete my leftover files.

If one day someone thinks this is worth creating a GitHub issue, let me know, I don’t mind taking care of it.
