Improved org-mode support

I wrote a little org-mode parser for my blog [1] and am wondering whether it could potentially be added to hugo proper. I guess replacing goorgeous would be a breaking change for existing users and that sounds like a big problem already…

Anyways, you can find it here: go-org.
There are rendering examples & an online demo (thx to Go 1.11 wasm support :heart:) on GitHub Pages.

Various open issues from goorgeous are fixed (afaict), as can be seen under misc on GitHub Pages [2].

To test Hugo with the new renderer, just replace orgRender with the following:


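// Assumes the file imports "bytes", "html", "github.com/niklasfasching/go-org/org"
// and jww "github.com/spf13/jwalterweatherman".
func orgRender(ctx *RenderingContext, c ContentSpec) []byte {
	writer := org.NewHTMLWriter()
	// Use Hugo's highlighter for source blocks.
	writer.HighlightCodeBlock = func(source, lang string) string {
		out, err := c.Highlight(source, lang, "")
		if err != nil {
			jww.ERROR.Printf("Could not highlight source as lang %s: %s. Using raw source.", lang, err)
			// Fall back to the escaped raw source instead of returning an empty string.
			return "<pre>" + html.EscapeString(source) + "</pre>"
		}
		return out
	}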
	in := bytes.NewReader(ctx.Content)
	out, err := org.NewDocument().SetPath(ctx.DocumentName).Parse(in).Write(writer)
	if err != nil {
		jww.ERROR.Printf("Could not render org: %s. Using unrendered content.", err)
		out = string(ctx.Content)
	}
	return []byte(out)
}

Happy to hear your thoughts :).

[1] I wanted to use hugo, but wasn’t happy with the existing org-mode support via goorgeous (which seems to have been abandoned).
[2] Cannot link it as I’m limited to 2 links per post as a new user

How “breaking” would you say this would be if we replaced goorgeous with your implementation?

Sorry, I can’t tell at all - I just assume that there are some. One of the reasons I started a new parser rather than forking it is that I find goorgeous to be hard to understand. Comparing the html output also sounds infeasible as I’m not replicating blackfriday output byte for byte and visual comparison is something I’ve shied away from until now…

I’ll look into setting up some tests for this - apart from that I’m hoping a few other org-mode users see this thread and can give me feedback on how close to goorgeous / valid the rendering is for them.

How does hugo normally go about replacing features and ensuring compatibility? I’d greatly appreciate any input :).

The thing is, if Hugo depends on a library that for some reason becomes unmaintained and “problematic” in some way, we have some options:

  1. We could take over the maintenance of that project
  2. Remove it from Hugo

Since “we” in the above would probably mean “me”, that is not an option in this case. And if we remove it, we could replace it with something similar. Or just leave it as is.

Also note that I have had on my task list for some time now doing something about the “Blackfriday v2” situation – and I have some ambitions to make the “content rendering” situation less fragmented than it currently is, so the timing isn’t perfect.

Thanks for the input :)!

In that case I’ll just keep using my local branch for now and wait for Blackfriday v2.

Nonetheless, a small update regarding the breaking changes thing:
I added the differences I found to the README and uploaded a basic rendering comparison to GitHub Pages.
Please note that the comparison is not a fair visual comparison (as I did not adapt the default stylesheet to fit goorgeous); only the raw HTML should be compared.

Going through the list I saw in the go-org README on GitHub (niklasfasching/go-org), I didn’t see anything seriously breaking:

no headline ids

It would be nice to support this at some point… At least convert a user-specified CUSTOM_ID drawer property to an id attribute. And if that is not specified, then auto-derive an id from the heading (@niklasfasching I do that in ox-hugo using this function).
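One way to implement that fallback, sketched as a hypothetical Go helper (`headlineID` is an illustrative name, not ox-hugo's function or go-org's API): prefer `CUSTOM_ID`, otherwise lowercase the title and collapse every run of non-letter, non-digit characters into a single dash. Handling of duplicate ids is left out.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// headlineID returns the id attribute for a headline: the CUSTOM_ID property
// if present, otherwise a slug derived from the title.
// Hypothetical helper for illustration only.
func headlineID(title string, properties map[string]string) string {
	if id, ok := properties["CUSTOM_ID"]; ok && id != "" {
		return id
	}
	var b strings.Builder
	lastDash := false
	for _, r := range strings.ToLower(title) {
		switch {
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			b.WriteRune(r)
			lastDash = false
		case !lastDash && b.Len() > 0:
			// Collapse runs of spaces and punctuation into a single dash.
			b.WriteRune('-')
			lastDash = true
		}
	}
	// At most one trailing dash can exist because of the lastDash check.
	return strings.TrimSuffix(b.String(), "-")
}

func main() {
	fmt.Println(headlineID("Hello, World!", nil))
	fmt.Println(headlineID("Intro", map[string]string{"CUSTOM_ID": "my-intro"}))
}
```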

not changing links to .org files into links to .html files - this has edge cases like mangling links to e.g. example.org. could be supported for file: links though I guess

This is a difficult territory. Even if you convert .org links to .html links, they cannot be guaranteed to work, because the user could have set the url, slug, etc front-matter differently.
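If the rewrite were limited to `file:` links as suggested, the transformation itself is small. A hypothetical Go sketch (`rewriteOrgLink` is an illustrative name); Org search options such as `file:foo.org::*Heading` are not handled, and the url/slug caveat still applies:

```go
package main

import (
	"fmt"
	"strings"
)

// rewriteOrgLink turns "file:" links to .org files into .html links and leaves
// other links (e.g. "https://example.org") untouched. The result can still be
// wrong if the target page overrides url or slug in its front matter.
// Hypothetical helper for illustration only.
func rewriteOrgLink(link string) string {
	if !strings.HasPrefix(link, "file:") {
		return link
	}
	path := strings.TrimPrefix(link, "file:")
	if strings.HasSuffix(path, ".org") {
		return strings.TrimSuffix(path, ".org") + ".html"
	}
	return path
}

func main() {
	for _, l := range []string{"file:posts/foo.org", "https://example.org", "file:image.png"} {
		fmt.Println(l, "=>", rewriteOrgLink(l))
	}
}
```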

org comments not rendered as html comments (same as ox-html.el)

This is nice! I doubt anyone used Org comments as HTML comments.

headline priority not exported to html (same as ox-html.el)

This should be easy to support though… just export the priority as span tags hidden by default.
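A minimal Go sketch of that idea, assuming a single-character priority cookie such as `[#A]` at the start of the headline title and a made-up `priority` class that a stylesheet hides with `.priority { display: none; }`. Neither the helper names nor the class come from go-org or ox-html.el:

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// splitPriority separates a leading "[#X]" priority cookie from a headline title.
// Hypothetical helper; multi-character priorities are not handled.
func splitPriority(title string) (priority, rest string) {
	if len(title) >= 4 && strings.HasPrefix(title, "[#") && title[3] == ']' {
		return title[2:3], strings.TrimSpace(title[4:])
	}
	return "", title
}

// priorityHTML renders the priority as a span that CSS can hide by default.
func priorityHTML(priority string) string {
	if priority == "" {
		return ""
	}
	return fmt.Sprintf(`<span class="priority">[#%s]</span>`, html.EscapeString(priority))
}

func main() {
	p, rest := splitPriority("[#A] Fix the bug")
	fmt.Println(priorityHTML(p), rest)
}
```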

goorgeous treats all file: links as images - go-org checks for an image file extension (same as ox-html.el)

I see this as a correction/improvement.
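The extension check could look roughly like this Go sketch. The extension list is an assumption that approximates ox-html's default inline-image rules; it is not copied from go-org or ox-html.el:

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// imageExtensions is an assumed list of extensions treated as inline images.
var imageExtensions = map[string]bool{
	".png": true, ".jpg": true, ".jpeg": true, ".gif": true, ".svg": true, ".webp": true,
}

// isImageLink reports whether a (file:) link should be rendered as <img>
// rather than <a>, based only on its extension (case-insensitive).
func isImageLink(link string) bool {
	return imageExtensions[strings.ToLower(path.Ext(link))]
}

func main() {
	for _, l := range []string{"file:img/photo.PNG", "file:notes.org", "file:README"} {
		fmt.Println(l, isImageLink(l))
	}
}
```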

no support for [@10] in ordered lists (goorgeous issue #18, “Ordered lists”)

This is not a biggie.
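Supporting it mostly means recognizing the counter cookie at the start of an item and emitting it as the `value` attribute of `<li>`. A hypothetical Go sketch that handles numeric cookies only (Org also accepts letters, e.g. `[@b]`):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCounterCookie strips a leading "[@N]" cookie from an ordered list item
// and returns N, so a renderer could emit <li value="N">.
// Hypothetical helper; letter cookies are reported as not ok.
func parseCounterCookie(item string) (value int, rest string, ok bool) {
	if !strings.HasPrefix(item, "[@") {
		return 0, item, false
	}
	end := strings.IndexByte(item, ']')
	if end < 0 {
		return 0, item, false
	}
	n, err := strconv.Atoi(item[2:end])
	if err != nil {
		return 0, item, false
	}
	return n, strings.TrimSpace(item[end+1:]), true
}

func main() {
	if n, rest, ok := parseCounterCookie("[@10] tenth item"); ok {
		fmt.Printf("<li value=\"%d\">%s</li>\n", n, rest)
	}
}
```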

Awesome job! But … little? Looking at https://niklasfasching.github.io/go-org/, you have done an awesome job, and it supports a lot more Org syntax than Goorgeous does.

If you’d like to support/test more Org syntax, may I suggest my monolith Org test file for ox-hugo :slightly_smiling_face:.


As an aside, a minor nitpick … on reading https://niklasfasching.de/post/go-org/, I had a comment about your use of “org-mode” in prose. You might already know this, but the convention is to use “Org mode”. The “org-mode” is more of a programmatic reference in the Elisp code. I have written a short note about this convention.

First of all, thanks for all the comments regarding compatibility with goorgeous - added to the todo list :slight_smile:

You’re right, little is what I’d like it to be but that point seems to have passed :see_no_evil: - I just ran cloc (2k+ LOC go)… Not anymore :D. Thanks!

Regarding the org-mode “Org mode” distinction - TIL, thx - will fix that.

I don’t use Org-Mode myself, but a general remark about compatibility would be: If a switch to go-org would be considered an overall significant improvement, I think most people would be happy to give up some not so important stuff. I think this is especially true for the Emacs people.

I have quickly scanned the code, and it looks very good, so if @kaushalmodi says that the functionality is good to go, I would welcome a switch.

Looking at the examples, go-org has done a really good job of parsing the Org syntax, and I am hoping that @niklasfasching will continue on improving that.

Regarding goorgeous/go-org compatibility, I didn’t see the differences in front-matter parsing mentioned. Will go-org parse the front-matter from Org keywords (#+title: foo bar) just as Goorgeous did? If not, what would be the differences?

Also talking about breaking changes, it would be nice if the plain Org comment # more, used by Goorgeous as a marker to insert <!--more-->, were replaced by a more Org-like syntax #+hugo: more (and that change be mentioned in release notes).
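To illustrate the difference: a keyword-based marker is easy to detect unambiguously, while a plain `# more` comment stays an ordinary comment. A hypothetical Go sketch (`isMoreMarker` and `splitAtMore` are illustrative names, not Hugo or go-org functions) that splits the Org source into summary and remainder at `#+hugo: more`; a caller could then render both parts and join them with Hugo's `<!--more-->` summary divider:

```go
package main

import (
	"fmt"
	"strings"
)

// isMoreMarker reports whether a line is the proposed "#+hugo: more" keyword.
// Org keywords are case-insensitive, so "#+HUGO: more" matches too.
func isMoreMarker(line string) bool {
	key, value, found := strings.Cut(strings.TrimSpace(line), ":")
	return found && strings.EqualFold(key, "#+hugo") && strings.TrimSpace(value) == "more"
}

// splitAtMore splits Org source at the first marker line (the marker itself
// is dropped). A plain "# more" comment is left alone.
func splitAtMore(src string) (summary, rest string, found bool) {
	lines := strings.Split(src, "\n")
	for i, l := range lines {
		if isMoreMarker(l) {
			return strings.Join(lines[:i], "\n"), strings.Join(lines[i+1:], "\n"), true
		}
	}
	return src, "", false
}

func main() {
	summary, rest, ok := splitAtMore("* Intro\nSome text.\n#+hugo: more\nThe rest.")
	fmt.Printf("%q %q %v\n", summary, rest, ok)
}
```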

@niklasfasching Thoughts?

@bep thx, that’s nice to hear!

Haven’t looked into that much, will do and report back. Same for the #+hugo: more change.

While trying to parse ox-hugo’s all-posts.org, I’ve already found about 3 bugs and am sure there are more waiting to be found.
I’d like to take a few days to investigate that first - maybe there’s something that won’t fit in the current architecture and cannot be fixed easily. I also saw that ox-hugo has a list of real-world blogs using it and will check go-org against a few of those for problems.

TLDR: Will get back to you in a few days

Thanks for all your input and all those new test cases :)!

Things are looking good enough; we can go ahead now (from my side at least).

As far as I understand it hugo does not care about the case of front matter keys - goorgeous used the actual case used in the file (#+Title stays Title) and go-org lowercases everything (#+Title becomes title) - hugo shouldn’t care either way.

Apart from that goorgeous hardcodes which keyword values become arrays rather than plain strings (tags, categories, aliases). go-org does the same for now but we could move that into a keyword like HUGO_ARRIFY_KEYWORD_VALUE (better naming ideas very very welcome :D) so the user can decide themselves here.
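For reference, a rough Go sketch of the behaviour described above: keys are lowercased and only the hardcoded keys become slices. Treating only the leading run of `#+key: value` lines as front matter and splitting array values on whitespace are assumptions, not necessarily what goorgeous or go-org do. A user-configurable keyword such as the proposed `HUGO_ARRIFY_KEYWORD_VALUE` would replace the `arrayKeys` map:

```go
package main

import (
	"fmt"
	"strings"
)

// arrayKeys mirrors the hardcoded keys whose values become string slices.
var arrayKeys = map[string]bool{"tags": true, "categories": true, "aliases": true}

// parseFrontMatter reads the leading "#+key: value" lines of an Org file into
// a map with lowercased keys; it stops at the first non-keyword line.
func parseFrontMatter(src string) map[string]any {
	fm := map[string]any{}
	for _, line := range strings.Split(src, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		if !strings.HasPrefix(line, "#+") {
			break // end of the keyword block
		}
		key, value, found := strings.Cut(line[2:], ":")
		if !found {
			continue // e.g. "#+begin_src" has no colon
		}
		key = strings.ToLower(strings.TrimSpace(key)) // #+Title -> title
		value = strings.TrimSpace(value)
		if arrayKeys[key] {
			fm[key] = strings.Fields(value)
		} else {
			fm[key] = value
		}
	}
	return fm
}

func main() {
	fm := parseFrontMatter("#+Title: Hello World\n#+tags: go org hugo\n\nBody text")
	fmt.Println(fm["title"], fm["tags"])
}
```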

I’m still trying to wrap my head around goorgeous front matter parsing and will open a PR for goorgeous soon - there are a few other things that will require discussion (e.g. I’d like go-org to receive the complete file contents, not just the main content with the front matter removed) but we can do that then and there.

This change would be hugo internal and can be done independent of go-org - but yeah, maybe we combine both.

Any further thoughts @bep @kaushalmodi?

The hardcoded approach sounds very limiting. I don’t know Org-mode syntax, but don’t you have a “array syntax”?

There is. But then @niklasfasching would need to embed a lisp parser too.

Example syntax:

#+animals: '(dog cat "penguin" "mountain gorilla") ;parse `dog` as `"dog"`, and so on
#+strings_symbols: '("abc" def "two words")
#+integers: '(123 -5 17 1_234)
#+floats: '(12.3 -5.0 -17E-6)
#+booleans: '(t nil)
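To give an idea of the scope, here is a hypothetical Go sketch that handles only the simplest case: a flat quoted list of symbols, numbers and double-quoted strings, returned as a `[]string`. Nested lists are rejected and escape sequences inside strings are not handled, which is exactly where a real Lisp reader would be needed:

```go
package main

import (
	"fmt"
	"strings"
)

// parseLispList parses a flat quoted list such as
// '(dog cat "penguin" "mountain gorilla") into []string.
// Symbols and numbers are kept as their literal text.
func parseLispList(s string) ([]string, error) {
	s = strings.TrimSpace(s)
	if !strings.HasPrefix(s, "'(") || !strings.HasSuffix(s, ")") {
		return nil, fmt.Errorf("not a quoted list: %q", s)
	}
	body := s[2 : len(s)-1]
	var items []string
	for i := 0; i < len(body); {
		switch c := body[i]; {
		case c == ' ' || c == '\t':
			i++
		case c == '"':
			end := strings.IndexByte(body[i+1:], '"')
			if end < 0 {
				return nil, fmt.Errorf("unterminated string in %q", s)
			}
			items = append(items, body[i+1:i+1+end])
			i += end + 2 // skip both quotes
		case c == '(':
			return nil, fmt.Errorf("nested lists are not supported: %q", s)
		default:
			// A symbol or number runs until the next whitespace.
			j := i
			for j < len(body) && body[j] != ' ' && body[j] != '\t' {
				j++
			}
			items = append(items, body[i:j])
			i = j
		}
	}
	return items, nil
}

func main() {
	items, err := parseLispList(`'(dog cat "penguin" "mountain gorilla")`)
	fmt.Println(items, err)
}
```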

From one of my tests,

** Custom front matter with list values                         :list_values:
:PROPERTIES:
:EXPORT_HUGO_CUSTOM_FRONT_MATTER: :animals '(dog cat "penguin" "mountain gorilla")
:EXPORT_HUGO_CUSTOM_FRONT_MATTER+: :strings-symbols '("abc" def "two words")
:EXPORT_HUGO_CUSTOM_FRONT_MATTER+: :integers '(123 -5 17 1_234)
:EXPORT_HUGO_CUSTOM_FRONT_MATTER+: :floats '(12.3 -5.0 -17E-6)
:EXPORT_HUGO_CUSTOM_FRONT_MATTER+: :booleans '(true false)
:END:
*** Custom front matter with list values in TOML
:PROPERTIES:
:EXPORT_FILE_NAME: custom-front-matter-with-list-values-toml
:EXPORT_HUGO_FRONT_MATTER_FORMAT: toml
:END:
{{{oxhugoissue(99)}}}

results in the “Custom front matter with list values in TOML” page on the ox-hugo test site.

PS: If it confuses someone as to why ints, floats, etc. all get parsed as string arrays/slices in Hugo, see https://discourse.gohugo.io/t/curious-why-all-list-params-in-page-are-casted-as-string-lists-internally-but-not-in-resources-params/10161.

You might end up building a mini Lisp parser. Things get more complicated when you want to add support for TOML table params or nested list params, like resources.


Alternatively, you may choose to not do any front-matter parsing just for the more complicated cases, and have the user do something like:

#+begin_src toml :front_matter_extra t
# List your qualifications (such as academic degrees).
[[education.courses]]
  course = "PhD in Artificial Intelligence"
  institution = "Stanford University"
  year = 2012

[[education.courses]]
  course = "MEng in Artificial Intelligence"
  institution = "Massachusetts Institute of Technology"
  year = 2009

[[education.courses]]
  course = "BSc in Artificial Intelligence"
  institution = "Massachusetts Institute of Technology"
  year = 2008
#+end_src

(Org mode doesn’t have inbuilt support for :front_matter_extra t … it’s just something extra I added to ox-hugo to support such cases.)
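Picking such blocks out of the Org source is straightforward; decoding the TOML and merging it into the front matter would be a separate step. A hypothetical Go sketch (`extractFrontMatterExtra` is an illustrative name, not part of go-org or Hugo) that collects the bodies of source blocks whose header has `:front_matter_extra t`:

```go
package main

import (
	"fmt"
	"strings"
)

// hasFrontMatterExtra reports whether a #+begin_src header contains the
// ox-hugo specific ":front_matter_extra t" argument.
func hasFrontMatterExtra(header string) bool {
	f := strings.Fields(strings.ToLower(header))
	for i := 0; i+1 < len(f); i++ {
		if f[i] == ":front_matter_extra" && f[i+1] == "t" {
			return true
		}
	}
	return false
}

// extractFrontMatterExtra returns the bodies of all matching source blocks,
// keeping their original indentation.
func extractFrontMatterExtra(src string) []string {
	var blocks, current []string
	inBlock := false
	for _, line := range strings.Split(src, "\n") {
		trimmed := strings.ToLower(strings.TrimSpace(line))
		switch {
		case !inBlock && strings.HasPrefix(trimmed, "#+begin_src") && hasFrontMatterExtra(trimmed):
			inBlock, current = true, nil
		case inBlock && strings.HasPrefix(trimmed, "#+end_src"):
			blocks = append(blocks, strings.Join(current, "\n"))
			inBlock = false
		case inBlock:
			current = append(current, line)
		}
	}
	return blocks
}

func main() {
	src := "#+begin_src toml :front_matter_extra t\n[[education.courses]]\n  year = 2012\n#+end_src"
	fmt.Printf("%q\n", extractFrontMatterExtra(src))
}
```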

See this comment on one of ox-hugo issues for details.

Yes, I hope @bep is fine with this minor change in that “more” syntax. Fixing # more to #+hugo: more in the content files globally is pretty trivial.

That is not related to this particular task. I’m not a big fan of pulling in other issues that you want fixed into the mix. You are more likely to get an issue fixed if you keep the scope to that one issue … Making it 2,3…5 issues thinking that will “get you more” is in most cases getting you nothing.

OK, I agree with that. Earlier we thought that the go-org would be responsible for parsing the “more” splitter.

I’m just being a bystander here and I pointed out an incorrectness — an Org comment # more shouldn’t be parsed as a special comment. I am not really affected by this getting fixed or not.

You’re right, the current approach is far from perfect - something should be done about it. As any (of the suggested) changes would alter current behavior (i.e. be breaking changes?) I would suggest we ask for more input from Org mode users first. I would also argue that this is not part of the current task either and should be done in a separate PR.

Same for this: I agree, now that we know it’s part of hugo it’s a separate PR.

Here’s the PR for go-org: link - I guess discussion will continue there.

Thanks again!