Hugo --filter script.sh ? running the markdown files through a filter before processing

laszewsk · August 16, 2020, 4:28am

I need all my Markdown files to be run through a script before Hugo processes them, so that BibTeX references are added. I looked, but I cannot figure out whether there is a way to pass a “script” on the command line that is run on each md file, which would solve this issue. E.g., my script simply runs pandoc with citeproc on each md file, and then Hugo is run internally on the file produced.

So the script I would like to pass is the part after the cat a.md pipe:

cat a.md | pandoc -t commonmark -s --bibliography a.bib --csl ieee.csl

Is there a command or a feature in the configuration file of Hugo that would allow this?

such as

hugo --filter="pandoc -t commonmark -s --bibliography a.bib --csl ieee.csl"

I could imagine that this exists, as it is super useful, but I could not find it; possibly I am looking in the wrong place. Naturally, the best way would be to have BibTeX support built in, such as

hugo --bib=a.bib
Please note that I would like to avoid the logic of recursively going through the directory, running the script on all Markdown files, copying the results into a new directory, and then using that new directory as Hugo’s input. Having a filter would avoid this strategy.

Thanks
Gregor
laszewski@gmail.com

ju52 · August 16, 2020, 7:51am

My 2c on this: nothing like what you want exists.

Generate the bibliography content in additional files and use Hugo’s includes.

laszewsk · August 16, 2020, 1:01pm

I have about 300 pages that need to be modified. Doing that by hand seems complex. Do you have the template that I need to include or an example?

ju52 · August 16, 2020, 2:10pm

I tried this first:

Using page bundles, I created a sub-directory ./bib per page.
single.html contains

{{range .Resources.Match "bib/*" -}}
	{{ .Content }}
{{end}}

This gets all files from that directory.

ex: bib/1.md

**Hallo World**

renders as

Hallo World

This uses page resources to get it done.

laszewsk · August 16, 2020, 2:50pm

Thank you. But what I do not understand is the integration of the features of pandoc into this. E.g., I have over 10000 citations in a single file; now the issue is how do I get them rendered into the page:

a) where is such a template?
b) where is an example on how to do this?

Please remember that pandoc does this automatically for me in Markdown, without me having to write any template or any software to parse anything.

Would it not be useful to have some plugin that is easy to configure in Hugo, so that this is made more easily available to users?

Maybe I am still looking at the wrong thing. For example, it seems that pandoc can be used as the renderer, but I cannot find a concrete example of how to do this in config.toml. So maybe all that is needed is better documentation on how to use pandoc as the renderer, explaining how to pass options to pandoc from within the config file?

ju52 · August 16, 2020, 3:34pm

hugo can work with pandoc …

I never used pandoc - maybe other people can help here

laszewsk · September 23, 2020, 11:19pm

I solved my own problem: I developed a script that includes BibTeX and passes the proper options to pandoc. I place the BibTeX files in a specific directory to make it easy.

The sample is at https://cybertraining-dsc.github.io/modules/sample/pandoc/

The script is at github.com/cybertraining-dsc/cybertraining-dsc.github.io/blob/master/pandoc

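#! /bin/sh

# BIB=`pwd`/bib/all.bib

# Hugo pipes the page content to this script on stdin; pandoc reads it directly.
# The absolute path ensures the real pandoc is called, not this wrapper.
/usr/local/bin/pandoc -F pandoc-crossref -F pandoc-citeproc -f markdown -t html -N --metadata link-citations=true --bibliography=bib/all.bib --csl=bib/ieee-with-url.csl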

In my .bashrc/.bash_profile/.zprofile (varies by machine) I simply put “.” in the PATH:

export PATH=.:$PATH

Then, in the front matter of the pages that are supposed to use pandoc, I set

markup: pandoc

This will then switch the Markdown processor to pandoc, and everything works.
It would be nice if someone built a better integration of this into Hugo so that the community does not have to reinvent it. I have seen many posts that ask for this, but I could not locate a documented solution.

I hope that my contribution helps someone. My plan is to provide a new hugo theme that includes this feature.

Gregor

PS: Naturally, you must have pandoc, pandoc-crossref, and pandoc-citeproc installed. In the same fashion you can add other filters to pandoc.
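That prerequisite can be checked before Hugo is run. The following is only a minimal sketch: the function name check_tools is made up for this example and is not part of Hugo or pandoc; the tool names passed to it would be the ones listed above.

```shell
#!/bin/sh
# Sketch: report which of the given executables are not on the PATH.
# check_tools is a hypothetical helper name chosen for this example.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    # Return 0 only if every tool was found.
    return "$missing"
}
```

A possible use is: check_tools pandoc pandoc-crossref pandoc-citeproc && hugo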

swamidass · January 16, 2021, 5:27pm

This is helpful. It seems, though, that you should put the script in a bin directory of the project, and set up the environment to put that in the search path. This is a bit hackish though. The problem is that different post types might need different filters, and so on.
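A minimal sketch of that setup, assuming the wrapper from the previous post is saved as bin/pandoc inside the site and that the function is called from the site root. The name run_hugo_with_wrapper and the bin/ layout are assumptions made for this example, not anything Hugo prescribes.

```shell
#!/bin/sh
# Sketch: run Hugo with the project's own bin/ directory first on the PATH,
# so a wrapper saved as bin/pandoc is found before the system pandoc.
# run_hugo_with_wrapper and the bin/ layout are assumptions for this example.
# The body runs in a subshell "( ... )", so the caller's PATH is unchanged.
run_hugo_with_wrapper() (
    PATH="$PWD/bin:$PATH"
    export PATH
    hugo "$@"
)
```

The wrapper itself should still call the real pandoc by absolute path, as the original script does, so that it does not end up calling itself.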

More fundamentally, it is not good practice to create a wrapper of a program, with the same name, that has different invocations than the original program.

@bep it would be ideal if this could be handled directly by the hugo config system, as options to the pandoc markup.

1. The easiest way would be to add a pandoc-options config variable that allows us to set the options for the pandoc run. That would give the power we need without requiring a wrapper to pandoc named “pandoc”.
2. An alternate approach would be to enable an option that allows us to specify the executable or script for pandoc, so we could specify “bin/my-pandoc-wrapper.” This would give a way of offloading much of the config to other files, and enabling more flexibility.
3. You could try to “hugo-encode” all the options in pandoc, but this seems to be a lot of work.

I think that #1 and #2 both should be straightforward to implement, and would add an immense amount of flexibility and power to the pandoc integration.

What do you think? Is that possible?

laszewsk · January 16, 2021, 5:51pm

Yes, I know my solution is a bit of a hack, but it was the only way I could figure out to make it work …

The proper solution should be that in hugo we have some configuration file that determines

a) the location of pandoc, as it could be that you have multiple different versions installed due to compatibility issues of pandoc itself

b) allow in the configuration the passing of arguments to pandoc so it can easily be configured. When I tried it (I have not tried it previously) that was not possible and it was just hardcoded in hugo

c) Have a defined way so that in the page we can specify inclusion of the bib file itself (e.g. multiple bib files)

d) have a mode that allows us to include all references, just like \nocite{*} in LaTeX, or to use only the labels we cite (that should be the default)

The issue is I do not have time to implement this …

It’s quite disappointing that we need to resort to such hacks. BibTeX support should be a first-class feature in Hugo.

laszewsk · January 16, 2021, 6:10pm

So the real issue is this:

Content Formats | Hugo

It states that pandoc has reasonable default parameters (--mathjax), which, as far as I could see from the code, were hardcoded. Also, if you look at that page, configuration examples are given for many converters but not for pandoc. Someone who works on Hugo regularly should expand upon this and make it possible for pandoc as well …

As it was hardcoded when I looked at it, the only way was to overwrite pandoc with a shell script.

laszewsk · January 16, 2021, 6:15pm

The hugo code is at

So if someone knows how to pass along the options from a configuration file, this should all be possible. … As you can see, mathjax alone is insufficient.

swamidass · January 16, 2021, 7:28pm

I don’t code in Go, and I’m just learning the basics of Hugo, but this comparison point of how config options are used with Goldmark seems helpful:

github.com/gohugoio/hugo/blob/master/markup/goldmark/convert.go

It seems like it should be fairly straightforward for one of the core dev team to add in two pandoc options:

executable
options

Of note, the executable option wouldn’t really be tied to pandoc. So it might actually allow any parser engine to work as the parser, especially if additional config can be passed as an argument (perhaps as JSON?). That might be valuable, in that it could allow inheritance of config properties…

But just doing this for pandoc would be quite powerful and helpful.

swamidass · January 16, 2021, 8:02pm

So the other possibility, which does not require more development and is not hacky:

Create a directory that houses all your source pandoc files. Use a script that processes these files, using pandoc or whatever you want, with an output format of “markdown”, and writes the output into the content directory.

In that way, you are basically using pandoc as a markdown pre-processor to feed files to Hugo. You can pass any options you like to it, apply any pandoc filters you like, and refer to any bib files you like.

Though an improvement to hugo would be preferable, this setup seems like it would work quite well, without aliasing pandoc to a wrapper.
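The pre-processing step described above could be sketched as a small shell function. This is an illustration under stated assumptions, not a tested build setup: preprocess_tree and the PANDOC override variable are names invented for this example, and the pandoc options are simply whatever the caller passes in.

```shell
#!/bin/sh
# Sketch: convert every .md file under a source tree and write the result
# to the same relative path under an output directory (e.g. Hugo's content/).
# preprocess_tree and the PANDOC override variable are hypothetical names.
# Usage: preprocess_tree SRC_DIR OUT_DIR [pandoc options...]
# SRC_DIR must be given without a trailing slash.
preprocess_tree() {
    src_dir=$1
    out_dir=$2
    shift 2
    # File names containing newlines are not supported by this simple loop.
    find "$src_dir" -type f -name '*.md' | while IFS= read -r src; do
        rel=${src#"$src_dir"/}          # path relative to the source tree
        dest=$out_dir/$rel
        mkdir -p "$(dirname "$dest")"
        # Remaining arguments are passed to pandoc unchanged.
        "${PANDOC:-pandoc}" "$@" "$src" -o "$dest" </dev/null
    done
}
```

With the options from the first post, a call might look like: preprocess_tree pandoc-src content -t commonmark -s --bibliography a.bib --csl ieee.csl, followed by a normal hugo run. Depending on the pandoc version, citation processing may additionally need -F pandoc-citeproc or --citeproc, and it is worth checking on a single file that the front matter survives the conversion.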

jzeneto · July 10, 2021, 10:40pm

Another option is simply using YAML front-matter to pass any Pandoc options you need. Since (nearly) all Pandoc options now are accessible from front-matter, you can perfectly configure all you need from there.

And if you need to put those configuration in many files, perhaps you can use Front matter cascade option from Hugo: put those files in a section that, in its _index.md, sets all the common Pandoc options below the cascade option (although I don’t know if Hugo interprets cascade when using an external parser).

Note that I haven’t tested this, but perhaps it works.
