With Blackfriday officially deprecated, I’m looking to convert my 4200-entry blog over to Goldmark, but the “typographer” extension is still terrible. Post-processing the site with one of the standard “smartypants” scripts takes at least twice as long as the Hugo run, so my current plan is to pre-smarten all of the Markdown source files.
I’ve hacked together a script that mostly works, but I was wondering if anyone else has come up with a more robust, reliable solution than this:
#!/usr/bin/env bash
#
# add smart quotes to Hugo Markdown source files, using cmark,
# the CLI tool from CommonMark's reference implementation:
# https://github.com/commonmark/commonmark-spec
# Notes:
# - assumes TOML front matter
# - converts footnote-style links to inline
# - normalizes ordered/unordered list formatting
#
# WARNING: possible site-breaking changes:
# ! rarely, cmark breaks *italic* and **bold** by backslashing
# the asterisks
# ! breaks description/definition-list formatting by reflowing it
# - adds blank line before shortcode that starts a line
# - adds blank line after shortcode that ends a line
# - adds usually-gratuitous backslashes to [, ], !, etc.
# - converts HTML entities (&nbsp;, &rsquo;, &#8203;, etc.) into Unicode literals
# - probably won't handle a "+++" line in body content
# fail the whole pipeline if any stage fails (e.g. cmark is missing)
set -o pipefail
CMARK="cmark --to commonmark --width 70 --smart --unsafe"
for file in "$@"; do
    # convert front matter to HTML comment, so it all gets ignored
    # (a bare + is a literal in basic regexes; \+ is a GNU-only
    # repetition operator)
    sed -e '1 s/^+++$/<!-- _FMPLUS_/' \
        -e 's/^+++$/_FMPLUS_ -->/' "$file" |
    # convert shortcodes to HTML comments, to keep it from
    # escaping their arguments
    sed -e 's/{{</<!-- _SC1OPEN_/g' \
        -e 's/>}}/_SC1CLOSE -->/g' \
        -e 's/{{%/<!-- _SC2OPEN_/g' \
        -e 's/%}}/_SC2CLOSE -->/g' |
    # pass through commonmark
    $CMARK |
    # restore shortcodes
    sed -e 's/<!-- _SC1OPEN_/{{</g' \
        -e 's/_SC1CLOSE -->/>}}/g' \
        -e 's/<!-- _SC2OPEN_/{{%/g' \
        -e 's/_SC2CLOSE -->/%}}/g' |
    # restore front matter
    sed -e 's/^.*_FMPLUS_.*$/+++/' > "$file.new" &&
    # overwrite original only if every stage succeeded
    # (you have source control, right?)
    mv "$file.new" "$file"
done
exit 0
This script takes about 35 seconds to run on my laptop, and as per my comments, there are some issues I’ll have to correct by hand. Not so bad when you have a few dozen blog posts, but the diff for my site runs to 184,000 lines!
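One caveat in the comments, the stray "+++" line in body content, might be avoidable by never sending the front matter through cmark at all. The sketch below is untested against a real site; `fm_head` and `fm_body` are hypothetical helper names, and it assumes TOML front matter that opens on line 1 and closes at the next "+++" line.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the script above: split a Hugo
# Markdown file into TOML front matter and body, so that only the
# body needs to go through cmark.

# Print the front matter, delimiters included. Prints nothing if
# line 1 is not "+++".
fm_head() {
    awk 'NR == 1 && $0 != "+++" { exit }
         { print }
         NR > 1 && $0 == "+++" { exit }' "$1"
}

# Print everything after the closing "+++". Later "+++" lines in the
# body are passed through untouched. If line 1 is not "+++", the
# whole file counts as body.
fm_body() {
    awk 'NR == 1 && $0 != "+++" { inbody = 1 }
         inbody { print; next }
         NR > 1 && $0 == "+++" { inbody = 1 }' "$1"
}

# Possible use inside the loop (shortcodes would still need the same
# comment-wrapping as before):
#   { fm_head "$file"; fm_body "$file" | $CMARK; } > "$file.new"
```

Since the front matter is copied verbatim, this would also drop the first and last sed stages from the pipeline.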
-j