Executing a shell command in a shortcode

I’m building a shortcode that lets me embed links to books – pass in the ISBN and it will display the cover image, title, etc. For the bookseller I’m using there’s no JSON endpoint, so I’m pulling the data I need with a shell script and putting it in a JSON file that hugo loads. All good so far.

What I’d like to do is trigger the script from the shortcode itself. I can check to see if the file exists – right now when it doesn’t I display a message saying “run this script” – but what I’d really like is a way to do something like os.Exec inside of that block to pull down the data and process it in a way that hugo understands.

Is that available in shortcodes?

For security reasons, there is no way to execute other commands from inside shortcodes or templates.

But if you have downloaded the data you want as JSON, you can place the JSON in the data/ directory and access the data directly from hugo using Data Templates.

Not sure how the data is structured, but assuming the ISBN appears somewhere in the JSON as a key or value, you could either use the key to access the dict or range over the items and find the matching value.

My question was how do I get the shortcode to download the data, and I don’t want it in the data directory since it only makes sense as part of the page bundle. Otherwise older posts would get updated with newer information which is undesirable to me.

“Security reasons” always seem so funny to me as an explanation, since it’s mainly just a pain for me. Don’t people use git and run things in containers for when there would be potential issues?

Anyway, I’ll have to live with the two step process, which I documented here in Book Images Shortcode

Writing a program like hugo that can run arbitrary shell commands and making it secure is an extremely difficult task. It’s been discussed; if you want to read about it, search the issues on GitHub.

But since you see it as just a pain for you, would you trust a piece of OSS you downloaded if a theme could include the following in a template and it would be executed?

{{ exec "rm -rf /" }}
pwned! by {{ .Site.Params.themeauthor }}

Even with hugo’s methods of building its own limited filesystem during execution, that’s still not a good outcome.

If you are looking to automate a two-step solution that includes downloading, your shell script could be modified to call hugo at the end. Also, there is ongoing work to allow hugo to generate content pages from data sources in the future, so this might be another possibility for you. Otherwise, the script you’ve developed is the other approach that works well.
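For what it’s worth, here is a minimal sketch of what such a wrapper could look like. It is only an illustration under assumptions: fetch_book.sh stands in for your existing download script, books.json is a made-up file name inside the page bundle, and FETCH_CMD and ensure_book_data are names invented for this example. It fetches only when the bundle has no data yet, so older posts keep the data they were published with.

```shell
#!/bin/sh
# Sketch: fetch book data into a page bundle only when it is missing.
# FETCH_CMD, books.json and ensure_book_data are hypothetical names.

FETCH_CMD="${FETCH_CMD:-./fetch_book.sh}"

# ensure_book_data BUNDLE_DIR ISBN
# Runs the fetch command only if the bundle has no non-empty books.json.
ensure_book_data() {
    bundle_dir="$1"
    isbn="$2"
    data_file="$bundle_dir/books.json"

    if [ -s "$data_file" ]; then
        return 0    # already fetched; leave existing posts untouched
    fi

    # Write to a temporary file first so a failed fetch never leaves
    # a half-written JSON file behind for hugo to load.
    tmp_file="$data_file.tmp"
    if "$FETCH_CMD" "$isbn" > "$tmp_file"; then
        mv "$tmp_file" "$data_file"
    else
        rm -f "$tmp_file"
        echo "fetch failed for ISBN $isbn" >&2
        return 1
    fi
}

# Typical use (commented out so sourcing this file has no side effects):
#   ensure_book_data content/posts/my-post "$ISBN" && hugo
```

The build step only runs when the fetch succeeded or the data was already there, which keeps the "run this script" message in the shortcode as a fallback rather than the normal path.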

This is why I mentioned running everything inside of a container, which is the default anyway in CI systems, as a way to limit the downside when you have quote untrusted unquote code.

But honestly it’s not like there’s such a thing as trusted code in practice. Pulling code from the internet always has that problem; look at https://medium.com/intrinsic/compromised-npm-package-event-stream-d47d08605502 or https://www.theregister.co.uk/2019/08/20/ruby_gem_hacked/ when dealing with upgrading open source dependencies. These threat models are people actively trying to hijack your CPU for their own purposes, not general, hypothetical rm -rf / vandalism.

getJSON already lets you reach out to the world for practical purposes and that makes sense to me, but that’s a built-in that you can’t meaningfully extend. A more flexible solution to me would be an opt-in-only flag in config.toml that lets you enable an exec command, or one that limits it to scripts in a whitelisted directory that you’d inspect yourself. At some point I could fat-finger (and have fat-fingered) sudo rm on my own and I’d still be in the same place, where running in a container would have saved me, along with the importance of backups, etc.
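To make the idea concrete, here is a rough sketch of the opt-in plus whitelisted-directory check, written as a shell wrapper rather than as anything hugo actually provides. ALLOW_EXEC, ALLOW_DIR and run_allowed are names invented for this illustration; nothing here corresponds to an existing hugo flag or config key.

```shell
#!/bin/sh
# Sketch: run a helper script only if execution was explicitly enabled
# and the script lives directly inside one allowlisted directory.
# ALLOW_EXEC, ALLOW_DIR and run_allowed are hypothetical names.

ALLOW_EXEC="${ALLOW_EXEC:-0}"          # off unless the user opts in
ALLOW_DIR="${ALLOW_DIR:-./scripts}"    # the directory you inspect yourself

# run_allowed SCRIPT [ARGS...]
run_allowed() {
    script="$1"
    shift

    if [ "$ALLOW_EXEC" != "1" ]; then
        echo "exec is disabled; set ALLOW_EXEC=1 to opt in" >&2
        return 1
    fi

    # Resolve both directories to physical paths so that ".." segments
    # or symlinked directories cannot dodge the comparison.
    allow_dir=$(cd "$ALLOW_DIR" >/dev/null 2>&1 && pwd -P) || return 1
    script_dir=$(cd "$(dirname "$script")" >/dev/null 2>&1 && pwd -P) || return 1

    if [ "$script_dir" != "$allow_dir" ]; then
        echo "refusing to run $script: not in $ALLOW_DIR" >&2
        return 1
    fi

    sh "$script_dir/$(basename "$script")" "$@"
}
```

This only compares directories, so a symlinked script file inside the allowed directory would still be followed; it limits accidents more than it stops a determined attacker, which is roughly the same trade-off as the container argument.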

I haven’t read this entire thread, but I can say that one part I like about Hugo and will try to keep as long as I’m running the show is this:

  • I can “git clone” and run “hugo” on a random repo (someone has posted a link to a repo reproducing a bug, etc.) – and get it up and running not just quickly but also without worrying too much about it being some kind of Trojan deleting files etc. on my PC.
  • I would not in a million years do the same with any SmartsyPantsJS site generator.

The above obviously puts some restrictions on (or slows down development of) what you can do. We will certainly improve remote data handling pretty soon (including creating pages from remote data) without having to sacrifice the “binary security”.

I think you are optimizing for a very different use case than the one I’m trying to optimize for, which makes total sense. I’ve documented what I do when playing around with random internet code here – which is absolutely not just running git clone and running whatever I find in there.

You probably need to pull down example repos to help diagnose issues so I get it. That’s just something I personally would rarely ever encounter.

One thing you could consider is what deno does, which is “Unless specifically allowed, scripts can’t access files, the environment, or the network.” You need to pass flags in on the command line to enable features which otherwise are off by default. This was something that was built in from the beginning and may be impossible to retrofit, but it seems like a good solution to limiting the damage something can do unless you explicitly permit it. hugo --allowExec or something like that.

If that’s something you’d be willing to merge I could write a PR for it.

As I suggested earlier, you can read the discussion in the GitHub issues for hugo, where this and several other solutions were discussed, if you want to see the history of the decisions made.

Note: I’m not a dev and was not involved, but read through them after the fact.