Read media and JSON content outside Hugo's work directory

I have the following setup:

  • a repo/directory (website-generator) for the Hugo generator, containing the theme, layouts, and all core site-building assets.
  • a repo/directory (documentation) for the site content: markdowns, media, git history, etc.
  • a repo/directory (website) for the build output. (See the repos website-generator, documentation, and website; as a new user I can only post up to 2 links!)

The site build is triggered by a shell script that copies the content from documentation into Hugo as the content folder.

Next, there is a pre-build step in which a Node.js script pulls the commit history for each markdown file and serializes it as JSON, side by side with the corresponding markdown.

The site building then involves git-related partials that use .GetJSON to read those JSON files and present git-related information on the relevant pages.
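For context, such a partial could look roughly like the following sketch; the commits.json file name, the hash/message field names, and the path layout are illustrative assumptions, not necessarily the exact setup:

```go-html-template
{{/* Illustrative sketch only: "commits.json" and the field names are assumptions. */}}
{{/* .File.Dir is relative to the content root and ends with a slash. */}}
{{ $path := printf "content/%scommits.json" .File.Dir }}
{{ with getJSON $path }}
  <ul>
    {{ range . }}
      <li><code>{{ .hash }}</code> {{ .message }}</li>
    {{ end }}
  </ul>
{{ end }}
```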

Another use of resources I have (not yet published in the repos above) is .readDir, which I use to read a set of images from a provided directory and build a carousel out of them dynamically.
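A minimal sketch of that pattern, assuming a "gallery" subfolder next to the page and a .jpg filter (both names are mine, for illustration):

```go-html-template
{{/* Illustrative sketch: the "gallery" subfolder and .jpg filter are assumptions. */}}
{{ $dir := printf "content/%sgallery" .File.Dir }}
{{ range readDir $dir }}
  {{ if strings.HasSuffix .Name ".jpg" }}
    <img class="carousel-slide" src="{{ path.Join $.RelPermalink "gallery" .Name }}" alt="{{ .Name }}">
  {{ end }}
{{ end }}
```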

All this worked until I decided to skip the content-copy step and instead use the --contentDir flag to point Hugo directly at documentation. As I read the documentation and the discussions here, neither .GetJSON nor .readDir is intended to be used outside Hugo's working directory:

  • .GetJSON docs: “the source files must reside within Hugo’s working directory”
  • .readDir docs: “Gets a directory listing from a directory relative to the current working directory.”

I can understand .GetJSON being dedicated to non-content data files that need not be published and are therefore not necessarily part of the content directory itself, although, as you can see, it has its uses there too.
I'm not sure about the rationale behind restricting .readDir, but I assume it's the same.

There was a proposal to use

{{ with .Page.Resources.GetMatch $jsonfile }}
  {{ .Content | transform.Unmarshal }}
{{ end }}

This seems able to stand in for .GetJSON when the content directory does not live inside Hugo's tree. Can you first confirm whether that is the case?
But even if it is, I am not sure how to construct the path for $jsonfile here. In my case it won't start with content, and I don't want to hardcode paths.

For the use case of loading image files, I guess I can get away with the same approach, using .Match with wildcards (**.jpg)? (Provided there is a way to resolve the path-construction issue above.)
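The page-resource variant of the carousel would then reduce to something like this sketch (the glob pattern and class name are assumptions):

```go-html-template
{{/* Illustrative sketch: match all JPEGs among the page's resources. */}}
{{ range .Page.Resources.Match "**.jpg" }}
  <img class="carousel-slide" src="{{ .RelPermalink }}" alt="{{ .Name }}">
{{ end }}
```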

If Hugo cannot provide functions to read files from the content directory regardless of its location, then I suppose I need to create the commit JSONs in Hugo's data directory, maintain the same directory structure there, and map them to their markdown counterparts based on it, unless there is a better way.
For the carousel use case using .readDir, however, I have no good ideas, because the images are content and need to stay with the content directory.
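A sketch of that data-directory mapping, assuming the commit JSONs are mirrored under data/commits/ with the same structure as content/ (the "commits" key and the walk over .File.Dir are my assumptions):

```go-html-template
{{/* Illustrative sketch: commit JSONs mirrored at data/commits/<same path as content>. */}}
{{ $node := site.Data.commits }}
{{ range split (strings.TrimSuffix "/" .File.Dir) "/" }}
  {{ if $node }}{{ $node = index $node . }}{{ end }}
{{ end }}
{{ with $node }}
  {{/* The data file named after the page, e.g. data/commits/<dir>/<page>.json;
       $ is assumed to be the page context here. */}}
  {{ $commits := index . $.File.ContentBaseName }}
{{ end }}
```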

I am aware that I can create a symlink from Hugo's work directory to the documentation (content) directory, which would solve the content-copying issue and preserve the current behavior. I am asking for advice on how to approach these problems without creating additional resources to circumvent limitations.

What does npm do for you? I looked at one of the “remote pages” in your docs repo, and couldn’t figure out what is happening. What is an example of the JSON you are reading?

Npm does a couple of things:

  1. fetches the commit history; .GetJSON has limited authentication options and doesn't work for me.
  2. fetches content from remote locations (public/private repos, wikis…); remote pages are just placeholders with metadata, including the remote content location.
  3. rewrites remote links to the changed location

Here’s a sample json for the corresponding markdown.

I think I’m missing something cognitively: you can show the commits in a public repo, but you cannot make the commits available over a public API, to be consumed by your site at build time?

If by public repo you mean the website repo for the Hugo build output: yes, although that is neither intended nor pursued in any way, and in fact good housekeeping should have taken care of it. As for the latter part: no, once the commits are fetched by npm, there is no API I am aware of that I could use.

A mirror of the repos I showed above also runs in a private network, gathering far more internal material, generally protected in ways that require more advanced authentication than .GetJSON supports. That is not visible in the npm code we are discussing in the public repo above, because the public site-building variant does not need it. The bottom line is that it does not qualify as a public API, so I cannot use .GetJSON directly to get the commits from our internal GitHubs during the site build.

But anyway, the key question here is: what can I use to read content from file-system locations outside Hugo's directory? Consider also the other use case I outlined, where I need to read a bunch of media files that are part of the content; a data-directory workaround won't help me there when the content does not reside in Hugo. Page resources seem the closest match to a generally applicable approach to the problem, provided there is a solution for avoiding hardcoded paths.

Recapping what worked for me, in case anyone with a similar problem hits this.

In a distributed setup where the content is another git repo, distinct from your Hugo website-builder repo, what worked best for me with .GetJSON and .readDir was to symlink the source repo as content in the Hugo repo. The only thing I've noticed you don't get this way is .GitInfo (though that might be related to other issues; I haven't debugged that particular problem, as .GitInfo was too limited for my case anyway). However, if the JSONs you read with .GetJSON are useful only for site-building purposes and do not need to be served as content, you are much better off moving them to Hugo's data folder. There you can also employ them in where expressions, and read them even without an explicit .GetJSON, via $.Site.Data.<json-file>.
Having a well-known content folder path solves my issues with .readDir too.
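As an illustrative sketch of that last point: with the commits moved to, say, data/commits.json, Hugo unmarshals the file automatically and it can be filtered with where (the "commits" key, the author/message field names, and the author value are assumptions about the JSON shape):

```go-html-template
{{/* Illustrative: data/commits.json is available as site.Data.commits, no getJSON needed. */}}
{{ with site.Data.commits }}
  <ul>
    {{ range where . "author" "jane" }}
      <li>{{ .message }}</li>
    {{ end }}
  </ul>
{{ end }}
```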

I don’t have an answer for how to supply the contentDir site configuration parameter dynamically (or via --contentDir) so as to form valid paths, e.g. for .readDir (imagine this setting as an environment variable specific to a particular deployment). It seems the two just don't play well together.