Load data from YAML/XML/JSON files?

Hi all,

I would like to know whether we can use external files as a data source, because I don’t think we can use a database to generate some pages.

Thank you

Currently you can add data to the sitewide config file. I’m exploring extending this to any data file in /data/. I believe there is already a ticket about this.
It shouldn’t be very hard to do if someone wants to pick it up and contribute this… All the logic is already there. It just needs to be extended to read from more than one file.

Could you explain? I don’t know the structure and code of Hugo, but why not. How do you see the implementation?

Sure… Here is the Ticket https://github.com/spf13/hugo/issues/476 .

The implementation is straightforward. Any file can be placed into /data/ … say /data/authors.yaml

This is then accessible under Site.Data.authors

Data files can also be placed in sub-folders of the data folder. Each folder level will be added to a variable’s namespace.

I think Jekyll’s implementation gets this right. They document it at http://jekyllrb.com/docs/datafiles/ .

I would make two changes. 1. Only support the same formats as we already do (JSON, YAML, TOML). 2. Call the folder data/ … Also add that as a flag --datadir=/a/different/path.

Flags are added here

The Params are extracted from the config file here.

Ohh… One thing.

In Hugo we already call these Params. Jekyll calls them data.

I’m not sure if we should just inject this into Params or create a new accessor called .Data … I’m leaning towards the .Data approach, but I’m not sure that makes sense.

What do others think?

Great, another point: how would you generate a page/archetype for each item of a collection, and how would you use it?

If you have ./data/authors.yml, do we need to add ./layout/_default/authors.html?

There currently isn’t a way to add an HTML page without a content file (or metadata in a content file) dictating its creation.

I’ve been thinking about how best to do this.

My current thinking is that you would actually add a go.html content file where you want the file to be created…

For example if I wanted to create http://domain.com/authors I would create

/content/authors.go.html 

This would be a node and would have all of the site data available to it, but no page specific data.

Of course today you can do this, but it requires a bit more… Since all pages require a content page you would need to create a content file and a template file.

One way to do this would be to create a markdown file wherever you want it. Set the url and layout in the metadata. Then make sure you create the layout file you specified. It will use this to render a file in the url location you specify.

Another way that’s a bit more in keeping with Hugo’s normal way of operating would be to create /content/authors/index.md (really any name works if it’s in that directory). Then I would create a section template for authors that reads from the data source and lists its entries.

The only disadvantage of this approach is the content file will appear in the RSS feeds and Pages listings.
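
As a rough sketch of the section-template idea above: the program below uses Go's standard text/template package to render a list from data shaped like the proposed Site.Data.authors. The template body, the name field and the sample values are all made up for illustration, and the sketch assumes the data really is exposed under .Site.Data as discussed in this thread.

```go
package main

import (
	"os"
	"strings"
	"text/template"
)

// listTmpl is what an authors section template might contain.
// The "name" field is an assumption about the data file's layout.
const listTmpl = `{{ range .Site.Data.authors }}- {{ .name }}
{{ end }}`

// render executes the template against the given data and returns the text.
func render(data map[string]interface{}) (string, error) {
	t, err := template.New("authors").Parse(listTmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Stand-in for the parsed contents of data/authors.yaml (made-up values).
	data := map[string]interface{}{
		"Site": map[string]interface{}{
			"Data": map[string]interface{}{
				"authors": []map[string]string{{"name": "Ada"}, {"name": "Grace"}},
			},
		},
	}
	out, err := render(data)
	if err != nil {
		panic(err)
	}
	os.Stdout.WriteString(out)
}
```

The point is only that the template reads from site-wide data rather than from the content file, so the content file can stay nearly empty.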

Another idea I had which would work well is to add a hidden flag to content. Hidden would not include the content in any listings, but would render it.

I think I may like this approach the best. It gives a pretty straightforward way of adding literally any page anywhere without much work.

+1 on calling this stuff data, separate from params. It’s nice to have different names for things that live in different places, so you can differentiate. Note that there are already some variables called “Data” in Hugo, so we should be careful about confusion (maybe rename the old variables).

+1 for loading data from yaml/xml/json

See also https://github.com/spf13/hugo/pull/748: Feature: GetJson and GetJson in short codes or other layout files by @SchumacherFM

And

I have nearly finished a solution which transparently integrates into the current process of reading files from the HDD.

My solution downloads a JSON (or whatever format) stream like this one, http://cyrillschumacher.com/sourceStream.json , and adds those “virtual files” to the files slice. Even the watcher is fully supported … but the build process will only redownload the JSON when a real local file changes.

The JSON stream looks like:

{"Path":"path/to/content1.md","Content":"the usual hugo content goes here"}
{"Path":"path/to/content2.md","Content":"the usual hugo content goes here"}
{"Path":"path/to/content3.md","Content":"the usual hugo content goes here"}

The objects in a JSON stream are separated by line breaks, so the Content must not contain any line breaks.

Why a stream of JSON?

Because my main ambition is to create ~100k pages which are generated from a Magento store. This Magento store creates a JSON stream of product and category pages. Stream parsing is much more efficient than parsing a whole JSON blob at once.

This feature comes in addition to PR https://github.com/spf13/hugo/pull/748 , which means that in your JSON stream content you can also use the feature from PR 748 to download other JSON.

I will send a PR when PR 748 is merged.

https://github.com/spf13/hugo/pull/748 is now merged (even though GitHub marks it as closed).

Any updates to this thread? This sounds like a really good idea.

It’s closed as in added/implemented.

This and the new Data Files support should be in the docs.