Working with data

Hello,

Problem 1

There seems to be a bug in how TOML works. This is what I see.

If I put a toml file in the data folder, its shortcode in the layouts/shortcodes folder, and the call in the markdown file, then change something in the toml file, the page is updated on the fly. This seems ideal, as the code is short and the update is fast.

If I add the same using csv, Hugo returns an error as soon as I add the call in the markdown file. What seems to happen is that the toml shortcode wants to read all files in the data folder, and triggers an error for anything other than toml and yml.

This is the shortcode for toml:

{{ $data := .Site.Data.filename }}
<ul>
{{ range $data.filename }}
<li>
<strong>Name:</strong> {{ .name }}<br>
<strong>Email:</strong> {{ .email }}
</li>
{{ end }}
</ul>

The error code is:

unexpected data type [][]string in file filename.csv logged 1 error(s)

This makes it impossible to work with mixed file formats other than toml and yml.

Problem 2

If I remove the toml call from the markdown file, the csv chain works as expected. However, unlike with toml, if I change something in the csv file, the page is not updated on the fly. To update the page, I have to restart hugo.

Problem 3

Suppose you have 100 markdown files. Each file is like a headed letter: the name and address of the company is printed center front. You do not want to write name and address 100 times. In Jekyll (competitor), you write the following inline code, directly in the markdown file, and it reads data/filename.csv automatically:

<center>
{{ site.data.filename[0].name }}<br>
{{ site.data.filename[0].address }}
</center>

In Hugo, you have to write a shortcode, then call the shortcode from the markdown file.

The shortest shortcode (pun intended) in go is the above one, which requires toml. Other file formats require more code.

{{ $csvData := readFile "data/filename.csv" }}
{{ $lines := split $csvData "\n" }}

{{ if gt (len $lines) 1 }}  <!-- Check if there is at least one data row -->
{{ $header := split (index $lines 0) "," }}  <!-- Get the header row -->
{{ $firstLine := index $lines 1 }}  <!-- Get the first data row -->
{{ $fields := split $firstLine "," }}

{{ $company := dict }}  <!-- Create a dictionary to hold the company data -->

{{ range $index, $name := $header }}
{{ $key := trim $name "\"" }}  <!-- Remove quotes from the header name -->
{{ $value := trim (index $fields $index) "\"" }}  <!-- Remove quotes from the field value -->
{{ $company = $company | merge (dict $key $value) }}  <!-- Add to the company dictionary -->
{{ end }}

<div class="company">
<p>{{ printf "%s, %s" $company.name $company.email }}</p>  <!-- Output name and email -->
</div>
{{ end }}

And it still fails, because it returns a piece of the address instead of the email.

Problem 4

Assume you solved the above problems.

Since you have 100 pages, will Hugo read data/filename.csv 100 times?

Problem 5

Suppose you have a markdown page with sprinkled data from the csv file. With Jekyll you focus on the markdown file and write the occasional {{ site.data.filename[0].wantedfield }} where it belongs.

With Hugo, you have to write it all in the shortcode.

I am learning Hugo, because Jekyll is a nightmare to install and is slow to update. So this is a one-way journey for me. Can you point me in the right direction please, as the above problems are slowing me down.

Thank you.

Are you placing the CSV file in the data directory?
Why do you need to use CSV instead of something else?

There’s nothing wrong with using CSV, but it would be helpful to know why.

Yes, I added them in the same data folder.

I tried to isolate them in a subfolder, but toml also reads the subfolders.

I am using CSV because I have a pool of such files from the Jekyll projects. Some are generated on the fly by other tools. I cannot rewrite the whole system into TOML or YML.

Do you have one CSV file per page, or does one CSV file hold data for several pages?

In the headed letter case, it is one csv file holding company data for all 100 pages.

In other cases, it is one csv file for the page that requires it.

Can we place the page-specific CSV files adjacent to the content file, or do they need to be centrally located? And if they’re centrally located, how are they logically related to the page?

I am using the structure dictated by Hugo, so data files go into the data folder. What is the alternative? Consider that certain pages use more than one data file.

We have several alternatives. Before answering your questions, I want to know if we can place the page-specific CSV files adjacent to the content file, or if there is some limitation in your setup that would prevent that.

I will look at this again in a few hours.

I would rather keep all data files in the data folder, as putting them in the content folder will cause confusion. I can have a filename.md and filename.csv in the content folder, assuming filename.md uses data only from that file, but some pages read data from more than just one csv file. This is also a problem with shortcodes, as I would have to open multiple files and work out the page in the shortcode file. Jekyll’s approach is faster and leaner in this case.

Perhaps the solution to problem 1 is to correct what seems to be an error. When the shortcode calls {{ $data := .Site.Data.filename }}, the machinery must look for filename.toml only, instead of reading and parsing everything under data/. This will also speed up things.

Expectations

To set your expectations, you will not be using the data directory, nor will you use any of the os functions to read the files.

https://gohugo.io/content-management/data-sources/#data-directory

CSV files contain tabular data, not object data.

Example

Here’s a working example that loads CSV data from either global or page resources, and wrangles it into object format. I have no idea how you want to use the data, or how it’s structured, so for now the “get-csv-data” shortcode simply displays the data structure.

git clone --single-branch -b hugo-forum-topic-53817 https://github.com/jmooring/hugo-testing hugo-forum-topic-53817
cd hugo-forum-topic-53817
hugo server

The purpose of this example is to demonstrate how to access and work with CSV files, regardless of whether they are centrally located or located adjacent to content.

Data sources

Thank you for the reference.

Problem 1

Do not place CSV files in the data directory. Access CSV files as page, global, or remote resources.

No, I cannot put csv files in the content folder.
No, I cannot use remote resources.

A global resource is a file within the assets directory, or within any directory mounted to the assets directory.
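For readers following along, here is a minimal sketch of reading a CSV file as a global resource. It is an illustration only, not the code from the example repository above. It assumes the file sits at assets/data/company.csv and has a header row with name and address columns; the shortcode file name csv-company.html is made up for this sketch.

```go-html-template
{{/* layouts/shortcodes/csv-company.html (hypothetical name) */}}
{{/* Assumes assets/data/company.csv exists and its first row is a header. */}}
{{ with resources.Get "data/company.csv" }}
  {{/* transform.Unmarshal parses the CSV into rows of strings and handles quoted fields. */}}
  {{ $rows := . | transform.Unmarshal }}
  {{ if gt (len $rows) 1 }}
    {{ $header := index $rows 0 }}
    {{ $first := index $rows 1 }}
    {{/* Pair each header cell with the matching cell of the first data row. */}}
    {{ $company := dict }}
    {{ range $i, $key := $header }}
      {{ $company = merge $company (dict $key (index $first $i)) }}
    {{ end }}
    {{ $company.name }}<br>
    {{ $company.address }}
  {{ end }}
{{ else }}
  {{ errorf "csv-company: assets/data/company.csv not found: %s" .Position }}
{{ end }}
```

Because the parsing is done by transform.Unmarshal rather than by splitting each line on commas, a quoted field that itself contains a comma should stay in one piece.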

OK it works.

Problem 2

To update the page, I still have to restart hugo. I can live with it.

Problems 3 and 4

Hugo reads the combined data structure into memory and keeps it there for the entire build. For data that is infrequently accessed, use global or page resources instead.

I can work with TOML on this.

Problem 5

Is there an alternative to writing the markdown page inside the shortcode?

I don’t know what that means.

I have markdown pages from Jekyll with calls to several csv files, for data to appear in the text flow. This means file1.csv, file2.csv … fileN.csv are open, with calls like this, to fill in forms:

Text {{ site.data.file1[0].field }} text {{ site.data.file2[3].field4 }}.

When porting such form pages to Hugo, I would have to define a shortcode for each call. This means writing hundreds of shortcodes. To add insult to injury, many shortcodes are 99% identical, as their only difference is the field to print.

The alternative is to port the markdown page into a single shortcode.html, with lots of code mixed with actual text.

This is a nightmare. Is there an alternative? Because no alternative means I am going back to Jekyll.

Why can’t you parameterize the shortcode to accept additional arguments?

From

{{ site.data.filename[0].name }}

to

{{< read filename.csv row column >}}

?

Yeah, something like that.
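A rough sketch of such a parameterized shortcode, under some assumptions: the CSV files live under assets/data/, the first row of each file is a header, and row 0 means the first data row. The name read-csv.html is hypothetical, and this is one possible approach rather than a built-in feature.

```go-html-template
{{/* layouts/shortcodes/read-csv.html (hypothetical name)
     usage: {{< read-csv "company.csv" 0 "name" >}}
     Assumes the file is under assets/data/ and its first row is a header. */}}
{{- $file := .Get 0 -}}
{{- $row := int (.Get 1) -}}
{{- $col := .Get 2 -}}
{{- with resources.Get (printf "data/%s" $file) -}}
  {{- $rows := . | transform.Unmarshal -}}
  {{- $header := index $rows 0 -}}
  {{- /* Row 0 is the first data row, so skip the header. */ -}}
  {{- $n := add $row 1 -}}
  {{- if lt $n (len $rows) -}}
    {{- $record := index $rows $n -}}
    {{- range $i, $name := $header -}}
      {{- if eq $name $col -}}{{ index $record $i }}{{- end -}}
    {{- end -}}
  {{- else -}}
    {{- errorf "read-csv: row %d not found in %q: %s" $row $file $.Position -}}
  {{- end -}}
{{- else -}}
  {{- errorf "read-csv: %q not found in assets/data: %s" $file .Position -}}
{{- end -}}
```

With this in place, a form page could contain calls such as: Text {{< read-csv "file1.csv" 0 "field" >}} text {{< read-csv "file2.csv" 3 "field4" >}}. One shortcode then serves every file and field, instead of one shortcode per value.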

Using TOML, is it possible to call a value directly in markdown, to avoid writing a shortcode?

No, it is not.

Let's work out a complete example.

Jekyll

This is Jekyll in a markdown file:

<center>
{{ site.data.company[0].name }}<br>
{{ site.data.company[0].address }}
</center>

This is the data/company.csv file:

"name","address"
"company A","example.org"
"company B","example.com"

Jekyll reads company.csv and prints company A’s values.

Filling in forms, hundreds of forms, automatically, using markdown and calls like the above one, is a piece of cake.

Hugo

Let's see how Hugo does it.

We add the following to hugo.toml (the table must sit under params for .Site.Params to see it):

[params.data]
   name = "company A"
   address = "example.org"

write the following headerletter.html shortcode:

{{ .Site.Params.data.name }}<br>
{{ .Site.Params.data.address }}

and finally write the following to the markdown content:

{{< headerletter >}}

We get company A’s values in the rendered page.

Can it be done without a shortcode or with an inline shortcode? The answer seems negative.

Hugo’s security model is based on the premise that template and configuration authors are trusted, but content authors are not. This model enables generation of HTML output safe against code injection. — [Shortcodes]

The above data must go in its own file, so we move it from hugo.toml to data/company.toml, as a [data] table, and we change the headerletter.html shortcode accordingly:

{{ .Site.Data.company.data.name }}<br>
{{ .Site.Data.company.data.address }}

We get company A’s data in the rendered page.

On filling forms, nobody wants to write a shortcode for reading each value in a data file. For lack of a default solution, one must write a read.html shortcode that accepts as input a filename and the name of the data cell, for content-allowed calls like the following:

<center>
{{< read company data name >}}<br>
{{< read company data address >}}
</center>

I wrote the following shortcode that seems to solve the problem.

{{/* 

This is layouts/shortcodes/read.html to read TOML data files. 
usage: {{< read filename tablename fieldname >}}

*/}}
{{ $params := .Params }}
{{ $filename := index $params 0 }}
{{ $table := index $params 1 }}
{{ $field := index $params 2 }}

{{ $data := index .Site.Data $filename }}

{{ if $data }}
   {{ $tableData := index $data $table }}
   {{ if $tableData }}
      {{ $value := index $tableData $field }}
      {{ if $value }}
         {{ $value }}
      {{ end }}
   {{ else }}
       <p>Table "{{ $table }}" not found in file "{{ $filename }}".</p>
   {{ end }}
{{ else }}
   <p>File "{{ $filename }}" not found.</p>
{{ end }}

I am surprised that Hugo does not include this function by default.
