Proposal: implement pre-validation for YAML (and other Front Matter formats)

Hello,

I'd like to share a use case that I reported in gohugoio/hugo issue #10346 on GitHub, “Cannot use .Site.RegularPages.Related with index called "categories"; all other taxonomies work fine”:

A malformed document with front matter like this:

categories:

  - ABC: xyz
  - Hello
  - World

Configuration:

taxonomies:
  category: categories
  series: series
  tag: tags
  term: terms
  authors: authors

related:
  includeNewer: false
  indices:
  - name: categories
    weight: 100
  - name: date
    weight: 10
  threshold: 80
  toLower: false

Error message:

indexing currently not supported for index "categories" and type []interface {}

Don’t ask me how I found this 🙂. It was not easy, considering there was a single malformed document among 10,000.

The error message is confusing; Hugo should have failed earlier, while processing this document.
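As a rough illustration of the proposal, here is a minimal Go sketch of such a pre-validation step. It assumes the front matter has already been decoded into a map[string]any (the exact shape depends on the decoder) and that the taxonomy keys come from the configuration above; validateTaxonomies and the content path are hypothetical names, not part of Hugo's API.

```go
package main

import "fmt"

// validateTaxonomies is a hypothetical pre-validation step: it checks that every
// taxonomy key in already-decoded front matter holds a list of plain strings
// (or a single string), and returns one message per offending value.
func validateTaxonomies(path string, fm map[string]any, taxonomies []string) []string {
	var problems []string
	for _, key := range taxonomies {
		v, ok := fm[key]
		if !ok {
			continue // taxonomy not used in this document
		}
		switch terms := v.(type) {
		case string:
			// A single term written as a scalar is accepted in this sketch.
		case []any:
			for i, t := range terms {
				if _, isString := t.(string); !isString {
					problems = append(problems, fmt.Sprintf(
						"%s: %s[%d] must be a string, got %T (%v)", path, key, i, t, t))
				}
			}
		default:
			problems = append(problems, fmt.Sprintf(
				"%s: %s must be a list of strings, got %T", path, key, v))
		}
	}
	return problems
}

func main() {
	// Assumed decoded form of the malformed document: the first item is a map.
	fm := map[string]any{
		"title": "Example",
		"categories": []any{
			map[string]any{"ABC": "xyz"},
			"Hello",
			"World",
		},
	}
	taxonomies := []string{"categories", "series", "tags", "terms", "authors"}
	for _, p := range validateTaxonomies("content/post/example.md", fm, taxonomies) {
		fmt.Println(p)
	}
}
```

Reporting the file path and the index of the bad item would turn a one-in-10,000 hunt into a single log line.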


Hi Fuad, and welcome!

A standalone executable written in Go, called “Yq”, provides front matter “linting” through its --front-matter="process" option.

The documentation was missing some bits, so I wrote a review.

I didn’t test --front-matter="process" in depth, but IIRC there was an issue: if the content of a Markdown file contained something resembling front matter, that part was modified too.

For example:

---
title: foo
date: 2022-09-02
---
Consequat velit excepteur ad veniam nostrud culpa ea dolor
anim pariatur duis. Et cillum duis nulla non. Magna cupidatat
et excepteur consectetur irure ullamco voluptate irure mollit
ad exercitation proident laboris consectetur.
---
innocent content contained between two separators
---
Consequat velit excepteur ad veniam nostrud culpa ea dolor
anim pariatur duis. Et cillum duis nulla non. Magna cupidatat
et excepteur consectetur irure ullamco voluptate irure mollit
ad exercitation proident laboris consectetur.

The “innocent content” line should not be modified, but it was, IIRC.

The content between two setext headings (titles underlined with dashes) should stay unchanged too, although I didn’t test this.

Example:

---
title: foo
date: 2022-09-02
---

Title 1
-------
Consequat velit excepteur ad veniam nostrud culpa ea dolor
anim pariatur duis. Et cillum duis nulla non. Magna cupidatat
et excepteur consectetur irure ullamco voluptate irure mollit
ad exercitation proident laboris consectetur.

Title 2
---------
Consequat velit excepteur ad veniam nostrud culpa ea dolor
anim pariatur duis. Et cillum duis nulla non. Magna cupidatat
et excepteur consectetur irure ullamco voluptate irure mollit
ad exercitation proident laboris consectetur.
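The behavior recalled above suggests the processor treated every `---` pair as a delimiter. Here is a minimal Go sketch of the safer rule, assuming YAML front matter delimited by `---` at the very top of the file and LF line endings: only the first block counts, and the body is returned untouched. splitFrontMatter is a hypothetical helper, not yq's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// splitFrontMatter is a hypothetical helper that separates YAML front matter
// from the body. Only a block that starts on the very first line with "---"
// counts as front matter; it ends at the first following line that is exactly
// "---". Everything after that closing line is returned byte-for-byte, so a
// later "---" (a thematic break or a setext heading underline) is never touched.
// Assumes LF line endings.
func splitFrontMatter(doc string) (frontMatter, body string, ok bool) {
	const delim = "---"
	if !strings.HasPrefix(doc, delim+"\n") {
		return "", doc, false
	}
	rest := doc[len(delim)+1:]
	// Walk the remaining lines until the closing delimiter.
	for offset := 0; offset < len(rest); {
		lineEnd := strings.IndexByte(rest[offset:], '\n')
		next := len(rest)
		line := rest[offset:]
		if lineEnd >= 0 {
			line = rest[offset : offset+lineEnd]
			next = offset + lineEnd + 1
		}
		if line == delim {
			return rest[:offset], rest[next:], true
		}
		offset = next
	}
	return "", doc, false // no closing delimiter: treat everything as content
}

func main() {
	doc := "---\ntitle: foo\ndate: 2022-09-02\n---\n\nTitle 1\n-------\nText.\n"
	fm, body, ok := splitFrontMatter(doc)
	fmt.Printf("ok=%v\nfront matter:\n%sbody:\n%s", ok, fm, body)
}
```

A linter built on this split would only ever parse or rewrite the front matter string and write the body back unchanged.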

Hi @iaeiou

Thank you for the “Yq” review!

I think the issue here is slightly different: the YAML snippet I provided is syntactically correct, but the result is disastrous:

categories:
  - ABC: xyz

For comparison, this one works as expected:

categories:
  - "ABC: xyz"

So perhaps we need a Hugo-specific analyzer, maybe based on “Yq”, that can catch and report such issues: a kind of “strong typing” for Hugo front matter.


For now, I know what to do: always use quotes.
I can use “Find and Replace” with a regex in my IDE to catch and fix such issues.
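A possible pattern for that search, wrapped in a small Go program so it can be tested: it quotes block-sequence items that contain ": " and no quote characters. It is a heuristic that assumes LF line endings; quoteMapItems is a hypothetical name, and the regex syntax may need adjusting for a particular IDE.

```go
package main

import (
	"fmt"
	"regexp"
)

// unquotedMapItem matches a YAML block-sequence item such as "  - ABC: xyz"
// whose text contains ": " and no quote characters. YAML parses such an item
// as a one-key map rather than as the string "ABC: xyz".
var unquotedMapItem = regexp.MustCompile(`(?m)^([ \t]*- +)([^"'\n]*: [^"'\n]*)$`)

// quoteMapItems wraps every matching item in double quotes so it is read as a
// string. Items containing quotes are skipped by the pattern, so no escaping
// is needed.
func quoteMapItems(frontMatter string) string {
	return unquotedMapItem.ReplaceAllString(frontMatter, `${1}"${2}"`)
}

func main() {
	in := "categories:\n  - ABC: xyz\n  - Hello\n  - World\n"
	fmt.Print(quoteMapItems(in))
}
```

The pattern would also match legitimate lists of maps, such as the `- name: categories` entries under related.indices in the configuration above, so it is safer to run it on front matter only and review each match.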

I’m not sure how we would detect that this is wrong:

categories:
  - ABC: xyz
  - Hello
  - World

As a data structure, there’s nothing wrong with it. In JSON, the same structure is:

{
  "categories": [
    {
      "ABC": "xyz"
    },
    "Hello",
    "World"
  ]
}

The structure only causes problems when used in a particular context (e.g., taxonomy terms).

Situations like yours are yet another reason I prefer TOML over YAML in front matter. It’s pretty hard to get this wrong:

categories = ['ABC: xyz', 'Hello', 'World']