Using the multidimensional content model to fill in missing translations

For a multi-lingual project with English as default, I’ve tried to use the new sites complements without much success. The use case is: “fill in” missing content in non-default languages from the default English one, so users can benefit from a “complete” site even though some bits are in English.

For now, in hugo.yaml we have:

cascade:
  sites:
    matrix:
      languages:
      - "**"
  target:
    lang: "{en}"

which works, but feels a bit of a hack (e.g. more difficult to target roles and versions). How would this work with sites complements and module mounts?

How do you organize translations? By directory (e.g., content/en/, content/de/) or by file name (e.g., content/example.en.md, content/example.de.md)?

We use page bundles with translation by file name. Here is an example tree:

content/
├── docs
│   ├── api
│   │   ├── index.en.md
│   │   └── index.fr.md
│   └── _index.en.md
├── help
│   ├── bug
│   │   └── index.en.md
│   ├── _index.en.md
│   └── _index.fr.md
├── _index.de.md
├── _index.en.md
└── _index.fr.md

The resulting public build should have 5 pages for each language (i.e. here German only has a “real” homepage and its two Sections and RegularPages are taken from English).
With the quoted cascade config above, I get:

public/
├── de
│   ├── docs
│   │   ├── api
│   │   │   └── index.html
│   │   └── index.html
│   ├── help
│   │   ├── bug
│   │   │   └── index.html
│   │   └── index.html
│   └── index.html
├── docs
│   ├── api
│   │   └── index.html
│   └── index.html
├── en 
│   └── index.html <-- alias
├── fr
│   ├── docs
│   │   ├── api
│   │   │   └── index.html
│   │   └── index.html
│   ├── help
│   │   ├── bug
│   │   │   └── index.html
│   │   └── index.html
│   └── index.html
├── help
│   ├── bug
│   │   └── index.html
│   └── index.html
└── index.html

To give more context: currently we use heaps of hack-ish ways to “fill in” default English content in untranslated or partially translated pages (e.g. custom params, lists, taxonomies, shortcodes, render hooks, etc.). But users inevitably end up on the English site one way or another (e.g. internal links cannot resolve to non-existing pages, so English ones are used instead).
The multidimensional content model looks like just the right opportunity to get rid of all these scattered merge, default .Sites.Default et al. :wink:

Summary

If you plan to fill in missing translations with content from another language, translating by directory is the preferred approach.

| Feature | Translating by directory | Translating by file name |
|---|---|---|
| Primary mechanism | Creates logical pages by mirroring files | Complements collections with existing pages |
| Navigation | Prevents crossing language boundaries | Allows crossing language boundaries |
| Logic | Mimics source directory to target | Global equivalent of the lang.Merge function |
| Configuration | Module mounts | Cascade sites.complements |

Translating by directory

content/
├── de/
│   └── _index.md
├── en/
│   ├── docs/
│   │   ├── api/
│   │   │   └── index.md
│   │   └── _index.md
│   ├── help/
│   │   ├── bug/
│   │   │   └── index.md
│   │   └── _index.md
│   └── _index.md
└── fr/
    ├── docs/
    │   └── api/
    │       └── index.md
    ├── help/
    │   └── _index.md
    └── _index.md

When translating by directory, use content mounts to fill in missing translations. This approach creates logical pages for every language by mirroring files from the source directory to the target directory. For example, the system effectively copies content/en/help/bug/index.md to the de and fr directories.

This ensures that users never inadvertently cross language boundaries when navigating between pages.

Project configuration
[[module.mounts]]
  source = 'content/de'
  target = 'content'
  [module.mounts.sites.matrix]
    languages = ['de']

[[module.mounts]]
  source = 'content/fr'
  target = 'content'
  [module.mounts.sites.matrix]
    languages = ['fr']

[[module.mounts]]
  source = 'content/en'
  target = 'content'
  [module.mounts.sites.matrix]
    languages = ['*']
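
To picture what these mounts do, here is a rough Python sketch. It is a toy model, not Hugo's implementation. It assumes that when two mounts provide the same path for a language, the mount listed first wins, which is consistent with the order used above. The names `SOURCE_FILES`, `MOUNTS`, and `resolve` are made up for the illustration.

```python
# Toy model of the three content mounts above (not Hugo's actual code).
# Files found in each source directory of the example tree.
SOURCE_FILES = {
    "content/de": ["_index.md"],
    "content/en": [
        "_index.md",
        "docs/_index.md",
        "docs/api/index.md",
        "help/_index.md",
        "help/bug/index.md",
    ],
    "content/fr": ["_index.md", "docs/api/index.md", "help/_index.md"],
}

# (source directory, languages in the mount's sites matrix), in config order.
MOUNTS = [
    ("content/de", ["de"]),
    ("content/fr", ["fr"]),
    ("content/en", ["*"]),
]


def resolve(lang):
    """Map each logical content path to the source directory that provides it."""
    pages = {}
    for source, languages in MOUNTS:
        if "*" in languages or lang in languages:
            for path in SOURCE_FILES[source]:
                # Assumption: the first mount that provides a path wins.
                pages.setdefault(path, source)
    return pages
```

In this model, `resolve("de")` contains five paths: `_index.md` comes from content/de and the other four come from content/en, which matches the five pages per language expected earlier in the thread.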

Translating by file name

content/
├── docs/
│   ├── api/
│   │   ├── index.en.md
│   │   └── index.fr.md
│   └── _index.en.md
├── help/
│   ├── bug/
│   │   └── index.en.md
│   ├── _index.en.md
│   └── _index.fr.md
├── _index.de.md
├── _index.en.md
└── _index.fr.md

When translating by file name, cascade a sites.complements object to complement existing page collections with pages from other languages. This approach does not create logical pages; instead, it allows a single page collection to consist of pages from multiple languages to ensure content availability when a specific translation is missing. This produces the same results as Hugo’s lang.Merge function, but allows you to manage the behavior globally in the project configuration rather than handling it ad hoc within your templates.

Review the following points when using this approach.

  • Users may cross a language boundary when navigating between pages. That means they may begin their journey on the French site, click a link, and arrive at the English site.
  • Selectors will only display languages for which a specific logical page exists.

Project configuration
[[cascade]]
  [cascade.sites.complements]
    languages = ['de','fr']
  [cascade.target.sites.matrix]
    languages = ['en']

Note that the cascade configuration above, specifically the complements behavior, will not work as expected until the next release; I suspect that v0.157.0 will be released in the next week or two.
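
As a rough mental model of the complements behavior described above, here is a small Python sketch. It is an illustration only, not how Hugo implements it, and `PAGES` and `complemented` are hypothetical names. The point it tries to show is that no new pages are created: the English pages simply appear in the other sites' collections and keep their own language.

```python
# Toy model of sites.complements (not Hugo's actual code).
# Pages that really exist for each language in the example tree.
PAGES = {
    "en": ["/", "/docs/", "/docs/api/", "/help/", "/help/bug/"],
    "fr": ["/", "/docs/api/", "/help/"],
    "de": ["/"],
}


def complemented(lang, source="en"):
    """Return sorted (path, page_language) pairs for one site's page collection."""
    own = set(PAGES[lang])
    result = [(path, lang) for path in PAGES[lang]]
    # Fill the gaps with pages from the complementing language.
    # Those pages keep their original language.
    result += [(path, source) for path in PAGES[source] if path not in own]
    return sorted(result)
```

In this model, `complemented("de")` holds the German home page plus four entries tagged `en`, which is why a visitor following one of those links ends up on the English site.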


Hi @jmooring
Thank you very much for the thorough investigation!

Once v0.157 (or v0.156.1) is out, I’ll run some tests to assess whether it is worth changing the translation approach. We want the behavior provided by the directory approach (never inadvertently crossing language boundaries), but organizing the repo by file name is more practical for other reasons (workflow, CMS, custom scripts, etc.). One has to choose… :wink:
I’ll report back as it may be useful to others.


The change from Translating by filename to Translating by directory was pretty quick actually.
Here is a quick and rough test using the Translating by directory approach with:

module:
  mounts:
    - source: content/<locale>
      target: content
      sites:
        matrix:
          languages:
            - <locale>
    - source: content/en
      target: content
      sites:
        matrix:
          languages:
            - "*"

instead of heaps of template hacks to sort-of mimic a complete site (which is not fully possible currently with the Translating by filename approach).

The project currently has 12 locales and English as default:

| | Translation by filename | Translation by directories |
|---|---|---|
| Hugo version | 0.149 | 0.157 |
| Built pages | 500 en, 50 for each other locale | 500 for each locale (mount) |
| Non-page files | 350 | 350 |
| Processed images | 600 | 600 |
| Build time | ~3 s | ~15 s |
| Items | ~1000 dirs, ~2500 files | ~7500 dirs, ~7500 files |
| Total size | ~120 MB | ~350 MB |

So all in all, really decent results. The big bonuses are never crossing language boundaries and hundreds of template LoC removed! Thanks again.


Thanks for closing the loop on this. It’s one thing to create test cases and projects, but seeing further evidence of it being used in the wild is great.

cc: @bep

This is very interesting. I have one question (apologies if I missed this in the docs): let’s say I have it set up as above, with translation by directory and missing pages in other locales “complemented” from the English version. Would the .Page.Language method on an English page in, for example, the Dutch site return English or Dutch? My hope would be English; otherwise you’d run into trouble with translation software, text-to-speech, etc. because of the mixed-language content.
It’s fine if you can’t test it now, I can also just try it in a few days, I’m just curious about how it works.

When you use Hugo’s mounts configuration to fill in missing translations, you are effectively instructing Hugo to treat files from the English directory as a native part of the Dutch site’s file system. Consequently, Hugo processes these “borrowed” pages as members of the Dutch site object, meaning the .Page.Language method will return the language of the current site context (Dutch) rather than the original source language of the file.


Okay, thank you, that’s good to know. As I’ve said, that may lead to accessibility issues. But I guess you could work around it with a custom frontmatter Param, cascaded from the index file. I’ll experiment with this feature over the weekend!

Sorry for being late to the game and maybe not getting the full context here, but I just wanted to chime in that the sites config on file mounts and in front matter has two concepts:

  • matrix: Duplicates content for e.g. each language defined; each page gets its own .Page.Language and file on disk.
  • complements: Effectively says that this content can complement/fill in gaps in other sites. This will not create new objects/files on disk, so when e.g. doing site.RegularPages you will get a mix of page languages. When I implemented this feature, I had the roles dimension in mind, e.g. for promotion of membership articles to the guest site.
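
To make the difference between the two concepts concrete, here is a minimal Python sketch. It is a simplified model under my own assumptions, not Hugo's internals, and `Page`, `with_matrix`, and `with_complements` are hypothetical names. It contrasts duplicating pages per language with sharing the same pages across sites.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    path: str
    language: str  # stands in for what .Page.Language would report


ENGLISH = [Page("/docs/", "en"), Page("/help/", "en")]


def with_matrix(pages, languages):
    """Duplicate every page once per language; each copy reports its site's language."""
    return {lang: [Page(p.path, lang) for p in pages] for lang in languages}


def with_complements(pages, languages):
    """Share the same page objects with every site; no copies are made."""
    return {lang: list(pages) for lang in languages}


matrix_sites = with_matrix(ENGLISH, ["en", "nl"])
complement_sites = with_complements(ENGLISH, ["en", "nl"])
```

In this model, every page in `matrix_sites["nl"]` reports `nl`, while the pages in `complement_sites["nl"]` are the original English objects and still report `en`. That mirrors the answers given earlier in the thread about which language a borrowed page reports.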
