Generating all languages in sitemap

Hi,

I’m looking to generate a multilingual sitemap in my Hugo site, but without creating multilingual pages.

I’ll be using a translation proxy to translate the website itself, but I want to pre-generate the sitemap so that I don’t need to worry about keeping it up-to-date and valid with the alternate languages. Hugo can do that for me - so far so good.
However, for a page to be included in the multilingual sitemap, I need to create two copies of each content file (or more if I add more languages), which seems wasteful, since I only need the sitemap entries.
Right now, things look like this:

/content/
  features/
    centers/
      index.en.md
      index.ja.md

Which, along with defaultContentLanguageInSubdir = true, generates more or less what I want:

/public/
  assets/
    [assets]
  en/
    features/
      centers/
        index.html
  ja/
    features/
      centers/
        index.html

It does include the Japanese pages, but I can just remove/ignore/not deploy those to hosting. More importantly, this makes sure that all entries in the sitemap XML will have both the Japanese and the English URL defined. If I remove the language-specific MD file, only the English one is added.

I know that a page without an implicit language (either from directory structure or from file name) will be assigned to the default language. I think what I’m looking to achieve is to have a page without a specific language be included in all languages, at least as far as the sitemap goes (generating the pages themselves, while not necessary, is not that much of an issue, since I can always exclude them from the deployment, and it’s good to simulate the folder structure).

Is there any way to achieve this “include in all languages implicitly” behavior?

Thanks in advance,
Zalan

I fail to understand what you want, because having a sitemap with Japanese items but not having these pages created will result in errors (or penalties?) in most search engines.

Having said that, I recently learned that Hugo automatically creates sitemap indexes for translated websites. One sitemap file per language and one sitemap index file.

Maybe create a sitemapindex layout that ranges through your structures however you want them to be organized?

Also have a look at @ju52’s repo in this topic.

That would be true if the Japanese URLs didn’t resolve, but that won’t be the case. The purpose of a translation proxy is to create those pages virtually, in real time for each request, so I don’t need the Japanese pages for deployment, only the English ones, which will get translated separately. Doing the translation process inside Hugo is counter-productive for several reasons, including difficulty in deployment and in proofreading the translations.

That said, I might give your idea a try. Right now, I have this:

{{ printf "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\" ?>" | safeHTML }}
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
{{- range .Data.Pages }}
{{- if not .Params.excludeFromSitemap }}
	<url>
		<loc>{{ .Permalink }}</loc>{{ if not .Lastmod.IsZero }}
		<lastmod>{{ safeHTML ( .Lastmod.Format "2006-01-02T15:04:05-07:00" ) }}</lastmod>{{ end }}{{ with .Sitemap.ChangeFreq }}
		<changefreq>{{ . }}</changefreq>{{ end }}{{ if ge .Sitemap.Priority 0.0 }}
		<priority>{{ .Sitemap.Priority }}</priority>{{ end }}{{ if .IsTranslated }}{{ range .Translations }}
		<xhtml:link
				rel="alternate"
				hreflang="{{ .Lang }}"
				href="{{ .Permalink }}"
		/>{{ end }}
		<xhtml:link
				rel="alternate"
				hreflang="{{ .Lang }}"
				href="{{ .Permalink }}"
		/>{{ end }}
	</url>
{{- end -}}
{{ end }}
</urlset>

But because the path structure is easily predictable even after proxying (the proxy swaps /en/ for /ja/), I might be able to just repeat the alternate link for each language, based on the English URL.
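A rough, untested sketch of that idea: inside the page loop, range over `$.Site.Languages` and build each alternate URL by swapping the language segment of the English permalink. This assumes every language stays listed in config.toml, that the proxy serves the same paths under each language prefix, and that `/en/` appears only once per URL (`replace` swaps every occurrence). `$page` is just a local variable name I picked; `<lastmod>`, `<changefreq>` and `<priority>` are left out for brevity.

```
{{ printf "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\" ?>" | safeHTML }}
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
{{- range .Data.Pages }}
{{- if not .Params.excludeFromSitemap }}
{{- /* Keep a handle on the page; inside the inner range the dot is a language. */}}
{{- $page := . }}
	<url>
		<loc>{{ .Permalink }}</loc>
		{{- /* One alternate link per configured language, including the page's own. */}}
		{{- range $.Site.Languages }}
		<xhtml:link
				rel="alternate"
				hreflang="{{ .Lang }}"
				href="{{ replace $page.Permalink (printf "/%s/" $page.Lang) (printf "/%s/" .Lang) }}"
		/>
		{{- end }}
	</url>
{{- end }}
{{- end }}
</urlset>
```

With only English content files, just the English sitemap would carry real entries, but each entry would list both the /en/ and /ja/ URLs without needing an index.ja.md next to every page.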

I don’t know how you sort your languages, but you can change the range (line 3 of your template) to loop over anything you have defined, such as a selection by taxonomies or slugs.

For the translations, I’m going the way the I18N docs on the site recommended, having a block like this in my config.toml:

defaultContentLanguage = "en"
defaultContentLanguageInSubdir = true

[languages]
  [languages.en]
    weight = 1
  [languages.ja]
    weight = 2

Defining them like this makes Hugo expect content in both languages too, though. And defining only one language drops the language prefix (fairly obviously, I guess).