Is it a good idea to use "tw" instead of "zh-tw" in i18n?

I have i18n working now with 2 languages, “en” and “zh-tw”.
I tried changing “zh-tw” to “tw” and it still works.
I’m curious, is it a good idea?
Is the language code just a string in Hugo, or does it have a specific meaning?

PS: The reason I want to use tw is that it’s shorter, and I checked Apple.com, which uses https://apple.com/tw/

Is it a good idea?

Probably not, but as always: it depends.

Your theme may be using this setting in the lang attribute of the html element to tell the browser which language your content is in (see “lang - HTML: HyperText Markup Language | MDN”). This may break if you use a non-standard value here.

Apple is a big company, but big companies are not right in their choices 100% of the time.

I would stay with the standard way!

I think you are right, it’s not a good idea. I checked Apple’s HTML code, and it uses zh-tw as the language code.
But if I keep using “zh-tw”, is it possible to change the URL to just “tw”?
I just want to shorten the URL.

As per IANA (standards body), no.

See:
https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry

The TW subtag has a type of region
Type: region
Subtag: TW
Description: Taiwan, Province of China
Added: 2005-10-16

This means it can only be used in the form language-REGION, so for Chinese as used in Taiwan the correct language tag is zh-TW or zh-tw (as per W3C, it is case-insensitive).

The language subtag syntax is: language-extlang-script-region-variant-extension-privateuse.

Examples:

  • zh-Hans-TW
  • zh-Hant-TW

Breakdown:

  • zh is the language
  • Hans or Hant is the script (simplified and traditional respectively)
  • TW is the region.

As a general rule, the shorter the tag, the better. So instead of zh-hant-tw just write zh-tw, but not tw alone as Apple does. Some language features of Hugo (and Go itself) also rely on correct tag usage (like date and time localisation).

The reason tw works fine is that major browsers chose to interpret it as zh-tw. However, there is no guarantee that tw will work in other rendering engines or browsers, for example, browsers used for special needs such as Braille and voice/speech browsing. (IIRC, at one time Amazon’s gadgets didn’t interpret incorrect tagging either, but support was added later because webmasters and web developers of major sites, like Apple in your example, refused to fix their bug.)

If you want to learn more, I have a very old blog post about it here:

It’s geared towards Philippine languages (we can use up to the variant tag) but it’s practically the same.

Add this to each language under your [languages] section (adjust the value per language):

baseURL = "https://example.com/tw/"

or

baseURL = "https://tw.example.com/"

It works fine even with hugo server; the language selector adjusts accordingly.
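
To make that concrete, here is a minimal sketch of how the whole section might look, assuming a TOML config and the two languages from the original question. The language names, the en-US code, the weights, and the example.com URLs are placeholders I made up. As far as I know, once one language sets its own baseURL, every language needs one and they must all differ.

```toml
# Sketch only: values below are assumptions, adjust them to your site.
defaultContentLanguage = "en"

[languages.en]
languageName = "English"
languageCode = "en-US"                 # assumed; standard tag for the html lang attribute
baseURL = "https://example.com/en/"
weight = 1

[languages.zh-tw]                      # the language key stays a standard tag
languageName = "繁體中文"
languageCode = "zh-TW"
baseURL = "https://example.com/tw/"    # only the published URL uses the short "tw"
weight = 2
```

The idea is that the key and languageCode keep their standard values, while the short form appears only in the URL.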

However, if your content is already out there and indexed by search engines, you’ll have to either add aliases to your content .md files or write a redirect at the DNS or server level.

For URLs you’re free to choose what you want, as the path carries no meaning, apart from maybe SEO.

But in the HTML lang attribute you’d definitely want to stick to the standard code, because it’s meant for machine interpretation: for instance, for automatic translation of websites by the browser, or to aid assistive technologies like screen readers.

As others have already mentioned, the subdirectory can be anything. The languageCode, on the other hand, should be set to a standard value.

I have set up languages like this for one of my sites:

languages:
  sv:
    weight: 1
    languageName: "Svenska"
    languageCode: "sv-SE"
  en:
    weight: 2
    languageName: "English"
    languageCode: "en-GB"

I have also set defaultContentLanguage: "sv".

This way the root of the site is Swedish and the ‘/en/’ subdirectory is English.

The html lang attribute, however, is set from the languageCode variable, so it becomes “sv-SE” and “en-GB” respectively.

The language key (e.g., the en in [languages.en]) currently has four primary roles:

  1. It is prepended to the path portion of the published site (e.g., https://example.org/en/articles/foo)
  2. It links the site with a translation file in the i18n directory. To properly pluralize a translation, the language key must match one of the languages in:
    https://github.com/nicksnyder/go-i18n/blob/main/v2/internal/plural/codegen/plurals.xml
  3. It is used to localize dates, numbers, percentages, and currencies. To properly localize these values, the language key must match one of the locales in:
    https://github.com/gohugoio/locales
  4. It is used to determine collation when sorting.
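
As a rough illustration of role 2, and assuming a language key of zh-tw, the translation file would be named after that key. The file path, the readMore ID, and the translated string below are hypothetical examples, not taken from any real site:

```toml
# i18n/zh-tw.toml (hypothetical file; its name matches the language key)
[readMore]
other = "閱讀更多"   # Chinese only needs the "other" plural category
```

A template would then pick this up with something like {{ T "readMore" }}. If the key were changed to tw, the file would presumably need to be renamed to match, and pluralization and localization might no longer resolve to Chinese rules.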

It’s a bit messy. I tried to provide a complete explanation here:
https://github.com/gohugoio/hugo/issues/8296#issuecomment-1113863196

In the OP’s example, setting the language key to zh-hant-tw will work with items 1 through 3 above. I’m not sure if it will trigger the proper collation sequence (I am not yet familiar with this code).

I appreciate all of your suggestions. Obviously, it’s a bad idea to use “tw” instead of “zh-tw”.
After 2 days of trial and error, it works (it should have been simple, but I made a stupid mistake…).
But I still have 2 questions I need help with.

  1. I put images in /static/images, and now I need to use ![ ](/en/images/xxx.jpg) and ![ ](/tw/images/xxx.jpg) to access them. Is that correct?
    (I followed this doc: Static Files | Hugo)

  2. At http://localhost:1314/en/, it still tries to connect to http://localhost:1313/tw/css/style/***. It’s not a problem in production because we use the same port, but it breaks the local environment.