Multilingual web site: Why, What, Where, How

Making a web site multilingual is a requirement that comes up more and more, given the drive for inclusivity and the surge in ethnic identification.

Whether we are a commercial, governmental or non-governmental operator, there are legal or economic incentives that make this ability a requirement to some degree. That takes care of the “Why”.
The degree reflects the effort invested, which leads to a corresponding level of ability (the “What”):

1. Making the operator’s identity available in multiple languages
2. Making the operator’s identity available in multiple languages as well as navigational elements/aids (text/audios)
3. Making the identity, navigational elements as well as the business content or business processes available or usable in multiple languages on the web site
4. Making the identity, navigational elements, business content/processes, transactional communication available in multiple languages
5. Making the identity, navigational elements, business content/processes, transactional communication, promotional or community engagement available in multiple languages.
6. Any combination of the above

We could go even further and mention: ancillary documentation and guidelines, licensing terms, offline promotions, …

Another perspective is to look at the boundaries of the web site. Modern applications can leverage web technologies, allowing the online “property” to span multiple channels and platforms (the “Where”):

1. The web site (e.g: www.example.com)
2. The email communications (e.g: info@example.com)
3. The real-time communications (Web Chat, Twitter, WeChat, Line, …)
4. The social media networks (Facebook, Pinterest, LinkedIn, …)
5. The mobile apps (e.g: iOS and Android smartphones and tablets, as well as more and more embedded devices – TVs, in-car entertainment/navigation, …)

The other relevant perspective to consider is how deep to go in aligning with a user’s culture (the “How”):

1. Translation of text and/or audio in any of the dimensions mentioned above (internationalisation)
2. Translation and adaptation to locally established approaches for accessing, representing and using your content, product or service (localisation)
3. Internationalisation, localisation and adaptation of your product, service or content to the culture of the target users
4. Any combination of the above

Taking a project through that matrix can result in multilingual requirements that range from easy to very complex, from reasonably priced to very expensive, and from quick to implement to long-running efforts.

Most of the time, however, the requirements will be dictated by established practices within your market, constituency, community or domain of activity, or by the goal of the web site.
I have noticed two sets of patterns in the operation of multilingual web sites, and your requirements may fall in or close to a combination of these two sets:

How many?

1. One language only, e.g: an operator established in a mono-cultural region, or one moving from a region with one language to a region with another
2. Two to four languages, e.g: multi-cultural regions or markets where culture and language are tightly coupled: English/Spanish in the US, English/Chinese in Hong Kong, English/French in Canada or French/German/Italian/Romansh in Switzerland
3. Five or more languages, e.g: international organisations that need to maintain a local presence and/or activity in many territories not bound by one language

What effort?

1. The content or the process has a clone in every supported language
2. Within a given piece of content or process, the representations in all supported languages are present at the same time
3. A given piece of content or process is not necessarily available in all supported languages

What to think about

Design decisions

When implementing support for multiple languages on a web site, these are the design decisions to make:

1. Will the URL (web address) be different for each supported language?

This is most relevant for use cases where each web resource (page or form) is in one language only and the web site supports multiple languages.
The language can be represented in several ways, each with its own pros and cons, as shown below with French as the example:

1. fr.example.com/resource
2. example.com/fr/resource
3. example.com/resource?lang=fr
4. example.com/resource.fr
5. example.com/resource_fr

And that’s just for an abstract example of web endpoints. On social media and real-time services, the last two can be used for a page, whereas with a content management system the available subset may be different (for example WPML, the most popular multilingual tool for the WordPress CMS, allows the first three options only).
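To make the five forms listed above concrete, here is a minimal Python sketch that builds each of them for a given language. The function name `localized_url` and the scheme labels ("subdomain", "directory", "query", "extension", "suffix") are names I made up for illustration; they do not come from any CMS or library.

```python
from urllib.parse import urlencode


def localized_url(host: str, resource: str, lang: str, scheme: str) -> str:
    """Return the address of `resource` in `lang` under one of the five schemes.

    The scheme labels are hypothetical names for options 1 to 5 above.
    """
    if scheme == "subdomain":      # option 1: fr.example.com/resource
        return f"{lang}.{host}/{resource}"
    if scheme == "directory":      # option 2: example.com/fr/resource
        return f"{host}/{lang}/{resource}"
    if scheme == "query":          # option 3: example.com/resource?lang=fr
        return f"{host}/{resource}?{urlencode({'lang': lang})}"
    if scheme == "extension":      # option 4: example.com/resource.fr
        return f"{host}/{resource}.{lang}"
    if scheme == "suffix":         # option 5: example.com/resource_fr
        return f"{host}/{resource}_{lang}"
    raise ValueError(f"unknown scheme: {scheme}")
```

Calling `localized_url("example.com", "resource", "fr", "directory")` gives `example.com/fr/resource`. Keeping the choice of scheme in one function like this makes it cheaper to change your mind later.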

2. How will our web site’s support for multiple languages be considered and represented by search engines and other web infrastructure?

Nowadays, search engines are not as restrictive as they used to be, and they will find and index resources whichever of these five options is chosen.

However, the major search engines implement semantic data parsing that shows results in the way most useful for the visitor to interact with the resources.

Using a URL like the second or the fourth one will potentially improve the usability of the resources in search results as well as their ranking.

The third option doesn’t work well with caching systems, and the first option may become costly if you support many languages and have to pay a fee for either additional subdomains (at the domain registrar or the DNS provider) or the encryption certificates attached to them.

3. How do we manage the content with support for multiple languages in mind?

Many content management systems support multi-language content publishing, and the editing tool usually offers one or more of the following:

* a function allowing a user to specify the language of the content
* a function allowing a user to publish to a language-specific destination
* a function allowing a user to mark a resource as a translation of another resource
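The three functions above boil down to three pieces of information stored with each resource. A minimal Python sketch of such a record follows; the `Resource` class, its field names and the `translations_of` helper are hypothetical and do not reflect how any particular CMS stores its data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    slug: str
    language: str                         # language of the content
    destination: str                      # language-specific destination, e.g. "/fr/"
    translation_of: Optional[str] = None  # slug of the original resource, if any


def translations_of(original_slug: str, resources: List[Resource]) -> List[Resource]:
    """Return every resource marked as a translation of `original_slug`."""
    return [r for r in resources if r.translation_of == original_slug]


# Hypothetical content: one English page and its French translation.
pages = [
    Resource(slug="about", language="en", destination="/en/"),
    Resource(slug="a-propos", language="fr", destination="/fr/", translation_of="about"),
]
```

With this data, `translations_of("about", pages)` returns the French page only.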

Pitfalls

1. Character encoding, language code and locale

There are three sets of mechanisms involved in handling and managing language in software. Most language-related software issues come from one of these three sets not being implemented or configured adequately.

Character encoding: This is the way the characters of a given language are represented in software using an encoding system.
From Wikipedia:

“In computing, a character encoding is used to represent a repertoire of characters by some kind of an encoding system.[1] Depending on the abstraction level and context, corresponding code points and the resulting code space may be regarded as bit patterns, octets, natural numbers, electrical pulses, etc. A character encoding is used in computation, data storage, and transmission of textual data. Terms such as character set, character map, codeset or code page are sometimes used as near synonyms; however, these terms have related but distinct meanings.”

Unicode is the international standard that covers most of the character sets used in the world, while the ISO-8859 family consists of 8-bit character sets that each cover a limited group of languages, most of them European. UTF-8 and UTF-16 are encodings of Unicode, not subsets of it.
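A short Python sketch shows why the encoding matters: the same text becomes different bytes under UTF-8, UTF-16 and ISO-8859-1, and reading bytes with the wrong encoding produces the garbled characters often seen on misconfigured sites. The sample word and the `can_encode` helper are my own choices for illustration.

```python
text = "café"

utf8_bytes = text.encode("utf-8")          # "é" takes two bytes in UTF-8
utf16_bytes = text.encode("utf-16-le")     # each of these characters takes two bytes
latin1_bytes = text.encode("iso-8859-1")   # "é" takes a single byte

# Reading UTF-8 bytes as if they were ISO-8859-1 garbles the accented letter.
garbled = utf8_bytes.decode("iso-8859-1")


def can_encode(s: str, encoding: str) -> bool:
    """Tell whether every character of `s` exists in `encoding`."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False
```

Here `can_encode("中文", "iso-8859-1")` is false while `can_encode("中文", "utf-8")` is true, which is the practical argument for using UTF-8 everywhere on a multilingual site.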

Language code: This represents the way a language, and optionally the script it is written in and the region where it is used, is labelled in software.
There are different standards depending on whether we are referring to a language (ISO 639), a script (ISO 15924) or a country (ISO 3166).
In the context of support for multiple languages, we often see combinations like the following when configuring software components:


zh-TW

The symbol above is an IETF language tag and it means: Chinese language (zh) as used in Taiwan (TW)


nan-Hant-TW

The symbol above is an IETF language tag and it means: Min Nan Chinese language (nan) using traditional Han characters (Hant) as used in Taiwan (TW)


pt-BR

The symbol above is an IETF language tag and it means: Portuguese language (pt) used in Brazil (BR)
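The three tags above can be taken apart mechanically. The Python sketch below is a deliberately simplified parser that I wrote for illustration: it only handles the language, script and region subtags seen in these examples, and it rejects anything else (variants, extensions, private-use subtags) instead of trying to cover the full standard.

```python
def parse_language_tag(tag: str) -> dict:
    """Split a simple tag of the form language[-Script][-REGION].

    Simplified on purpose: other kinds of subtag raise ValueError.
    """
    parts = tag.split("-")
    result = {"language": parts[0].lower(), "script": None, "region": None}
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            result["script"] = part.title()    # four letters: script, e.g. Hant
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            result["region"] = part.upper()    # two letters or three digits: region
        else:
            raise ValueError(f"unsupported subtag: {part}")
    return result
```

For example, `parse_language_tag("nan-Hant-TW")` yields the language `nan`, the script `Hant` and the region `TW`.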

Locale: This represents how language- and region-sensitive UI elements, constants, operations and calculations are handled in software.
It is written as language[_territory][.codeset] and defined by ISO 15897.
It is especially important for date and time calculations and representations.


de_DE.iso88591

The symbol above represents German (de) in Germany (DE) with the ISO-8859-1 character set.


zh_CN.UTF-8

The symbol above represents Chinese (zh) in the People’s Republic of China (CN) with the UTF-8 character set.
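Locale names such as de_DE.iso88591 and zh_CN.UTF-8 use an underscore, whereas language tags such as zh-TW use a hyphen, and mixing the two up is a common configuration mistake. The Python sketch below splits a locale name into its parts and derives the matching language tag. The pattern and the function names are my own, and the pattern ignores modifiers such as `@euro`.

```python
import re

# language, optional _TERRITORY, optional .codeset (modifiers are not handled)
LOCALE_PATTERN = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:_(?P<territory>[A-Z]{2}))?"
    r"(?:\.(?P<codeset>[A-Za-z0-9-]+))?$"
)


def parse_locale(name: str) -> dict:
    """Split a locale name into language, territory and codeset."""
    match = LOCALE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognised locale name: {name}")
    return match.groupdict()


def to_language_tag(name: str) -> str:
    """Derive a language tag (hyphen, no codeset) from a locale name."""
    parts = parse_locale(name)
    if parts["territory"]:
        return f"{parts['language']}-{parts['territory']}"
    return parts["language"]
```

So `to_language_tag("zh_CN.UTF-8")` gives `zh-CN`, the form expected in places such as the HTML `lang` attribute.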

2. Directions

Most languages read from left to right, but some read from right to left. Do you require support for such languages?

If yes, the technology implementation needs to take that into account.
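As a rough illustration of what "taking that into account" means, the Python sketch below picks the value of the HTML `dir` attribute from the language. The list of right-to-left languages is a small, non-exhaustive sample that I chose (Arabic, Hebrew, Persian, Urdu), and the function names are hypothetical. Strictly speaking the direction depends on the script, not the language, so treat this as a first approximation.

```python
# Non-exhaustive sample of languages usually written right to left.
RTL_LANGUAGES = {"ar", "he", "fa", "ur"}


def text_direction(language_tag: str) -> str:
    """Return "rtl" or "ltr" based on the primary language subtag."""
    primary = language_tag.replace("_", "-").split("-")[0].lower()
    return "rtl" if primary in RTL_LANGUAGES else "ltr"


def html_open_tag(language_tag: str) -> str:
    """Build the opening html tag with matching lang and dir attributes."""
    return f'<html lang="{language_tag}" dir="{text_direction(language_tag)}">'
```

Setting `dir` is only the start: layouts, icons and navigation order usually need mirroring as well.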

3. Behaviour when no translation available

This is especially a problem with web sites where each resource contains one language only. The navigation mechanism must allow per-language access.

And it must know what to do when navigating away from a resource that does not (yet) have a version in one of the supported languages.
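One possible policy, sketched in Python below, is to keep the visitor on the same resource when a translation exists and to send them to the home page otherwise. The content inventory, the resource names and the choice of the home page as fallback are all assumptions made for the example; a real site may prefer to show the original language with a notice instead.

```python
from typing import List

# Hypothetical inventory: resource name -> languages it is available in.
TRANSLATIONS = {
    "home": {"en", "fr"},
    "retreats": {"en", "fr"},
    "news-latest": {"en"},   # not (yet) translated into French
}
FALLBACK_RESOURCE = "home"   # assumed to exist in every supported language


def switch_language(resource: str, target_lang: str) -> str:
    """Return the resource to show when a visitor on `resource` picks `target_lang`."""
    if target_lang in TRANSLATIONS.get(resource, set()):
        return resource
    return FALLBACK_RESOURCE


def available_languages(resource: str) -> List[str]:
    """Languages that the language switcher may offer on `resource`."""
    return sorted(TRANSLATIONS.get(resource, set()))
```

Here `switch_language("news-latest", "fr")` returns `home`, while `available_languages("news-latest")` returns only `en`, so the switcher can hide French on that page.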

4. Idiosyncrasies of the chosen web technologies

When deploying a web site for businesses, there are a lot of moving parts.

Case study

One of my clients, HKIMS, a Buddhist organisation of meditators in Hong Kong, required a web site for publishing Buddhist meditation material and for managing registrations to meditation retreats, as well as a newsletter for the member base.
The web site needed to support the two official languages in Hong Kong, English and Chinese.
In Hong Kong, written Chinese uses traditional characters. Chinese was to be the default language.
The URL scheme we used for the web pages ended up being Option 2 for the English version: hkims.org/en/. Chinese being the default, we decided not to add a language directory for it, so the home page for the Chinese language is hkims.org.
This choice reflects that the majority of the web site visitors are primarily Chinese Speaking.
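The rule described above (a /en/ directory for English, no directory for the default Chinese) can be expressed in a few lines of Python. This is my own sketch of the rule and not the code running on the site, which relies on its CMS for this; the constant and function names are hypothetical.

```python
DEFAULT_LANGUAGE = "zh"        # Chinese pages carry no language directory
PREFIXED_LANGUAGES = {"en"}    # English pages live under /en/


def language_for_path(path: str) -> str:
    """Return the language of a page from the first segment of its path."""
    segments = [segment for segment in path.split("/") if segment]
    if segments and segments[0] in PREFIXED_LANGUAGES:
        return segments[0]
    return DEFAULT_LANGUAGE
```

With this rule `/en/` and everything below it is English, and any other path, including the home page `/`, is Chinese.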
The locale for the Chinese pages was set to zh_TW.UTF8. Why? The language component of our CMS mapped our choice of Traditional Chinese to the zh_TW locale. Taiwan uses traditional Chinese characters, as Hong Kong does, and the component has only one entry for Traditional Chinese.

For all intents and purposes, it doesn’t make any difference.
The locale for the English pages was set to en_US.UTF8. Here en_GB would be more appropriate, but the language component defaults the choice of English to en_US.
In both cases we could override the setting, but the only external impact is how search engines detect the language of our web site, and they just need to know what is in English and what is in Traditional Chinese. So there was no need for the extra effort.
The main operator of the web site is a fluent English speaker and the webmaster doesn’t speak Chinese, so the admin section is set to default to English.
From a usability perspective, it was important that the user could switch language from any page, from both the top and the bottom of the page.

This is because they could enter the web site from many entry points and do so from a desktop computer, a laptop computer and very often their smartphone.

When accessing the web site from the latter, the language switcher at the bottom of the page was the most usable.
If a given resource is not available in one language the language switcher won’t appear on the page and the page won’t be linkable in the navigational structure for the missing language.
The choice of having all resources available in both languages allows us to serve both communities fairly and to have a fair representation in both languages on search engines. It requires us to have the staff to publish in both languages.

