Software Internationalization Guide

Software Characteristics That Vary by Locale

Software Internationalization Guide—526225-002



Japanese

Japanese ideographs are known as Kanji. Although Chinese and Japanese can represent
similar concepts with ideographs, the two languages are linguistically different.
In addition to Kanji, Japanese has two phonetic systems, Katakana and Hiragana, each
consisting of about 50 characters. Phonetic characters modify existing ideographs
(for example, to mark grammatical inflection) or spell words that have no ideographic
equivalent. Together, the two phonetic systems are called Kana. Katakana is usually
used for words of foreign origin, and Hiragana is used for words of native origin.
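The following minimal sketch shows how software might tell the three Japanese scripts apart. It assumes text stored as Unicode and uses the Unicode block ranges for Hiragana (U+3040 to U+309F), Katakana (U+30A0 to U+30FF), and the basic CJK Unified Ideographs block (U+4E00 to U+9FFF). The function name script_of is hypothetical, and the sketch ignores half-width Katakana and the ideograph extension blocks.

```python
def script_of(char):
    """Classify one character as Hiragana, Katakana, Kanji, or Other.

    Uses Unicode block ranges only; this is an illustration, not a
    complete script detector.
    """
    code = ord(char)
    if 0x3040 <= code <= 0x309F:
        return "Hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "Katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "Kanji"
    return "Other"


# A sentence that mixes all three scripts:
# "drink coffee in Tokyo" (Kanji + Hiragana + Katakana).
sentence = "東京でコーヒーを飲む"
for ch in sentence:
    print(ch, script_of(ch))
```

In this sample sentence, the place name and the verb stem are Kanji, the loanword for "coffee" is Katakana, and the grammatical particles and verb ending are Hiragana.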

Korean

Some Korean words can be written with the Hanja ideographic system, but all Korean
text can be written with the phonetic writing system called Hangul. Hangul has 24
basic characters, each representing a specific sound. A syllable is formed by
combining between two and seven of these characters.

Most text processing in Korean is based on syllables. For example, an operation to
make a single character bold usually makes the entire syllable bold.
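The following minimal sketch illustrates syllable-based processing, assuming the text is stored as Unicode. Unicode normalization form NFD splits a precomposed Hangul syllable into its component letters (jamo), and form NFC recombines them. Unicode treats compound letters as single jamo, so a syllable decomposes into two or three jamo here, not into the 24 basic characters counted above. The function names jamo_of and syllables are hypothetical.

```python
import unicodedata


def jamo_of(syllable):
    """Split one precomposed Hangul syllable into its conjoining jamo."""
    return list(unicodedata.normalize("NFD", syllable))


def syllables(text):
    """Return Hangul text as a list of syllable-sized units.

    NFC recomposes decomposed jamo sequences, so each syllable becomes
    a single code point that can be selected or styled as a unit.
    """
    return list(unicodedata.normalize("NFC", text))


word = "한글"  # "Hangul", two syllables
decomposed = unicodedata.normalize("NFD", word)

print(len(word), len(decomposed))   # code points before and after NFD
print(jamo_of("한"))                # component letters of one syllable
print(syllables(decomposed) == list(word))
```

An operation such as "make this character bold" would then be applied to each element returned by syllables, never to an individual jamo.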

Character Sets

A character set is a group of characters used to build the elementary units of a
language. The characters in a set include letters, digits, symbols, and others such as
control characters. The English, French, Spanish, Arabic, Russian, and Danish
alphabets and the Japanese Kanji are examples of character sets. Internationalized
software must be character-set independent and must be able to support both existing
and future character sets.

Table 2-1 shows some characters in the U.S. English, German, and Spanish character
sets.

A character in a character set is not tied to any particular numeric encoding, that
is, to the representation of the character in a format that a computer can read. The
same character can have different numeric values in different encodings. A group of
encoded characters is called a code set.
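The following minimal sketch shows the distinction between a character and its encoding. It uses Python's built-in codecs for four code sets (ASCII, ISO 8859-1 "latin-1", IBM code page 437, and UTF-8) chosen only for illustration. The name encodings_of is hypothetical.

```python
CODE_SETS = ("ascii", "latin-1", "cp437", "utf-8")


def encodings_of(char, code_sets=CODE_SETS):
    """Map each code set name to the hex bytes it assigns to char.

    The value is None when the code set does not contain the character.
    """
    result = {}
    for name in code_sets:
        try:
            result[name] = char.encode(name).hex()
        except UnicodeEncodeError:
            result[name] = None
    return result


for ch in "aäñ":
    print(ch, encodings_of(ch))
```

The character ä, for example, is the single byte E4 in ISO 8859-1, the single byte 84 in code page 437, and the two bytes C3 A4 in UTF-8, and it has no ASCII encoding at all. Software that assumes one fixed numeric value per character is therefore not character-set independent.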

Table 2-1. Subsets of Character Sets

Character Set    Characters
U.S. English     a, b, c
German           ä, ö, ü
Spanish          a, á, b, ñ