Software Internationalization Guide
Software Characteristics That Vary by Locale
Software Internationalization Guide—526225-002
Japanese
Japanese ideographs are known today as Kanji. Although Chinese and Japanese can
represent similar concepts with ideographs, the two languages are linguistically
distinct. In addition to Kanji, the Japanese language has two phonetic systems,
Katakana and Hiragana, each consisting of about 50 characters. Together, the two
phonetic systems are called Kana. Phonetic characters serve as modifiers to existing
ideographs or spell words that have no ideographic equivalent. Katakana is usually
used for words of foreign origin, and Hiragana is used for words of native origin.
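As a rough illustration of the three Japanese scripts described above, the following Python sketch labels each character of a string as Kanji, Hiragana, or Katakana. It assumes the text is held as Unicode and relies on Unicode character names from the standard unicodedata module. Neither Unicode nor the helper name script_of comes from this guide.

```python
import unicodedata

def script_of(ch):
    """Return a coarse script label for one character (hypothetical helper)."""
    name = unicodedata.name(ch, "")  # "" if the character has no name
    if name.startswith("HIRAGANA"):
        return "Hiragana"
    if name.startswith("KATAKANA"):
        return "Katakana"
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "Kanji"
    return "other"

# "kamera wo kau" (buy a camera): Katakana loanword, Hiragana particle,
# Kanji verb stem, Hiragana verb ending.
sample = "\u30ab\u30e1\u30e9\u3092\u8cb7\u3046"
labels = [(ch, script_of(ch)) for ch in sample]
for ch, label in labels:
    print(ch, label)
```

The sample mixes a word of foreign origin written in Katakana with a native verb whose stem is Kanji and whose ending is Hiragana, which matches the division of roles described above.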
Korean
Some Korean words can be written with the Hanja ideographic system, but all Korean
text can be written with the phonetic writing system called Hangul. Hangul has 24
characters, each representing a specific sound. Syllables are formed by combining
between two and seven of these characters.
Most text processing in Korean is based on syllables. For example, an operation to
make a single character bold usually makes the entire syllable bold.
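The relationship between Hangul characters and syllables can be sketched in Python. This sketch assumes Unicode text, in which each precomposed syllable is a single code point that canonical decomposition (NFD) splits into its component letters. Unicode treats compound letters as single units, so the counts it reports can be lower than the two-to-seven range given above. The helper name is illustrative only.

```python
import unicodedata

def letters_in_syllable(syllable):
    """Split one precomposed Hangul syllable into its component letters (jamo)."""
    return list(unicodedata.normalize("NFD", syllable))

word = "\ud55c\uae00"  # the word "Hangul", written as two syllables
for syllable in word:
    jamo = letters_in_syllable(syllable)
    print(syllable, len(jamo), [unicodedata.name(j) for j in jamo])

# Recomposing the letters (NFC) yields the original syllable again.
recomposed = unicodedata.normalize("NFC", "".join(letters_in_syllable(word[0])))
assert recomposed == word[0]
```

Under these assumptions, a syllable-based operation such as making text bold would act on each element of word rather than on the individual letters inside a syllable.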
Character Sets
A character set is a group of characters that is used to build the elementary units of a
language. Characters in a set include letters, numbers, symbols, and others such as
control characters. The English, French, Spanish, Arabic, Russian, and Danish
alphabets and the Japanese Kanji are examples of character sets. Internationalized
software must be character-set independent and must be able to support existing and
future character sets.
Table 2-1 shows some characters in the U.S. English, German, and Spanish character
sets.
A character set does not by itself determine the numeric encoding that represents
each character in a form that a computer can read. A group of encoded characters is
called a code set.
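To illustrate the distinction between a character and its numeric encoding, the Python sketch below encodes the same Spanish character under three code sets. The choice of code sets (ISO 8859-1, IBM code page 437, and UTF-8) is an assumption made for illustration; the guide does not name them here.

```python
# One character, three code sets: the character stays the same,
# but the bytes that represent it differ.
ch = "\u00f1"  # LATIN SMALL LETTER N WITH TILDE
code_sets = ["latin-1", "cp437", "utf-8"]  # illustrative choices

encoded = {name: ch.encode(name) for name in code_sets}
for name, data in encoded.items():
    print(f"{name:8} {data.hex()}")

# Decoding bytes with the wrong code set yields a different character.
misread = encoded["latin-1"].decode("cp437")
print(repr(misread))
```

Because the byte values depend on the code set, software that hard-codes one encoding cannot be character-set independent in the sense described above.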
Table 2-1. Subsets of Character Sets
Character Set    Characters
U.S. English     a, b, c
German           ä, ö, ü
Spanish          a, á, b, ñ