Software Internationalization Guide

Software Characteristics That Vary by Locale
Software Internationalization Guide 526225-002
Code Sets
A code set assigns a unique numeric value to each character in a character set, with a
designated number of bits representing each character. In the past, hardware
limitations restricted the number of bits used to represent characters. With fewer
hardware restrictions, software can now support code sets of up to hundreds of
thousands of characters. The expansion of hardware capabilities and the expansion of
software technology into worldwide markets have resulted in the development of many
code sets.
Because so many code sets exist, characters in a character set often have different
values in different code sets. For example, in the American Standard Code for
Information Interchange (ASCII), the character A is assigned the decimal numeric
value 65 and B is assigned 66. The Extended Binary-Coded Decimal Interchange
Code (EBCDIC), however, assigns the decimal value of 193 to the character A, and
194 to the character B. Internationalized software must be code-set independent to
operate on different computer systems that use different code sets.
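The difference between the two code sets can be checked with a short Python
sketch. The helper name code_values is hypothetical, and cp037 is only one of
several EBCDIC variants; it is assumed here purely for illustration.

```python
# Compare the numeric values that two code sets assign to the same characters.
# Assumption: "cp037" (an EBCDIC variant shipped with Python) stands in for
# EBCDIC in general; other EBCDIC variants differ in some positions.

def code_values(text, code_set):
    """Return the decimal value of each character of text in code_set."""
    return list(text.encode(code_set))

ascii_values = code_values("AB", "ascii")
ebcdic_values = code_values("AB", "cp037")

print(ascii_values)   # [65, 66]
print(ebcdic_values)  # [193, 194]
```

Code that compares characters against hard-coded numbers such as 65 would
behave differently under the second code set, which is why code-set
independence matters.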
Single-Byte Code Sets
Traditionally, 7 or 8 bits have been used to represent the numeric value of a character.
Figure 2-5 shows how the character A is represented in 8 bits.
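As a minimal sketch of the same idea, the following Python fragment derives the
eight-bit pattern for a character from its ASCII value. The function name
to_bits is hypothetical and is not taken from any product interface.

```python
# Show the bit pattern that stores a single-byte character.
# Assumption: the character is encoded with ASCII, so it occupies one byte.

def to_bits(char, width=8):
    """Return the binary representation of char, zero-padded to width bits."""
    value = char.encode("ascii")[0]   # numeric value in the code set
    return format(value, "0{}b".format(width))

bits = to_bits("A")
print(bits)  # 01000001
```

The leading 0 is the eighth bit, which 7-bit code sets leave unused.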
ASCII
The ASCII code set uses only 7 bits to represent the 128 characters in the code set. It
contains only the uppercase and lowercase letters of the US English alphabet, the
digits 0 to 9, a few punctuation symbols, and various other symbols and control
characters. ASCII is not appropriate for international use because its limited
repertoire cannot represent accented letters or characters from non-Latin scripts.
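The limitation can be illustrated with a short Python sketch. The helper name
fits_in_ascii is hypothetical; the fragment relies only on the standard ascii
codec.

```python
# Check whether a string can be represented in the 7-bit ASCII code set.

ASCII_SIZE = 2 ** 7  # 7 bits give 128 possible values

def fits_in_ascii(text):
    """Return True if every character of text has an ASCII value."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

plain_ok = fits_in_ascii("resume")
accented_ok = fits_in_ascii("r\u00e9sum\u00e9")  # the same word with accented e

print(ASCII_SIZE, plain_ok, accented_ok)
```

A word as common as the accented form of "resume" already falls outside the
code set, which is the practical reason ASCII alone is insufficient for
localized software.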
ISO 8859
The International Organization for Standardization (ISO) 8859 series is a group of
standard eight-bit code sets. ASCII is a subset of each of the ISO 8859 code sets.
Eight-bit code sets can support many character sets. ISO 8859-1, the most commonly
used set, supports most Western European languages. Table 2-2 shows each ISO
8859 code set and the languages it supports.
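The subset relationship can be checked with a brief Python sketch. The list of
parts below is an arbitrary sample chosen for illustration, not the contents of
Table 2-2, and the helper name ascii_is_subset is hypothetical.

```python
# Verify that the first 128 values of several ISO 8859 code sets match ASCII,
# and that a value above 127 means different things in different parts.

ISO_8859_PARTS = ["iso8859_1", "iso8859_2", "iso8859_5", "iso8859_15"]

ascii_bytes = bytes(range(128))

def ascii_is_subset(code_set):
    """Return True if values 0-127 decode identically in code_set and ASCII."""
    return ascii_bytes.decode(code_set) == ascii_bytes.decode("ascii")

subset_results = {name: ascii_is_subset(name) for name in ISO_8859_PARTS}

# The same eighth-bit value maps to a different character in each part.
high_byte = bytes([0xE9])
latin1_char = high_byte.decode("iso8859_1")    # Western European part
cyrillic_char = high_byte.decode("iso8859_5")  # Cyrillic part

print(subset_results)
print(latin1_char, cyrillic_char)
```

Because only the upper 128 values differ between parts, data must be labeled
with the code set that produced it before those values can be interpreted.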
Figure 2-5. Eight-Bit Representation of the Character A
A = 65 = 0 1000001