Software Internationalization Guide
Software Characteristics That Vary by Locale
Software Internationalization Guide—526225-002
combining character, using the base character a followed by the acute accent. As
Figure 2-7 shows, the resulting code value uses four octets—two for the a (0x00 0x61)
and two for the acute accent (0x03 0x01).
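
The four octets can be reproduced with a short Python sketch. It assumes that Python's "utf-16-be" codec (no byte-order mark) is an acceptable stand-in for two-octet UCS-2 code values, which holds for the characters used here; the variable names are illustrative only.

```python
import unicodedata

# Base character "a" (U+0061) followed by the combining acute accent (U+0301).
composite = "a\u0301"

# Assumption: UTF-16 big-endian without a byte-order mark yields the same
# two-octet code values as UCS-2 for these characters.
octets = composite.encode("utf-16-be")
print(octets.hex(" "))  # 00 61 03 01

# NFC normalization composes the pair into the single precomposed character.
precomposed = unicodedata.normalize("NFC", composite)
print(f"{ord(precomposed):04x}")  # 00e1
```

The last step shows that the same letter also exists as one precomposed code value, so the two-character sequence and the single character can represent the same text.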
The combining character method of encoding allows any number of combining
characters to follow a base character, thus enabling support of such languages as
Arabic and Thai.
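
As a rough illustration of a base character carrying more than one combining character, the following Python sketch groups each base character with the marks that follow it. The helper name split_clusters is hypothetical, and the test for a combining character (a Unicode general category beginning with "M") is a simplification, not a full text-segmentation algorithm.

```python
import unicodedata


def split_clusters(text):
    """Group each base character with the combining marks that follow it."""
    clusters = []
    for ch in text:
        # Simplification: treat any character in a Mark category as combining.
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and clusters:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


latin = "a\u0301\u0323"        # a + acute accent + dot below
thai = "\u0e01\u0e34\u0e48"    # ko kai + vowel sign sara i + tone mark mai ek

print([len(c) for c in split_clusters(latin)])  # [3]
print([len(c) for c in split_clusters(thai)])   # [3]
```

In both cases a single base character is followed by two combining characters, and the three code values form one unit of text.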
ISO 10646 has three conformance levels that provide flexibility in its implementation:
• Level 1: Combining characters not allowed.
• Level 2: Combining characters allowed only for Arabic, Hebrew, Indic, and Thai.
• Level 3: Combining characters allowed with no restrictions.
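
The three levels can be sketched as a check that reports the lowest level a piece of text would need. This is an approximation under stated assumptions, not the normative rule: the function name minimum_level is hypothetical, script membership is guessed from the Unicode character name, and the list of Indic scripts is a partial, illustrative one.

```python
import unicodedata

# Assumption: a script is identified by the prefix of the character name.
# The Indic entries are an illustrative subset, not a complete list.
LEVEL_2_PREFIXES = ("ARABIC", "HEBREW", "THAI", "DEVANAGARI", "BENGALI", "TAMIL")


def is_combining(ch):
    """Simplification: any character in a Mark category counts as combining."""
    return unicodedata.category(ch).startswith("M")


def minimum_level(text):
    """Return the lowest level (1, 2, or 3) under which the text is allowed."""
    marks = [ch for ch in text if is_combining(ch)]
    if not marks:
        return 1  # no combining characters at all
    if all(unicodedata.name(ch, "").startswith(LEVEL_2_PREFIXES) for ch in marks):
        return 2  # combining characters only from the listed scripts
    return 3      # unrestricted combining characters


print(minimum_level("\u00e1"))        # 1 (precomposed a-acute)
print(minimum_level("\u0e01\u0e34"))  # 2 (Thai base + Thai vowel sign)
print(minimum_level("a\u0301"))       # 3 (a + combining acute accent)
```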
Unicode
The Unicode code set currently uses the two lower octets, row and cell, and does not
define separate conformance levels for combining characters. Unicode is identical to
ISO 10646, UCS-2, Level 3, as shown in Figure 2-8 on page 2-9. The ASCII characters
keep their code values in UCS-2; for example, ASCII code 45 (the hyphen-minus) is
0x00 0x2D.
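
The row and cell octets can be read directly from a character's code value. The Python sketch below assumes that the row is the high-order octet and the cell the low-order octet of a two-octet value; the helper name row_and_cell is hypothetical.

```python
def row_and_cell(ch):
    """Split a two-octet code value into (row, cell)."""
    code = ord(ch)
    if code > 0xFFFF:
        raise ValueError("character is outside the two-octet range")
    # High-order octet is the row, low-order octet is the cell.
    return divmod(code, 0x100)


for ch in ("a", "\u0301"):
    row, cell = row_and_cell(ch)
    print(f"0x{row:02X} 0x{cell:02X}")
# 0x00 0x61
# 0x03 0x01
```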
Figure 2-7. Creating a Composite Character
Base character a (0x00 0x61) + combining character ´ (0x03 0x01) = á (a-acute accent)