HP-UX 11i March 2002 Release Notes

15 New and Changed Internationalization Features
Unicode Character Set
HP-UX 11i provides system-level support for the Unicode 2.1/ISO-10646 character set.
Hewlett-Packard’s support for Unicode provides a basis for heterogeneous
interoperability across all locales.
ISO-10646 is an industry standard that defines a single encoding which uniquely
encodes all the world’s characters. Unicode 2.1 is the companion specification to
ISO-10646. Unicode support conforms with existing X/Open (The Open Group), POSIX,
ISO C, and other relevant UNIX-based standards.
HP-UX 11i supports Unicode/ISO-10646 by using the UTF-8 (Universal
Transformation Format-8) representation for persistent storage. UTF-8 is an
industry-recognized 8-bit multibyte representation for Unicode. This representation
allows for successful data transmission over 8-bit networking protocols as well as for
safe storage and retrieval within a historically byte-oriented operating system such as
HP-UX.
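As an illustration of the multibyte representation described above, the following is a minimal sketch of encoding a single code point as UTF-8. It is not HP-UX library code: the function name utf8_encode is hypothetical, a C99 compiler (for <stdint.h>) is assumed, and the encoder follows the current four-byte limit (U+10FFFF) rather than the longer sequences that older ISO-10646 editions permitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * Encodes one code point as UTF-8 into out, which must have room for
 * at least 4 bytes. Returns the number of bytes written, or 0 if the
 * value is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);

    if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {   /* surrogates are not characters */
        return 0;
    }
    if (cp < 0x10000) {                   /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes: 11110xxx followed by 3 x 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* out of range */
}
```

Every byte of a multibyte sequence has its high bit set, so ASCII bytes never appear inside the encoding of another character. This property is what makes the format safe for byte-oriented storage and 8-bit transports.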
For internal processing, HP-UX uses the four-octet (32-bit) canonical form specified in
ISO-10646. This provides parity with HP-UX’s existing wchar_t implementation,
which is based on a 32-bit representation.
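The step from the stored UTF-8 form to a 32-bit internal value can be sketched by hand as follows. This is an illustrative decoder, not the HP-UX implementation: the function name utf8_decode is hypothetical, a C99 compiler is assumed, and an application would normally rely on the standard multibyte conversion routines (such as mbstowcs) under a UTF-8 locale instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * Decodes the UTF-8 sequence at the start of s (len bytes available)
 * into a 32-bit code point stored in *cp. Returns the number of bytes
 * consumed, or 0 if the input is empty, truncated, or malformed. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t need, i;
    uint32_t value, min;

    assert(s != NULL && cp != NULL);
    if (len == 0) {
        return 0;
    }

    if (s[0] < 0x80) {                    /* single-byte (ASCII) */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {   /* lead byte of a 2-byte sequence */
        need = 2; value = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {   /* lead byte of a 3-byte sequence */
        need = 3; value = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {   /* lead byte of a 4-byte sequence */
        need = 4; value = s[0] & 0x07; min = 0x10000;
    } else {
        return 0;                         /* stray continuation or invalid lead byte */
    }

    if (len < need) {
        return 0;                         /* truncated sequence */
    }
    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            return 0;                     /* not a continuation byte */
        }
        value = (value << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates, and values past U+10FFFF. */
    if (value < min || value > 0x10FFFF ||
        (value >= 0xD800 && value <= 0xDFFF)) {
        return 0;
    }
    *cp = value;
    return need;
}
```

Working on fixed-width 32-bit values means each character occupies one array element regardless of how many bytes it took on disk, which is the practical benefit of a 32-bit wchar_t.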
Full system-level support is provided for all locales included in this release.
For more information on the Unicode features of the Asian System Environment, see
/usr/share/doc/ASX-UTF8.
A select subset of locale binaries has been provided for 32-bit application processing:
Table 15-1 Base
Locale        Description
C.utf8        C UTF-8
univ.utf8     universal
Table 15-2 European
Locale        Language
fr_CA.utf8    French Canadian
fr_FR.utf8    French
de_DE.utf8    German
it_IT.utf8    Italian
es_ES.utf8    Spanish
sv_SE.utf8    Swedish
Table 15-3 Asian
Locale        Language
ja_JP.utf8    Japanese
ko_KR.utf8    Korean
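An application selects one of the locales listed above through the standard setlocale() interface. The following is a minimal sketch, assuming a C compiler with <locale.h>. The helper name select_locale is hypothetical, and whether a given locale name is accepted depends on which locale binaries are installed on the system.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.
 * Tries each locale name in order and returns the first one that
 * setlocale() accepts for all categories, or NULL if none is accepted. */
const char *select_locale(const char *const *candidates, size_t count)
{
    size_t i;

    assert(candidates != NULL);
    for (i = 0; i < count; i++) {
        if (setlocale(LC_ALL, candidates[i]) != NULL) {
            return candidates[i];
        }
    }
    return NULL;
}
```

A caller might pass a list such as fr_FR.utf8, C.utf8, C so that the program falls back to the base locales when the preferred one is not installed.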