C/C++ Programmer's Guide (G06.25+)
HP C Implementation-Defined Behavior
HP C/C++ Programmer’s Guide for NonStop Systems—429301-008
G.5 Common Extensions
Multibyte Characters and Wide Characters
Multibyte characters and wide characters support Asian alphabets that often contain a 
very large number of characters. The Guardian TNS C run-time library functions, 
except for the strcoll() and strxfrm() functions, support these character sets: 
Tandem Kanji, Chinese Big 5, Chinese PC, Hangul and KSC5601.
The following discussion of multibyte characters applies only to the Guardian 
environment. For details on multibyte characters in the Open System Services (OSS) 
environment, refer to the Software Internationalization Manual.
The D30 and later Guardian C run-time library functions mblen(), mbtowc(), 
mbstowcs(), wctomb(), and wcstombs() do not support multibyte characters for 
programs that use the 32-bit (or wide) data model as described in this section. 
Guardian programs that use the 32-bit data model must use the Guardian system 
procedures that support multibyte characters instead. For details, refer to the Guardian 
Programmer’s Guide.
The default character set supported by a system is configured at system installation 
time and cannot be changed during program execution. The Guardian procedure 
MBCS_DEFAULTCHARSET_ returns the identifier of the default character set. The 
Guardian Procedure Calls Reference Manual describes this system procedure in 
detail.
The internal representation of the characters of these languages is HP internal and 
might not conform to any ISO standard. HP can choose to change this internal 
representation at any time.
Multibyte Characters:
• The basic difficulty in an Asian environment is the huge number of ideograms
  (for example, Chinese characters) that are needed for I/O. To work within the
  constraints of usual computer architectures, these ideograms are encoded as
  sequences of bytes. The associated operating systems, application programs,
  and terminals understand these byte sequences as individual ideograms.
  Moreover, all of these encodings allow intermixing of regular single-byte C
  characters with the ideogram byte sequences.
• The term “multibyte character” denotes a byte sequence that encodes an
  ideogram. The byte sequence contains one or more codes where each code can
  be represented in a C character data type: char, signed char, or unsigned
  char. All multibyte characters are members of the so-called extended
  character set. A regular single-byte C character is just a special case of a
  multibyte sequence where the sequence has a length of one.
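The intermixing of single-byte characters and ideogram byte sequences can be sketched as follows. This is a minimal illustration, not the encoding of any character set listed above: the lead-byte rule (a byte with the high bit set starts a two-byte character) and the names demo_char_len and demo_count_chars are assumptions made for this example only. Real character sets such as Tandem Kanji or Chinese Big 5 define their own lead-byte and trail-byte ranges.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL rule, for illustration only: a byte with the high bit set
 * starts a two-byte character; any other byte is a one-byte character.
 * A lead byte that is immediately followed by the terminating NUL is
 * treated as a single byte so the scan never runs past the string. */
static size_t demo_char_len(const unsigned char *p)
{
    assert(p != NULL);
    if (*p >= 0x80 && p[1] != '\0')
        return 2;
    return 1;
}

/* Count characters (not bytes) in a NUL-terminated string that mixes
 * one-byte and two-byte characters under the hypothetical rule above. */
size_t demo_count_chars(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    assert(s != NULL);
    while (*p != '\0') {
        p += demo_char_len(p);   /* step over one whole character */
        count++;
    }
    return count;
}
```

Under these assumptions, the four-byte string "A\xB0\xA1" "B" holds three characters: the single-byte 'A', one two-byte sequence, and the single-byte 'B'. The byte count reported by strlen() and the character count therefore differ, which is the inconvenience that wide characters are meant to remove.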
Wide Characters:
• Some of the inconvenience of handling multibyte characters is eliminated if
  all characters are of a uniform number of bytes or bits. A 16-bit integer
  value is used to represent all members of the character set because there
  can be thousands or tens of thousands of ideograms in an Asian character set.
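The idea of a uniform 16-bit representation can be sketched as follows. This is only an illustration under stated assumptions: the lead-byte rule (high bit set means a two-byte character), the packing of lead and trail bytes into one 16-bit value, and the names demo_wchar and demo_to_wide are hypothetical. They do not describe the HP internal representation, which is not documented here and might change.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL 16-bit wide-character type for this sketch only. */
typedef unsigned short demo_wchar;

/* Convert a NUL-terminated mixed string into fixed-width values.
 * Assumed rule (illustration only): a byte with the high bit set is the
 * lead byte of a two-byte character; the pair is packed as
 * (lead << 8) | trail. Any other byte is stored as its own value.
 * At most max elements are written. Returns the number written. */
size_t demo_to_wide(const char *s, demo_wchar *out, size_t max)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    assert(s != NULL && out != NULL);
    while (*p != '\0' && n < max) {
        if (*p >= 0x80 && p[1] != '\0') {
            out[n++] = (demo_wchar)((p[0] << 8) | p[1]);
            p += 2;              /* consumed a two-byte character */
        } else {
            out[n++] = *p;
            p += 1;              /* consumed a one-byte character */
        }
    }
    return n;
}
```

After conversion, every character occupies exactly one array element, so indexing, counting, and comparison no longer need to inspect lead bytes. The cost is storage: single-byte characters now take 16 bits each.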










