C/C++ Programmer's Guide (G06.25+)

HP C Implementation-Defined Behavior

HP C/C++ Programmer’s Guide for NonStop Systems—429301-008

G.5 Common Extensions

Multibyte Characters and Wide Characters

Multibyte characters and wide characters support Asian character sets that often contain a

very large number of characters. The Guardian TNS C run-time library functions,

except for the strcoll() and strxfrm() functions, support these character sets:

Tandem Kanji, Chinese Big 5, Chinese PC, Hangul, and KSC5601.

The following discussion of multibyte characters applies only to the Guardian

environment. For details on multibyte characters in the Open System Services (OSS)

environment, refer to the Software Internationalization Manual.

The D30 and later Guardian C run-time library functions mblen(), mbtowc(),

mbstowcs(), wctomb(), and wcstombs() do not support multibyte characters for

programs that use the 32-bit (or wide) data model as described in this section.

Guardian programs that use the 32-bit data model must use the Guardian system

procedures that support multibyte characters instead. For details, refer to the Guardian

Programmer’s Guide.

The default character set supported by a system is configured at system installation

time and cannot be changed during program execution. The Guardian procedure

MBCS_DEFAULTCHARSET_ returns the identifier of the default character set. The

Guardian Procedure Calls Reference Manual describes this system procedure in

detail.

The internal representation of the characters of these languages is HP-specific and

might not conform to any ISO standard. HP can choose to change this internal

representation at any time.

Multibyte Characters:

• The basic difficulty in an Asian environment is the huge number of ideograms that

are needed for I/O, for example Chinese characters. To work within the constraints

of usual computer architectures, these ideograms are encoded as sequences of

bytes. The associated operating systems, application programs, and terminals

understand these byte sequences as individual ideograms. Moreover, all of these

encodings allow intermixing of regular single-byte C characters with the ideogram

byte sequences.

• The term “multibyte character” denotes a byte sequence that encodes an

ideogram. The byte sequence contains one or more codes where each code can

be represented in a C character data type: char, signed char, or unsigned char. All

multibyte characters are members of the so-called extended character set. A

regular single-byte C character is just a special case of a multibyte sequence

where the sequence has a length of one.
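
The idea that a string is a sequence of variable-length multibyte characters can be sketched with the standard C function mblen(). The example below is an illustration only: count_mb_chars() is a hypothetical helper, not part of the Guardian run-time library, and it assumes a program for which mblen() supports the character set in use. In the default "C" locale every character is a single byte, which is the special case of a multibyte sequence of length one.

```c
#include <assert.h>
#include <stdlib.h>   /* mblen */
#include <string.h>   /* strlen */

/* Hypothetical helper: count the characters (not bytes) in a
   null-terminated multibyte string. Returns -1 if an invalid
   byte sequence is found. */
long count_mb_chars(const char *s)
{
    long count = 0;
    size_t remaining;

    assert(s != NULL);
    remaining = strlen(s);

    (void)mblen(NULL, 0);              /* reset any shift state */

    while (remaining > 0) {
        /* mblen() reports how many bytes form the next character */
        int len = mblen(s, remaining);
        if (len <= 0)
            return -1;                 /* invalid or incomplete sequence */
        s += len;
        remaining -= (size_t)len;
        count++;
    }
    return count;
}
```

For a string made up only of single-byte characters, the count equals strlen(); for a string that contains multibyte characters, the count is smaller than the byte length.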

Wide Characters:

• Some of the inconvenience of handling multibyte characters is eliminated if all

characters are of a uniform number of bytes or bits. A 16-bit integer value is used

to represent each member of the extended character set because there can be

thousands or tens of thousands of ideograms in an Asian character set.
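
Converting a multibyte string into uniform-width wide characters can be sketched with the standard C function mbstowcs(). This is a minimal illustration under stated assumptions: to_wide() is a hypothetical wrapper, not a library routine, and the width of wchar_t is implementation-defined (16 bits in the representation described here, but often wider on other platforms).

```c
#include <assert.h>
#include <stddef.h>   /* wchar_t, size_t */
#include <stdlib.h>   /* mbstowcs */

/* Hypothetical wrapper: convert the multibyte string src into wide
   characters stored in dst, which has room for dst_len elements.
   The result is always null-terminated. Returns the number of wide
   characters written (excluding the terminator), or (size_t)-1 if
   src contains an invalid multibyte sequence. */
size_t to_wide(const char *src, wchar_t *dst, size_t dst_len)
{
    size_t n;

    assert(src != NULL && dst != NULL && dst_len > 0);

    /* Leave one element free for the terminating null */
    n = mbstowcs(dst, src, dst_len - 1);
    if (n == (size_t)-1)
        return n;

    dst[n] = L'\0';
    return n;
}
```

After conversion, each element of dst holds exactly one character, so indexing and length calculations no longer need to account for variable-length byte sequences.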