Software Internationalization Guide
Software Characteristics That Vary by Locale
Software Internationalization Guide—526225-002
two-byte and four-byte character data types. Some character types are not defined as
single or multiple byte, but can support various combinations.
Single-Byte Characters
A single-byte character data type consists of eight bits that represent a character.
ISO 8859 characters are single-byte characters. Because eight bits have 256 possible
values, a single-byte code set can define at most 256 characters.
Multibyte Characters
A multibyte character is a coded character that occupies one or more bytes. A single
multibyte data stream can contain characters of varying widths, as shown in Figure 2-9.
Multibyte characters typically consist of characters encoded in defined code sets. For
example, a multibyte data stream can contain single-byte ASCII characters as well as
multibyte ideographic characters. In most situations, a mechanism is needed to mark
the boundary between single-byte and multibyte characters, such as shift-in and
shift-out sequences.
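The role of shift sequences can be illustrated with a minimal sketch in C. It assumes a simplified, hypothetical encoding (not one defined in this guide) in which the byte 0x0E switches the stream to two-byte characters and the byte 0x0F switches it back to single-byte characters. The function name count_shift_encoded_chars and the constants are illustrative only.

```c
#include <stddef.h>

/* Hypothetical control bytes for this sketch: SHIFT_OUT enters
 * two-byte mode, SHIFT_IN returns to single-byte mode. */
enum { SHIFT_OUT = 0x0E, SHIFT_IN = 0x0F };

/*
 * Count the characters in a shift-encoded byte stream.
 * Returns the character count, or -1 if the stream ends in the
 * middle of a two-byte character.
 */
long count_shift_encoded_chars(const unsigned char *buf, size_t len)
{
    int two_byte_mode = 0;  /* current shift state */
    long count = 0;
    size_t i = 0;

    while (i < len) {
        if (buf[i] == SHIFT_OUT) {
            two_byte_mode = 1;      /* following characters use 2 bytes */
            i++;
        } else if (buf[i] == SHIFT_IN) {
            two_byte_mode = 0;      /* following characters use 1 byte */
            i++;
        } else if (two_byte_mode) {
            if (len - i < 2)
                return -1;          /* truncated two-byte character */
            i += 2;
            count++;
        } else {
            i++;
            count++;
        }
    }
    return count;
}
```

Because the width of each character depends on the current shift state, the scanner cannot find character boundaries by looking at a byte in isolation. It has to read the stream from the start, which is one reason multibyte data is usually converted to a fixed-width form before processing.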
Multibyte characters are used for file codes, which are the external representations of
data. A file code is the format of data that is stored on disk. See File Codes and
Process Codes on page 2-11 for more information.
Wide Characters
A wide character is a fixed-width character wide enough to hold any coded character
supported by an implementation. A wide character is an object of type wchar_t, a type
definition included in ISO C to support internationalization.
Wide characters promote code-set independence by removing dependencies on
specific code sets or encoding methods, and replacing them with general functions that
can process any encoding. The wide character data type provides flexibility because it
can store any character in the supported code set, up to and including the widest one.
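As a rough sketch of how a program might move from multibyte data to wide characters, the following C fragment wraps the ISO C function mbstowcs. The helper names to_wide and wide_storage_bytes are hypothetical and are not part of any library. The conversion depends on the current locale; the sketch assumes the input is valid in that locale.

```c
#include <stdlib.h>
#include <wchar.h>

/*
 * Convert a multibyte string to wide characters using the current locale.
 * Returns the number of wide characters stored (excluding the terminator),
 * or (size_t)-1 if dst_len is 0 or the input contains an invalid
 * multibyte sequence.
 */
size_t to_wide(wchar_t *dst, size_t dst_len, const char *src)
{
    size_t n;

    if (dst_len == 0)
        return (size_t)-1;

    /* Reserve one element so the result can always be terminated. */
    n = mbstowcs(dst, src, dst_len - 1);
    if (n == (size_t)-1)
        return n;

    dst[n] = L'\0';
    return n;
}

/* Every wide character occupies the same number of bytes, so the
 * storage needed for nchars characters is a simple multiplication. */
size_t wide_storage_bytes(size_t nchars)
{
    return nchars * sizeof(wchar_t);
}
```

After conversion, the program can index, count, and compare characters with the wide character functions (wcslen, for example) without knowing how the original data was encoded.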
All wide characters in a single data stream are the same size. The size is defined by
the implementation and is set at compile time. The most commonly used wide character
sizes are 1, 2, or 4 bytes (8, 16, or 32 bits). For example, if wchar_t is defined as 4 bytes,
all wide character data is processed in 4-byte groups, including all characters from
Figure 2-9. Multibyte Character Data Stream
[Figure: a data stream of twelve 1-byte units, grouped left to right into characters of 4 bytes, 2 bytes, 1 byte, 4 bytes, and 1 byte.]