Software Internationalization Guide
Software Characteristics That Vary by Locale
Software Internationalization Guide—526225-002
supported character sets of 1, 2, and 4 bytes. To store 1-byte and 2-byte characters,
the implementation pads unused space in the 4-byte-wide character with nulls.
Figure 2-10 is an example of a 4-byte-wide character data stream. The unshaded
bytes in the figure indicate null padding.
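The null padding described above can be observed by inspecting the storage of a single wide character. The following is a minimal sketch, not part of any product API: the function name zero_byte_count is hypothetical, and the result depends on the size of wchar_t on the system where the code is compiled.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: returns how many bytes in the storage of one
 * wide character are null. For a character whose code fits in one
 * byte, every remaining byte of the wchar_t is padding. */
size_t zero_byte_count(wchar_t wc)
{
    unsigned char bytes[sizeof(wchar_t)];
    size_t zeros = 0;

    memcpy(bytes, &wc, sizeof wc);      /* copy out the raw storage */
    for (size_t i = 0; i < sizeof bytes; i++) {
        if (bytes[i] == 0)
            zeros++;
    }
    return zeros;
}
```

For example, zero_byte_count(L'A') returns sizeof(wchar_t) - 1: three padding bytes where wchar_t is 4 bytes wide, and one where it is 2 bytes wide.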
Wide characters are not used for file codes, the external representation of data.
Because wide character size varies between systems, an application might not be able
to interpret wide character file codes defined on another system. Wide character types
are thus used only for internal processing. For more information about file codes and
internal processing codes, see File Codes and Process Codes on page 2-11.
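Converting external multibyte data into wide characters for internal processing can be sketched with the standard C function mbstowcs(). This is an illustration under stated assumptions: the wrapper name load_as_wide is hypothetical, and the input bytes are interpreted according to the current locale (the default "C" locale unless the program calls setlocale()).

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical wrapper: converts a null-terminated multibyte string
 * into wide characters for internal processing. Returns the number of
 * wide characters stored (excluding the terminator), or (size_t)-1 if
 * the input contains an invalid sequence or does not fit in the buffer
 * together with its terminator. */
size_t load_as_wide(const char *mbs, wchar_t *out, size_t capacity)
{
    size_t n = mbstowcs(out, mbs, capacity);

    if (n == (size_t)-1)
        return (size_t)-1;      /* invalid multibyte sequence */
    if (n == capacity)
        return (size_t)-1;      /* buffer full, no room for terminator */
    return n;
}
```

The wide characters produced this way are meaningful only inside the running program. Data written back to a file would be converted to multibyte form again, for example with wcstombs().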
Signed Versus Unsigned Character Data Types
ASCII-based programs do not need to differentiate between signed and unsigned
characters for most purposes, but programs using larger code sets often do. ASCII
uses only seven bits, so the eighth bit of each byte is always zero. Larger code sets
use eight bits or more to represent characters, so the eighth bit must be available as
character data and must not be treated as a sign bit. Internationalized software must
therefore use an unsigned character data type to store characters, so that a character
can never be misinterpreted as a negative binary value.
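The effect of the sign bit can be shown with a byte such as 0xE9, which several 8-bit code sets assign to an accented letter. The sketch below is illustrative only, and the helper names byte_value and is_alpha_byte are hypothetical. Both convert through unsigned char so that the result does not depend on whether plain char is signed on the compiling system.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns the code of the byte at s[i] as a value
 * in the range 0..255, regardless of whether plain char is signed. */
int byte_value(const char *s, size_t i)
{
    return (unsigned char)s[i];
}

/* Hypothetical helper: classifies a byte with isalpha(). The cast is
 * required because passing a negative value other than EOF to the
 * <ctype.h> functions is undefined behavior. */
int is_alpha_byte(char c)
{
    return isalpha((unsigned char)c) != 0;
}
```

Where plain char is signed, reading the byte 0xE9 directly through a char typically yields -23, while byte_value() returns 233 in either case.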
Integer Versus Character Data Types
When an application compares integer data types to character data types, it might
produce unexpected results if the two types differ in size or signedness. For
example, consider a conditional statement that compares the variable b to the
end-of-file (EOF) value:
int b;
if ((b = getchar()) == EOF)
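The getchar() function returns an integer: either the character read, converted from
unsigned char to int, or the negative value EOF. The return value must therefore be
stored in an integer variable so that EOF remains distinct from every valid character
and the comparison works properly.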
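A complete read loop built on the same comparison might look like the following sketch. The function name count_bytes is hypothetical, and getc() is used in place of getchar() only so that the stream can be passed as a parameter.

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes read from a stream until
 * end-of-file or a read error. */
long count_bytes(FILE *in)
{
    long n = 0;
    int c;      /* int, not char, so that EOF stays distinguishable */

    while ((c = getc(in)) != EOF)
        n++;
    return n;
}
```

Had c been declared as char, a data byte of 0xFF could compare equal to EOF where char is signed and end the loop early. Where char is unsigned, the comparison could never succeed and the loop would not terminate.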
File Codes and Process Codes
A file code is the bit pattern representing a character in a character set. It is the
external representation of data and is the format of data stored on disk.
Figure 2-10. Four-Byte Wide Character Data Stream
[Figure: a stream of 1-byte cells grouped into 4-byte wide characters, showing a 4-byte character, a 2-byte character, and a 1-byte character.]