Software Internationalization Guide

Software Internationalization Guide 526225-002
Glossary
ANSI. The American National Standards Institute.
Arabic-based writing system. A writing system with letters that are derived from the Arabic
alphabet. Not all languages that use Arabic characters are related linguistically to
Arabic.
ASCII. American Standard Code for Information Interchange. A single-byte code set that
uses only 7 of the 8 bits in a byte to represent each character. The ASCII code set
contains the uppercase and lowercase characters of the U.S. English alphabet, some
punctuation symbols, the digits 0 through 9, and some symbols and control characters.
Because of its limited characters, and because the 8th bit is sometimes used in ASCII
programs as a utility bit, the ASCII code set is not appropriate for use in international
software.
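As a rough illustration of the 7-bit limit described in the ASCII entry, the following
Python sketch tests whether every byte in a buffer leaves the 8th bit clear. The
function name is_seven_bit is hypothetical and chosen for this example only.

```python
def is_seven_bit(data: bytes) -> bool:
    """Return True if every byte has its high (8th) bit clear."""
    # 0x80 is the mask for the 8th bit; ASCII code values never set it.
    return all((byte & 0x80) == 0 for byte in data)


if __name__ == "__main__":
    # Plain U.S. English text fits in 7 bits.
    print(is_seven_bit(b"Hello, world"))
    # Latin-1 encodes the accented letters with the 8th bit set.
    print(is_seven_bit("Gr\u00fc\u00dfe".encode("latin-1")))
```

Text that fails this check cannot be represented in ASCII, which is one reason the
code set is unsuitable for international software.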
base character. A character that can be combined with one or more combining characters
to form a composite character.
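The relationship between a base character and its combining characters can be sketched
in Python, assuming Unicode as the code set (the glossary entry itself is not tied to
any particular code set). The helper name compose is hypothetical; unicodedata is part
of the Python standard library.

```python
import unicodedata


def compose(base: str, *marks: str) -> str:
    """Combine a base character with combining characters (NFC form)."""
    for mark in marks:
        # Simplification for this sketch: a canonical combining class of 0
        # is treated as "not a combining character".
        if unicodedata.combining(mark) == 0:
            raise ValueError(f"{mark!r} is not a combining character")
    # NFC normalization merges the sequence into a precomposed character
    # when Unicode defines one.
    return unicodedata.normalize("NFC", base + "".join(marks))


if __name__ == "__main__":
    composite = compose("e", "\u0301")  # e + COMBINING ACUTE ACCENT
    print(len("e\u0301"), len(composite))  # code points before and after
    print(unicodedata.name(composite))
```

Here the base character "e" and one combining character form a single composite
character; where no precomposed form exists, the sequence stays as separate code points.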
Basic Multilingual Plane (BMP). The first plane (plane 0) of the ISO 10646 character
layout, addressed by the lower two octets, row and cell. The two-octet encoding form
that covers the BMP is Universal Coded Character Set - 2 (UCS-2).
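The row and cell octets mentioned in the BMP entry can be shown with a small Python
sketch. The function names in_bmp and row_and_cell are hypothetical, and the sketch
assumes code values are given as integers.

```python
from typing import Tuple


def in_bmp(code_point: int) -> bool:
    """Return True if the code value fits in the lower two octets."""
    return 0x0000 <= code_point <= 0xFFFF


def row_and_cell(code_point: int) -> Tuple[int, int]:
    """Split a BMP code value into its row (high) and cell (low) octets."""
    if not in_bmp(code_point):
        raise ValueError(f"U+{code_point:04X} lies outside the BMP")
    return code_point >> 8, code_point & 0xFF


if __name__ == "__main__":
    for ch in ("A", "\u4e2d"):
        row, cell = row_and_cell(ord(ch))
        print(f"U+{ord(ch):04X}: row 0x{row:02X}, cell 0x{cell:02X}")
```

A code value above 0xFFFF needs more than two octets and therefore falls outside the BMP.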
block-based writing system. A writing system composed of single letters that stand alone
in printed text, such as the writing systems used for English, French, and Russian.
BMP. See Basic Multilingual Plane (BMP).
byte. An ordered set of bits that represents a character or a part of a character. The number
of bits per byte is implementation-dependent; a byte usually contains 8 or more bits.
An 8-bit byte is also called an octet.
C locale. A special locale defined by the ANSI C standard. Every standard C program
starts up in the C locale, which means that no locale-specific action takes place and
the program operates in ASCII mode. All library functions behave as they do in
standard C. This behavior does not change unless the program calls the setlocale()
function. Also called the POSIX locale or the C/POSIX locale.
character. A sequence of one or more bytes representing a single graphic symbol or
control code; used for the organization, representation, or control of data. A
single-byte character is represented by one byte of eight bits. A multibyte character
uses one or more bytes to represent a character. A wide character is a fixed-width
character wide enough to hold any coded character supported by an implementation.
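The difference between multibyte and wide characters can be sketched in Python. This
example makes two assumptions that do not come from the glossary: UTF-8 stands in for a
multibyte encoding, and UTF-32 stands in for a wide-character representation (in C, the
actual width of a wide character is implementation-defined). The function name
byte_widths is hypothetical.

```python
from typing import Tuple


def byte_widths(ch: str) -> Tuple[int, int]:
    """Return (multibyte length, wide length) in bytes for one character."""
    multibyte = ch.encode("utf-8")   # variable width: length depends on ch
    wide = ch.encode("utf-32-be")    # fixed width: same length for every ch
    return len(multibyte), len(wide)


if __name__ == "__main__":
    for ch in ("A", "\u00e9", "\u4e2d"):
        mb_len, wide_len = byte_widths(ch)
        print(f"U+{ord(ch):04X}: multibyte={mb_len} bytes, wide={wide_len} bytes")
```

The multibyte length varies from character to character, while the wide form uses the
same number of bytes for each one, which makes indexing simpler at the cost of space.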
character class. A named set of characters sharing an attribute associated with the name
of the class.
character encoding. A method in which each member of a character set is mapped to a
specific numeric code value.
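To make the mapping concrete, the following Python sketch lists the numeric code values
that several encodings assign to the same character. The function name code_values is
hypothetical; the encoding names are codecs shipped with the Python standard library,
chosen here only as examples.

```python
from typing import List


def code_values(ch: str, encoding: str) -> List[int]:
    """Return the numeric code values (one per byte) assigned to ch."""
    # encode() raises UnicodeEncodeError if the character set behind the
    # encoding has no member for ch.
    return list(ch.encode(encoding))


if __name__ == "__main__":
    ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
    for encoding in ("latin-1", "cp437", "utf-8"):
        values = " ".join(f"0x{v:02X}" for v in code_values(ch, encoding))
        print(f"{encoding}: {values}")
```

The same character receives a different code value under each encoding, so data must
always be interpreted with the encoding that produced it.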