Software Internationalization Guide

Glossary
Software Internationalization Guide 526225-002
character set. A finite set of characters (letters, digits, symbols, ideographs, or control
functions) used for the organization, representation, or control of data. See also code
set.
Chinese National Standard (CNS). Creates standard code sets for Traditional Chinese.
code set. Codes that map a unique numeric value to each character in a character set,
using a designated number of bits to represent each character. Single-byte code sets
use 7 or 8 bits to represent each character. The ASCII and ISO 646 code sets use 7
bits to represent each character in Roman-based alphabets; these code sets are very
limited and are not appropriate for international use. The single-byte ISO 8859 code
sets use 8 bits to represent each character and can therefore support Roman-based
alphabets and many others including Greek, Arabic, Hebrew, and Turkish. Multibyte
code sets represent characters that require more than one byte, such as East Asian
ideographic characters.
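As a rough illustration of the difference between 7-bit, 8-bit, and multibyte code sets, the following Python sketch counts the bytes needed to encode a few characters. The codec names (ascii, iso-8859-1, iso-8859-7, big5) come from Python's standard codecs and are used here only as examples. The helper name encoded_length is hypothetical.

```python
def encoded_length(ch, codec):
    """Return the number of bytes used to encode ch in codec,
    or None if the code set has no code for that character."""
    try:
        return len(ch.encode(codec))
    except UnicodeEncodeError:
        return None


samples = [
    ("A", "ascii"),            # fits in 7 bits
    ("\u00e9", "ascii"),       # e with acute accent: not in a 7-bit code set
    ("\u00e9", "iso-8859-1"),  # one 8-bit byte in ISO 8859-1
    ("\u03b1", "iso-8859-7"),  # Greek alpha: one byte in ISO 8859-7
    ("\u4e2d", "big5"),        # Traditional Chinese ideograph: multibyte
]

for ch, codec in samples:
    print(repr(ch), codec, encoded_length(ch, codec))
```

A result of None means the character cannot be represented in that code set at all, which is why 7-bit code sets are unsuitable for international use.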
collation. The logical ordering of characters or character strings according to defined
precedence rules. The precedence rules identify a collation sequence between the
collating elements and additional rules that can be used to order strings consisting of
multiple collating elements.
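The difference between ordering by code value and ordering by precedence rules can be sketched in Python. The rules below are a simplified assumption made for illustration (compare base letters without case or accents first, then break ties on the full string). They are not the collation sequence of any particular locale, and the function name collation_key is hypothetical.

```python
import unicodedata


def collation_key(s):
    """Build a two-level sort key: base letters first, full string second."""
    decomposed = unicodedata.normalize("NFD", s)
    # Primary level: drop combining marks and ignore case.
    primary = "".join(
        c for c in decomposed if not unicodedata.combining(c)
    ).casefold()
    # Secondary level: the decomposed string itself breaks ties.
    return (primary, decomposed)


words = ["zebra", "\u00e9clair", "apple", "Eagle"]

by_code_value = sorted(words)                # raw code-point order
by_rules = sorted(words, key=collation_key)  # rule-based order

print(by_code_value)
print(by_rules)
```

Sorting by code value places the uppercase word first and the accented word last, whereas the rule-based key yields the alphabetical order a reader would expect.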
combining character. A character, such as a diacritical mark, that can be combined with a
base character to form a composite character.
Common Run-time Environment (CRE). An HP product that provides a common
environment for applications written in different programming languages.
composite character. A character consisting of a combination of two or more elements in a
single character position. For example, the base character a and the acute accent (´)
diacritical mark can be combined to form the composite character á. The ISO 10646
code set includes an encoding method that enables the creation of composite characters.
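A minimal Python sketch of this composition follows. It assumes Python's Unicode strings and the standard unicodedata module as a stand-in for an ISO 10646 implementation. The variable names are illustrative only.

```python
import unicodedata

base = "a"
acute = "\u0301"  # COMBINING ACUTE ACCENT, a combining character

# Two code values occupying a single character position when displayed.
sequence = base + acute

# NFC normalization replaces the pair with the single precomposed
# code value, where one exists.
composite = unicodedata.normalize("NFC", sequence)

print(len(sequence), len(composite))
print(composite == "\u00e1")                # LATIN SMALL LETTER A WITH ACUTE
print(unicodedata.combining(acute) != 0)    # nonzero class marks a combining character
```

Both forms display as the same composite character, so software that compares or counts characters generally needs to normalize them to one form first.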
context-dependent writing system. A writing system in which characters take different
forms depending on their location within a word—for example, Arabic and Hebrew.
cultural data. Information that can vary from one language to another or between
geographic areas. Date, time, and currency formats are examples of cultural data.
data transparent. Describes software that examines all eight bits of every data byte and
that uses no bit in a data byte for its own purposes. Internationalized applications must
be data-transparent.
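The following Python sketch contrasts a routine that is not data transparent with one that is. Both functions are hypothetical examples written for this illustration. The first imitates legacy software that clears the high-order bit of each byte, and the second leaves every bit untouched.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Not data transparent: clears the eighth bit of every byte."""
    return bytes(b & 0x7F for b in data)


def pass_through(data: bytes) -> bytes:
    """Data transparent: returns every byte with all eight bits intact."""
    return bytes(data)


# "cafe" with an acute accent on the e, encoded in the 8-bit ISO 8859-1 code set.
original = "caf\u00e9".encode("iso-8859-1")

damaged = strip_high_bit(original)
intact = pass_through(original)

print(original, damaged, intact)
```

Clearing the eighth bit turns the byte 0xE9 into 0x69, so the accented letter silently becomes the letter i and the original text cannot be recovered.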
diacritical. A mark added to a letter that usually provides information about how the letter
should be pronounced or the stress that should be given to a syllable. Examples
include the acute accent (´), the grave accent (`), the dieresis (¨), and the tilde (~).
eight-bit clean. Describes software that processes eight-bit characters without modifying or
using any bit in a data byte for its own purposes. Also known as data transparent or 8-bit
transparent.