Software Internationalization Guide
Glossary
Software Internationalization Guide—526225-002
character set. A finite set of characters (letters, digits, symbols, ideographs, or control 
functions) used for the organization, representation, or control of data. See also code 
set.
Chinese National Standard (CNS). The standard that defines code sets for Traditional Chinese.
code set. A mapping that assigns a unique numeric value to each character in a character set, 
using a designated number of bits to represent each character. Single-byte code sets 
use 7 or 8 bits to represent each character. The ASCII and ISO 646 code sets use 7 
bits to represent each character in Roman-based alphabets; these code sets are very 
limited and are not appropriate for international use. The single-byte ISO 8859 code 
sets use 8 bits to represent each character and can therefore support Roman-based 
alphabets and many others including Greek, Arabic, Hebrew, and Turkish. Multibyte 
code sets represent characters that require more than one byte, such as East Asian 
ideographic characters.
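
The difference between 7-bit, 8-bit, and multibyte code sets can be illustrated with a minimal sketch. It assumes Python's built-in codecs; the sample characters and the helper name encoded_length are chosen only for this example and do not come from the guide.

```python
def encoded_length(char, codec):
    """Return how many bytes `codec` uses for `char`, or None if the
    code set has no code for that character."""
    try:
        return len(char.encode(codec))
    except UnicodeEncodeError:
        return None

# 7-bit ASCII has codes for unaccented Roman letters only.
ascii_a = encoded_length("a", "ascii")
ascii_e_acute = encoded_length("é", "ascii")

# The 8-bit ISO 8859 code sets still use one byte per character.
latin1_e_acute = encoded_length("é", "iso8859_1")
greek_alpha = encoded_length("α", "iso8859_7")

# The same numeric value names different characters in different code sets.
same_byte = bytes([0xE1])
as_latin1 = same_byte.decode("iso8859_1")
as_greek = same_byte.decode("iso8859_7")

# A multibyte code set (Big5 is used here as an example) needs more than
# one byte for an ideographic character.
big5_zhong = encoded_length("中", "big5")

print(ascii_a, ascii_e_acute, latin1_e_acute, greek_alpha, big5_zhong)
print(as_latin1, as_greek)
```

Because the byte value alone does not identify the character, data must always be interpreted with the code set it was written in.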
collation. The logical ordering of characters or character strings according to defined 
precedence rules. The precedence rules identify a collation sequence between the 
collating elements and additional rules that can be used to order strings consisting of 
multiple collating elements.
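
As a rough illustration of a collation sequence with multi-character collating elements, the sketch below assumes a simplified, hypothetical table modeled on traditional Spanish ordering, in which "ch" sorts after "c" and "ll" sorts after "l". The names ELEMENTS, WEIGHT, and collation_key are invented for this example; a real application would normally rely on the collation rules of the platform locale.

```python
# Hypothetical collation sequence: each entry is one collating element,
# listed in precedence order. Lowercase input is assumed.
ELEMENTS = (
    list("abc") + ["ch"] + list("defghijkl") + ["ll"]
    + list("mn") + ["ñ"] + list("opqrstuvwxyz")
)
WEIGHT = {element: index for index, element in enumerate(ELEMENTS)}
UNKNOWN = len(ELEMENTS)  # characters outside the table sort last


def collation_key(word):
    """Split `word` into collating elements and return their weights."""
    key = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in WEIGHT:
            # Two characters that form a single collating element.
            key.append(WEIGHT[pair])
            i += 2
        else:
            key.append(WEIGHT.get(word[i], UNKNOWN))
            i += 1
    return key


words = ["dama", "chico", "cosa", "luz", "llama"]
by_code_value = sorted(words)
by_collation = sorted(words, key=collation_key)

print(by_code_value)
print(by_collation)
```

Sorting by code value places "chico" before "cosa", whereas the collation key places every word starting with "c" before any word starting with the element "ch".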
combining character. One or more characters that can be combined with a base character 
to form a composite character.
Common Run-time Environment (CRE). An HP product that provides a common 
environment for applications written in different programming languages.
composite character. A character consisting of a combination of two or more elements in a 
single character position. For example, the base character a and the acute accent (´) 
diacritical mark can be combined to form the composite character á. The ISO 10646 
code set includes an encoding method that enables the creation of composite characters.
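
The relationship between a base character, a combining character, and a composite character can be sketched with Python's unicodedata module. This assumes the Unicode character database that ships with Python, whose repertoire corresponds to ISO 10646; it is an illustration, not the mechanism of any particular product.

```python
import unicodedata

base = "a"
combining_acute = "\u0301"  # COMBINING ACUTE ACCENT

# A nonzero combining class marks a character that attaches to a base.
acute_class = unicodedata.combining(combining_acute)
base_class = unicodedata.combining(base)

# Two code points, displayed in a single character position.
sequence = base + combining_acute

# Normalization form C composes the pair into one precomposed code point.
composed = unicodedata.normalize("NFC", sequence)

# Normalization form D decomposes it back into base + combining mark.
decomposed = unicodedata.normalize("NFD", composed)

print(len(sequence), len(composed), unicodedata.name(composed))
```

Software that compares or counts characters should decide which of the two representations it expects, since both display identically.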
context-dependent writing system. A writing system in which characters take different 
forms depending on their location within a word—for example, Arabic and Hebrew.
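
One way to see context-dependent forms is through the Arabic presentation forms that Unicode encodes for compatibility. The sketch below assumes Python's unicodedata module and uses the letter beh as an arbitrary example; rendering systems normally select these forms automatically rather than storing them in text.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, the form-independent letter

# Compatibility code points for the four positional shapes of beh.
PRESENTATION_FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]

form_names = [unicodedata.name(form) for form in PRESENTATION_FORMS]

# Compatibility normalization maps every positional shape back to the
# same underlying letter.
normalized = [unicodedata.normalize("NFKC", form) for form in PRESENTATION_FORMS]

for name in form_names:
    print(name)
```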
cultural data. Information that can vary from one language to another or between 
geographic areas. Date, time, and currency formats are examples of cultural data.
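
A minimal sketch of how cultural data separates from program logic: the same date value is rendered through a per-region format table. The table DATE_FORMATS and the function format_date are hypothetical and heavily simplified; production code would normally obtain these formats from the platform's locale facilities.

```python
from datetime import date

# Hypothetical, simplified table of short date formats by region.
DATE_FORMATS = {
    "en_US": "%m/%d/%Y",  # month/day/year
    "de_DE": "%d.%m.%Y",  # day.month.year
    "ja_JP": "%Y/%m/%d",  # year/month/day
}


def format_date(value, region):
    """Format `value` using the date convention recorded for `region`."""
    return value.strftime(DATE_FORMATS[region])


sample = date(2024, 3, 5)
formatted = {region: format_date(sample, region) for region in DATE_FORMATS}

for region, text in formatted.items():
    print(region, text)
```

The string "03/05/2024" would be read as 3 May by many readers outside the United States, which is why such formats are kept as data rather than hard-coded.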
data transparent. Describes software that examines all eight bits of every data byte and 
that uses no bit in a data byte for its own purposes. Internationalized applications must 
be data transparent.
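
The effect of violating data transparency can be shown with a short sketch. The two functions below are hypothetical: strip_high_bit imitates legacy code that treats the eighth bit as its own, and pass_through imitates a data-transparent routine. The sample text is assumed to be encoded in ISO 8859-1.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Not data transparent: clears the eighth bit of every byte."""
    return bytes(b & 0x7F for b in data)


def pass_through(data: bytes) -> bytes:
    """Data transparent: returns every byte with all eight bits intact."""
    return bytes(data)


original = "café".encode("iso8859_1")  # é is the single byte 0xE9

damaged = strip_high_bit(original)
preserved = pass_through(original)

print(damaged.decode("iso8859_1"))
print(preserved.decode("iso8859_1"))
```

Clearing the high bit turns 0xE9 into 0x69, so the accented letter silently becomes "i" and the original text cannot be recovered.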
diacritical. A mark added to a letter that usually provides information about how the letter 
should be pronounced or the stress that should be given to a syllable. Examples 
include the acute accent (´), the grave accent (`), the diaeresis (¨), and the tilde (~).
eight-bit clean. Describes software that processes eight-bit characters without modifying or 
using any bit in a data byte for its own purposes. Also known as data transparent or 8-bit 
transparent.










