Software Internationalization Guide

Glossary

Software Internationalization Guide—526225-002

character set. A finite set of characters (letters, digits, symbols, ideographs, or control
functions) used for the organization, representation, or control of data. See also code set.

Chinese National Standard (CNS). A national standard that defines code sets for Traditional Chinese.

code set. A set of codes that maps a unique numeric value to each character in a character set,
using a designated number of bits to represent each character. Single-byte code sets use 7 or 8
bits to represent each character. The ASCII and ISO 646 code sets use 7 bits to represent each
character in Roman-based alphabets; these code sets are very limited and are not appropriate for
international use. The single-byte ISO 8859 code sets use 8 bits to represent each character and
can therefore support Roman-based alphabets (including extended ones such as Turkish) and many
other scripts, including Greek, Arabic, and Hebrew. Multibyte code sets represent characters that
require more than one byte, such as East Asian ideographic characters.
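
The difference between 7-bit, 8-bit, and multibyte code sets can be seen by encoding the same characters in each. The following is a minimal sketch, not part of the original guide; the helper name encode_or_none is hypothetical, and the code set names are the ones Python's standard codecs use. Big5 is used here only as one example of a multibyte code set for Traditional Chinese.

```python
# Sketch: encode single characters in several code sets and inspect the bytes.
# encode_or_none is a hypothetical helper; codec names are Python's own.

def encode_or_none(char, code_set):
    """Return the bytes for char in code_set, or None if it has no code there."""
    try:
        return char.encode(code_set)
    except UnicodeEncodeError:
        return None

CASES = [
    ("A", "ascii"),           # 7-bit code set
    ("\u00e9", "ascii"),      # e with acute accent: outside the 7-bit range
    ("\u00e9", "iso8859_1"),  # 8-bit single-byte code set (Western European)
    ("\u03b1", "iso8859_7"),  # Greek small alpha in the 8-bit Greek code set
    ("\u4e2d", "big5"),       # ideograph in a multibyte code set
]

for char, code_set in CASES:
    encoded = encode_or_none(char, code_set)
    shown = "not representable" if encoded is None else encoded.hex()
    print(f"U+{ord(char):04X} in {code_set}: {shown}")
```

The accented letter has no code in ASCII, needs one byte in ISO 8859-1, and the ideograph needs two bytes in Big5.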

collation. The logical ordering of characters or character strings according to defined
precedence rules. The precedence rules identify a collation sequence between the collating
elements and additional rules that can be used to order strings consisting of multiple
collating elements.
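
As an illustration of collating elements, the sketch below orders words using a hand-written collation sequence in which the letter pairs "ch" and "ll" are single collating elements, as in traditional Spanish dictionary order. The sequence, the function names, and the word list are all assumptions made for this example; a production application would normally rely on the collation support of its platform rather than a table like this.

```python
# Sketch: ordering strings by a collation sequence instead of by code value.
# SEQUENCE is a hypothetical, simplified traditional-Spanish ordering in which
# "ch" sorts after "c" and "ll" sorts after "l".
SEQUENCE = (
    list("abc") + ["ch"] + list("defghijkl") + ["ll"]
    + list("mn") + ["\u00f1"] + list("opqrstuvwxyz")
)
WEIGHT = {element: position for position, element in enumerate(SEQUENCE)}

def collating_elements(word):
    """Split word into collating elements, preferring two-letter elements."""
    elements = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in WEIGHT:
            elements.append(pair)
            i += 2
        else:
            elements.append(word[i])
            i += 1
    return elements

def collation_key(word):
    """Map a word to its list of weights (raises KeyError for unknown characters)."""
    return [WEIGHT[element] for element in collating_elements(word.lower())]

words = ["cuna", "chico", "dama", "cama"]
by_code_value = sorted(words)
by_collation = sorted(words, key=collation_key)
print(by_code_value)  # ['cama', 'chico', 'cuna', 'dama']
print(by_collation)   # ['cama', 'cuna', 'chico', 'dama']
```

Sorting by code value places "chico" before "cuna" because "h" precedes "u"; under the sample collation sequence, "ch" is one element that follows every word beginning with a plain "c".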

combining character. A character, such as a diacritical mark, that can be combined with a base
character to form a composite character.

Common Run-time Environment (CRE). An HP product that provides a common environment for
applications written in different programming languages.

composite character. A character consisting of a combination of two or more elements in a
single character position. For example, the base character a and the acute accent (´)
diacritical mark can be combined to form the composite character á. The ISO 10646 code set
includes an encoding method that enables the creation of composite characters.
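
The relationship between a base character, a combining character, and the resulting composite character can be sketched with Python's standard unicodedata module. This example is an addition for illustration and assumes Unicode, whose character repertoire is kept aligned with ISO 10646; the function names are hypothetical.

```python
# Sketch: composing and decomposing a composite character with Unicode
# normalization. compose and decompose are hypothetical helper names.
import unicodedata

def compose(text):
    """Combine base characters and combining marks where a composite exists."""
    return unicodedata.normalize("NFC", text)

def decompose(text):
    """Split composite characters into base characters and combining marks."""
    return unicodedata.normalize("NFD", text)

base_plus_accent = "a\u0301"  # letter a followed by COMBINING ACUTE ACCENT
composite = compose(base_plus_accent)

print([f"U+{ord(c):04X}" for c in base_plus_accent])  # ['U+0061', 'U+0301']
print([f"U+{ord(c):04X}" for c in composite])         # ['U+00E1']
print(unicodedata.name(composite))  # LATIN SMALL LETTER A WITH ACUTE
```

Both forms occupy a single character position when displayed, but they differ in length and in byte content, so comparisons should be made on text normalized to one form.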

context-dependent writing system. A writing system in which characters take different forms
depending on their location within a word, for example, Arabic and Hebrew.

cultural data. Information that can vary from one language to another or between geographic
areas. Date, time, and currency formats are examples of cultural data.
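
To make the idea concrete, the sketch below formats one date according to three regional conventions. The table DATE_FORMATS and the function format_date are hypothetical and exist only for this example; the three patterns reflect common numeric date conventions, and an internationalized application would normally obtain such formats from its locale facilities rather than hard-code them.

```python
# Sketch: the same date rendered under different cultural conventions.
# DATE_FORMATS is a hypothetical, hard-coded table used only for illustration.
from datetime import date

DATE_FORMATS = {
    "United States": "%m/%d/%Y",  # month/day/year
    "Germany": "%d.%m.%Y",        # day.month.year
    "Japan": "%Y/%m/%d",          # year/month/day
}

def format_date(value, region):
    """Format a date using the numeric pattern recorded for the region."""
    return value.strftime(DATE_FORMATS[region])

sample = date(2024, 3, 5)
for region in DATE_FORMATS:
    print(f"{region}: {format_date(sample, region)}")
```

The string 03/05/2024 means 5 March under one convention and 3 May under another, which is why such formats must be treated as data rather than fixed in program logic.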

data transparent. Describes software that examines all eight bits of every data byte and that
uses no bit in a data byte for its own purposes. Internationalized applications must be data
transparent.

diacritical. A mark added to a letter that usually provides information about how the letter
should be pronounced or the stress that should be given to a syllable. Examples include the
acute accent (´), the grave accent (`), the diaeresis (¨), and the tilde (~).

eight-bit clean. Describes software that processes eight-bit characters without modifying them
or using any bit in a data byte for its own purposes. Also known as data transparent or 8-bit
transparent.
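
The sketch below contrasts a routine that is not eight-bit clean with one that is. Both function names are hypothetical, and the first routine stands in for older software that assumed 7-bit data and cleared or reused the high-order bit; it is an illustration, not code from any particular product.

```python
# Sketch: what happens to 8-bit data in software that is not eight-bit clean.
# Both functions are hypothetical examples.

def copy_stripping_high_bit(data: bytes) -> bytes:
    """NOT eight-bit clean: clears the high-order bit of every byte."""
    return bytes(byte & 0x7F for byte in data)

def copy_transparent(data: bytes) -> bytes:
    """Eight-bit clean: passes every bit of every byte through unchanged."""
    return bytes(data)

# "caf" followed by e with acute accent, encoded in ISO 8859-1 (0xE9).
original = "caf\u00e9".encode("iso8859_1")

print(original)                           # b'caf\xe9'
print(copy_transparent(original))         # b'caf\xe9'
print(copy_stripping_high_bit(original))  # b'cafi'
```

Clearing the high-order bit turns 0xE9 into 0x69, the code for the letter i, so the accented character is silently replaced by a different one.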