NonStop SQL/MP Reference Manual

NonStop SQL/MP Reference Manual 142115
Character Sets
NonStop SQL/MP allows you to associate one of the following character sets with a
column, literal, host variable, or parameter:
ISO 8859/1 through ISO 8859/9
Kanji
KS C5601
You can also define a collation that uses any of the nine ISO 8859 character sets and
associate the collation with a column, literal, host variable, or parameter of the same
character set. (You cannot define a collation that uses the Kanji or KS C5601 character
sets. SQL always collates characters from those character sets according to the binary
value of the characters.)
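As an illustration of collating by binary value, the following Python sketch orders byte strings purely by the unsigned value of their bytes. It is not NonStop SQL/MP code. The helper name `binary_collate` is hypothetical, and the sample two-byte values are arbitrary placeholders, not specific Kanji or KS C5601 characters.

```python
from typing import Iterable, List


def binary_collate(values: Iterable[bytes]) -> List[bytes]:
    """Return the values ordered by binary value (hypothetical helper).

    Python compares bytes objects byte by byte as unsigned integers,
    which corresponds to a plain binary collation with no language rules.
    """
    return sorted(values)


if __name__ == "__main__":
    # Arbitrary two-byte sequences standing in for double-byte characters.
    sample = [b"\x8c\x8e", b"\x88\xea", b"\x93\xfa"]
    for item in binary_collate(sample):
        print(item.hex())
```

No linguistic ordering is applied in this sketch; the result depends only on the encoded byte values.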
For compatibility with versions of NonStop SQL/MP that do not support multiple
character sets, you can specify UNKNOWN to indicate that the character set is
unknown. SQL considers this equivalent to omitting the character set specification and
treats the data as 8-bit data.
ISO 8859 Character Sets
The ISO 8859 character sets are a standard set of nine single-byte character sets defined
by ISO (the International Organization for Standardization) in a series called ISO 8859.
The first in the series is called ISO 8859/1, the second is ISO 8859/2, and so on through
ISO 8859/9. In NonStop SQL/MP, you use the keywords ISO88591, ISO88592,
ISO88593, and so forth to specify a character set within the ISO 8859 series.
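The keyword naming pattern described above can be sketched as a small helper. This is an illustrative Python sketch, not part of NonStop SQL/MP; the function name `iso8859_keyword` is hypothetical, and the pattern for parts 4 through 9 is inferred from "and so forth".

```python
def iso8859_keyword(part: int) -> str:
    """Return the keyword for ISO 8859/<part> (hypothetical helper)."""
    # The series described here runs from ISO 8859/1 through ISO 8859/9.
    if not 1 <= part <= 9:
        raise ValueError(f"ISO 8859/{part} is outside the range 1 through 9")
    # Keyword = "ISO8859" followed by the part number, for example ISO88591.
    return f"ISO8859{part}"


if __name__ == "__main__":
    for part in (1, 2, 9):
        print(f"ISO 8859/{part} -> {iso8859_keyword(part)}")
```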
ISO 8859 defines printing characters for each character set, and all character sets share
the same layout. Each set includes graphic characters from the ASCII character set (a
7-bit character set defined in both ISO and ANSI standards) in code positions
%H20-%H7E and other characters in positions %HA0-%HFF, allowing 96 graphic characters
to be added to those already in ASCII. Graphic characters that appear in multiple ISO
8859 character sets always have the same encoding.
The ranges %H00-%H1F and %H7F-%H9F are reserved for control characters, but ISO
8859 does not make specific control character assignments.
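The shared layout can be summarized with a short Python sketch that classifies a single-byte code position into the ranges given above. It assumes that the %H prefix denotes a hexadecimal value; the function name `classify_code_position` and the category labels are hypothetical.

```python
def classify_code_position(value: int) -> str:
    """Classify a single-byte code position by the shared ISO 8859 layout."""
    if not 0x00 <= value <= 0xFF:
        raise ValueError("code position must fit in one byte")
    if 0x20 <= value <= 0x7E:
        return "ascii-graphic"        # %H20-%H7E: graphic characters from ASCII
    if 0xA0 <= value <= 0xFF:
        return "additional-graphic"   # %HA0-%HFF: characters added by each set
    return "control"                  # %H00-%H1F and %H7F-%H9F: reserved


if __name__ == "__main__":
    additional = sum(
        1 for v in range(256)
        if classify_code_position(v) == "additional-graphic"
    )
    print(additional)  # 96, matching the count stated in the text
```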
ISO 8859/1, which is informally called Latin-1, is the most commonly used ISO 8859
character set. ISO 8859/1 contains the characters necessary for Western European
languages such as French, German, Italian, and Spanish. It is Tandem's current default
character set and is implemented in the latest versions of the 6525A terminal, PCT, and
printers.
The other ISO 8859 character sets are used in varying degrees throughout the world.
ISO 8859/2 is used for Eastern European languages, ISO 8859/3 for Southeastern
European languages, ISO 8859/4 for Northern European languages, ISO 8859/5 for
English and Cyrillic languages, ISO 8859/6 for English and Arabic languages, ISO
8859/7 for English and Greek languages, ISO 8859/8 for English and Hebrew
languages, and ISO 8859/9 for Western European and Turkish languages.