Software Internationalization Guide

Abstract

This guide gives an overview of internationalization standards and processes. It discusses the goals of internationalization and localization, software characteristics that vary between locales, industry standards that govern internationalization, and the internationalization facilities of the HP NonStop™ Kernel Open System Services (OSS) environment.

Product Version: N.A.

Supported Release Version Updates (RVUs): This guide supports G06.
Document History

    Part Number    Product Version    Published
    526225-002     N.A.               June 2004
    526225-001     N.A.               September 2003
    116809         N.A.               November 1995
    096174         N.A.
Contents

What's New in This Guide
    Guide Information
    New and Changed Information
About This Guide
    How to Use This Guide
    About the Contents of This Guide
    Related Reading
    Notation Conventions
1.
3. POSIX and XPG Internationalization Model (continued)
    Messaging System (continued)
        Accessing Message Catalogs
        Messaging System Example
    Code-Set Conversion
        Creating Source-Code Files of Code-Set Conversion Tables
        Generating Code-Set Conversion Tables
        Algorithmic Code-Set Converters
        Accessing Code-Set Conversion Tables
        Code-Set Conversion Example
4.
A. Software Supporting Multiple Character Sets
Glossary
Index

Figures
    Figure 1-1 through Figure 1-4
    Figure 2-1 through Figure 2-10
    Figure 3-1 through Figure 3-3
    Figure 4-1
Tables (continued)
    Table A-2.
Tables (continued) Contents Software Internationalization Guide —526225-002 vi
What's New in This Guide

Guide Information
About This Guide

This guide provides an overview of internationalization. It includes the following topics:

• The goals of internationalization and localization
• The characteristics that vary between locales
• The industry standards governing internationalization
• The internationalization utilities
• The HP internationalization subsystem

How to Use This Guide

This guide introduces software internationalization, sometimes referred to as "I18N" or "i18n" (the 18 stands for the eighteen letters between the initial i and the final n).
Related Reading

You can find additional information in these HP manuals:

• Open System Services Library Calls Reference Manual
• Open System Services Shell and Utilities Reference Manual
• Common Run-Time Environment (CRE) Programmer's Guide
• Guardian Procedure Calls Reference Manual
• C/C++ Programmer's Guide

Other Publications

These publications are suggested reading for more information about internationalization:

• Global Software, by Dave Taylor, publish
General Syntax Notation

This list summarizes the notation conventions for syntax presentation in this manual.

UPPERCASE LETTERS. Uppercase letters indicate keywords and reserved words. Type these items exactly as shown. Items not enclosed in brackets are required. For example:

    MAXATTACH

lowercase italic letters. Lowercase italic letters indicate variable items that you supply. Items not enclosed in brackets are required. For example:

    file-name

computer type.
| Vertical Line. A vertical line separates alternatives in a horizontal list that is enclosed in brackets or braces. For example:

    INSPECT { OFF | ON | SAVEABEND }

… Ellipsis. An ellipsis immediately following a pair of brackets or braces indicates that you can repeat the enclosed sequence of syntax items any number of times.
!i,o. In procedure calls, the !i,o notation follows an input/output parameter (one that both passes data to the called procedure and returns data to the calling program). For example:

    error := COMPRESSEDIT ( filenum ) ;     !i,o

!i:i. In procedure calls, the !i:i notation follows an input string parameter that has a corresponding parameter specifying the length of the string in bytes.
1 Understanding Internationalization Concepts

The opening of global markets and the increasing use of computers throughout the world make it essential that companies meet the needs of global business as cost-effectively as possible. Internationalizing computer applications is a key part of the HP GeoReady strategy to address global business requirements.
Languages, Cultures, and Code Sets

These examples describe a few differences among languages, cultures, and code sets that are encountered in international applications:

• Symbols and rules that apply to the development language are often not appropriate to a language in which the product might be used.
What Is Localization?

Localization is the process of adapting an application to the accepted way of presenting information in a particular culture. For an internationalized program to support a variety of languages, cultures, and code sets, the data of the source country must be transformed into data that is appropriate for the target cultures.
religious references, humor, and many more culturally dependent aspects must be adapted for each locale. For example, mailboxes around the world have a great variety of shapes, so an icon depicting an American rural mailbox is unlikely to be understood by most of the world's inhabitants.

What Is a Locale?

A locale is the part of a user's environment that defines the user's language and cultural preferences or conventions.
Figure 1-3. Separating Program Source Code From Culturally Sensitive Data (program source code: data types, variables, data structures, algorithms, ...; culturally sensitive data: messages, date formats, time formats, currency formats, collation scheme, ...; together they make up the internationalized software product)

Internationalization Standards

Many internationalization standards are currently being defined.
Figure 1-4. Components of an Internationalized Application (an internationalized software product comprises locale initialization, locale-independent code, the messaging system, and code-set conversion)

See Section 4, The HP Internationalization Subsystem, for more information.
2 Software Characteristics That Vary by Locale

The primary goal of internationalization is to develop software that meets the needs of all languages and cultures. This section describes a few of the world's writing systems and discusses issues that must be considered for internationalization.

Writing Systems

Writing systems differ in the symbols they use, the direction in which those symbols are written and read, and the manner in which they are grouped.
Arabic

Arabic is written from right to left, using a cursive writing style in which characters often attach to the characters that precede and follow them. In Arabic, characters can have different forms at different locations within a word. For example, a character at the beginning of a word has one form, uses a different form when embedded in the middle of the word, and uses still another form when it appears at the end of a word.
Figure 2-3. Chinese Ideograph of a Horse

Each ideographic writing system has a name. Hanzi identifies the Chinese ideographic writing systems (Traditional and Simplified Chinese); Kanji is the Japanese ideographic writing system; Hanja is the Korean ideographic writing system. Together, the Chinese, Japanese, and Korean ideographs are referred to as the Han, shown in Figure 2-4.

Figure 2-4.
Japanese

Japanese ideographs are known today as Kanji. Although similar concepts can be represented in Chinese and Japanese, the two languages are linguistically different. In addition to Kanji, the Japanese language has two phonetic systems, Katakana and Hiragana, each consisting of about 50 characters.
Code Sets

A code set assigns a unique numeric value to each character in a character set, with a designated number of bits representing each character. In the past, hardware limitations restricted the number of bits used to represent characters, but with fewer hardware restrictions, software can now support code sets of up to hundreds of thousands of characters.
Multibyte Code Sets

Table 2-2.
Table 2-3. East Asian Code Sets (continued)

    Code Set Name     Languages Supported
    JIS X0201         Japanese
    JIS X0208         Japanese
    JIS X0212         Japanese
    KS C 5601-1987    Korean

The ISO 10646 Universal Coded Character Set

ISO 10646 is a universal coded character set (UCS) that represents all characters and symbols from all commonly used scripts and languages.
combining character, using the base character a followed by the acute accent. As Figure 2-7 shows, the resulting code value uses four octets: two for the a (0x00 0x61) and two for the acute accent (0x03 0x01).

Figure 2-7. Creating a Composite Character: á = a (base character, 0x00 0x61) + ´ (combining character, 0x03 0x01)
Figure 2-8. Relationship Between Code Sets (ASCII is a subset of ISO 8859-1, which in turn is a subset of ISO 10646 (UCS-2)/Unicode)

Encoding Methods

Encoding methods provide a way to mix characters from different code sets so that users can use characters from multiple languages. For example, the Extended UNIX Codes (EUC) and personal computer (PC) codes mix ASCII, local phonetic, and ideographic characters in one data stream.
two-byte and four-byte character data types. Some character types are not defined as single or multiple byte, but can support various combinations.

Single-Byte Characters

A single-byte character data type consists of eight bits that represent a character. ISO 8859 characters are single-byte characters. A single byte can represent up to 256 characters.
supported character sets of 1, 2, and 4 bytes. To store 1-byte and 2-byte characters, the implementation pads unused space in the 4-byte-wide character with nulls. Figure 2-10 is an example of a 4-byte-wide character data stream. The unshaded bytes in the figure indicate null padding.

Figure 2-10. Four-Byte-Wide Character Data Stream
A process code is an internal representation of data and is the format in which programs process data. With process code, all characters are represented internally by the same number of bits, so processing is independent of the character set. Process code is never saved on disk or exchanged with other running processes.
Character Classification

Character classification is the grouping of characters into named classes that share an attribute associated with the name of the class. For example, ASCII character classes are uppercase, lowercase, alphabet, digit, and punctuation. It is easiest to determine how to process characters if the classification of a character is defined.
Class-conversion routines might be written, for example, for cases in which each letter has only one uppercase version and one lowercase version. In French, however, lowercase letters may lose their diacriticals when converted to uppercase: e, è, é, and ê may all convert to E. To meet international needs, locales give users the option of defining uppercase and lowercase mappings so that diacriticals are not lost.
be instances in which an uppercase character is followed by its lowercase counterpart instead of the next uppercase character. For example, instead of the traditional A, B, C …, a, b, c order, the appropriate collation scheme might be A, a, B, b, C, c, …, Z, z.

Character-Set Collation

Character-set collation schemes are based on the actual character instead of the encoded values, resolving some problems of character-encoded collation.
Stroke Count

One approach to collating ideographic characters is based on the number of strokes that make up the character. Characters containing fewer strokes sort first, followed by characters with more strokes.

Radical Base

Ideographic characters can be collated using a scheme based on radicals, which are the root structure of ideographs.

Phonetics

Pronunciation is another way of collating ideographic characters.
Table 2-6. Date Formats by Language

    Format     Languages
    mm/dd/yy   US English
    dd/mm/yy   Australian English, British English, Canadian French, Danish, Spanish
    dd-mm-yy   Dutch, Flemish, Italian, Norwegian
    dd.mm.yy   Finnish, French, German, Swiss French, Swiss German
    yy/mm/dd   Portuguese
    yy-mm-dd   Swedish

Some countries capitalize month and day names; others do not.
Canadian French, Danish, Dutch, Flemish, German, Italian, Portuguese, and Swiss French commonly use the 24-hour clock format. Table 2-8 shows written time formats used by different countries.

Table 2-8. Time Formats by Country

    Country         Format
    France          16h10
    Germany         16.10
    Japan           16:10
    United States   4:10 p.m.

To complicate time formats further, the world is divided into time zones (nominally 24, although some regions use half-hour or quarter-hour offsets), each with its own name.
Table 2-10. Monetary Formats by Country

    Country         Currency       Monetary Format
    Japan           yen            ¥1,234
    Norway          krone          kr1.234$56
    Portugal        escudos        1.234$56
    Switzerland     Swiss francs   1.234$56SFrs
    United States   dollars        $1,234.
3 POSIX and XPG Internationalization Model

This section gives a general overview of the POSIX and XPG internationalization model. The scope of the POSIX and XPG standards is much greater than the information covered in this publication; for details on these standards, see the POSIX and XPG guidelines.
development bodies, Uniforum's technical committee on internationalization provides specifications for POSIX and XPG standards.

POSIX and XPG Internationalization Model

POSIX and XPG standards provide a model for developing internationalized software in which a user specifies a locale for the operating environment.
for example, collation procedures and date and time formats. Locale-specific information is isolated from a program's source code and stored in separate locale source files. Isolating locale-specific information in separate locale source files simplifies localizing a program: when an internationalized program needs to support a new locale, only the locale source file is localized.
enables the user's environment to display time formats according to German time conventions, after setlocale() has successfully set the German locale. See Setting the Program Environment on page 3-6 for more information.
3. LANG takes precedence when neither LC_ALL nor the LC variables are defined.
4. If LC_ALL, the LC variables, and LANG are all undefined, the default locale (the C/POSIX locale) is used.

In the following example, all aspects of the French locale (fr_FR) are supported except collation, which is based on German (de_DE) conventions:

    LANG=fr_FR.ISO8859-1
    LC_COLLATE=de_DE.
    $ date
    15 marzo 1994, 09:45
    $ export LC_ALL=fr_FR.ISO8859-1
    $ date
    15 mars 1994, 09h45

Setting the Program Environment

An application program can inherit the user's current locale or take on an independent locale, depending on the arguments it passes to setlocale(). Internationalized programs cannot function as designed until setlocale() has been called successfully to establish the locale.
This example uses hard-coded values to check that a character is within the boundaries of the ASCII code set:

    #include <stdio.h>

    void process_valid_char(int c);    /* supplied elsewhere in the program */
    void process_invalid_char(int c);  /* supplied elsewhere in the program */

    /* Hard code ASCII range: accepts only A-Z (65-90) and a-z (97-122) */
    int main(void)
    {
        int input = getchar();

        if ((input >= 65 && input <= 90) || (input >= 97 && input <= 122))
            process_valid_char(input);
        else
            process_invalid_char(input);
        return 0;
    }

This example uses the internationalized function isalpha() to verify that a character is within the boundaries of a code set defined by the current locale.
symbolic identifiers into numeric constants, then produces a set of commands suitable for passing to the gencat utility. The gencat utility creates and modifies a message catalog from a message text source file. The runcat utility runs mkcatdefs and sends its output to gencat.

Two other messaging system utilities are available. The dspcat utility displays all or part of a message catalog; the dspmsg utility writes a selected message to standard output.
Example: Message Source File From an Internationalized Program

This is an example of a message source file in US English that contains program messages in a format defined by XPG. This example is based on the program used in the previous example:

    $ English message source file
    $set 1 main module
    1 "Main Menu"
    2 "1 - Add Record"
    3 "2 - Delete Record"
    4 "3 - Modify Record"
    5 "4 - Quit"
    6 "Make A Selection:"
    7 "You pressed an invalid key.
Figure 3-2. Generating Message Catalogs (English message source files with symbolic identifiers and with numeric constants, the mkcatdefs and gencat utilities, localization of the English message source file into a Spanish message source file, and the resulting English and Spanish message catalogs)
NLSPATH identifies the search path to the appropriate directory for finding the message catalog; it also identifies message catalog naming conventions.

Messaging System Example

The following example shows code modified to incorporate the XPG messaging system. It contains calls that access messages in a message catalog.
Creating Source-Code Files of Code-Set Conversion Tables

A source-code file of code-set conversion tables contains the original code-set values and their corresponding target code-set values.
Figure 3-3. Generating Code-Set Conversion Table (genxlt processes the source code file of code-set conversion tables to produce the code-set conversion table)

Algorithmic Code-Set Converters

Most multibyte code sets, such as the large Asian code sets, cannot use tables for conversion and therefore require algorithmic code-set converters. The naming convention for code-set converters is the same as for conversion tables.
Code-Set Conversion Example

This is an example of code-set conversion:

    /* Example of code-set conversion */
    #include
    #include
4 The HP Internationalization Subsystem

This section describes the HP internationalization subsystem, lists supported locales, gives design and development guidelines for internationalized software, and offers tips for testing and troubleshooting internationalized software.

About the HP Internationalization Subsystem

Internationalizing software is a key part of the HP GeoReady strategy to address global business requirements.
Single-byte Locales

These locales are included in T8372:

    da_DK.ISO8859-1   de_CH.ISO8859-1   de_DE.ISO8859-1   el_GR.ISO8859-7
    en_GB.ISO8859-1   en_JP.ISO8859-1   en_US.ISO8859-1   es_ES.ISO8859-1
    fi_FI.ISO8859-1   fr_BE.ISO8859-1   fr_CA.ISO8859-1   fr_CH.ISO8859-1
    fr_FR.ISO8859-1   is_IS.ISO8859-1   it_IT.ISO8859-1   nl_BE.ISO8859-1
    nl_NL.ISO8859-1   no_NO.ISO8859-1   pt_PT.ISO8859-1   sv_SE.ISO8859-1
    tr_TR.
The HP Internationalization Subsystem FSS-UTF FSS-UTF FSS-UTF FSS-UTF FSS-UTF FSS-UTF FSS-UTF FSS-UTF FSS-UTF FSS-UTF FSS-UTF ISO8859-1 ISO8859-2 ISO8859-3 ISO8859-4 ISO8859-5 ISO8859-6 ISO8859-7 ISO8859-8 ISO8859-9 SJIS SJIS UCS-2 UCS-2 UCS-2 UCS-2 UCS-2 UCS-2 UCS-2 UCS-2 UCS-2 UCS-2 eucJP eucJP eucKR eucKR eucTW eucTW ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ ↔ Supported Code-Set Converters for the TNS/R Native Environment ISO8859-4 ISO8859-5 ISO8859-6 ISO8859-7 ISO8859
    ISO8859-9     ↔  FSS-UTF
    UCS-2         ↔  SJIS
    UCS-2         ↔  eucJP
    UCS-2         ↔  eucKR
    UCS-2         ↔  eucTW

Supported Code-Set Converters for the TNS Environment

The HP internationalization subsystem supports these code-set converters in the TNS environment:

    ISO8859-1     ↔  IBM-850
    ISO8859-7     ↔  IBM-869
    ISO8859-9     ↔  IBM-857
    ISO8859-1_GL  ↔  IBM-850
    ISO8859-1_GR  ↔  IBM-850
    SJIS          ↔  AJEC

Design and Development Guidelines

Software developers can minimize
Data Transparency

Make sure your program source code is data transparent. In data-transparent code, no bits of character data are used to represent anything other than the character itself (for example, the eighth bit of a byte is never borrowed as a flag). Data transparency is a fundamental requirement for internationalized applications because internationalized software must be able to support numerous single-byte and multibyte code sets.
The application takes the localized "Today is" phrase from the message catalog, and strftime() provides the day. If the locale is English, the preceding example displays "Today is Thursday"; the Italian locale results in "Oggi è giovedì"; the Spanish locale results in "Hoy es jueves." If no localized message is available, the application uses the fourth parameter of the catgets() function as the default message.
Table 4-1.
Locales in OSS Client/Server Applications

For homogeneous client/server applications in the OSS environment, HP provides the setlocale_from_msg() function to enable a server to receive a client's locale information along with messages from $RECEIVE. If a client that is not internationalized communicates with an internationalized server, the only locale available is the C/POSIX default locale.
    READUPDATE(rf_num, r_buf, &read_cnt);
    /* Retrieve the message tag */
    FILE_GETRECEIVEINFO_(receive_info);
    /* Change locale based on the received message's locale */
    if (setlocale_from_msg(receive_info[2]) != NULL) {
        /* Perform operations in the locale received from msg */
    }
    /* Restore the server's locale */
    server_lc = setlocale(LC_ALL, server_lc);
}

Internationalization Functions and COBOL85

C functions can be called from CO
of a SORT or MERGE statement, and the CODE-SET clause of a file description entry to provide control over behaviors similar to those provided through OSS locales. The CURRENCY SIGN and DECIMAL POINT clauses of the SPECIAL-NAMES paragraph allow formatting behaviors similar to those provided through OSS locales.

Date-Time Formats

COBOL date and time formats are not affected by locale.
A default Guardian character set can be established, separate from that of the OSS locale in use; the Guardian default character set can be determined by calling the MBCS_DEFAULTCHARSET_ procedure. Collating sequences for Guardian procedure calls are not affected by OSS locale or by the use of the Guardian multibyte character-set procedures.
SQL functions and predicates; however, you cannot use KANJI or KSC5601 character sets for character columns of SQL/MX tables. When you install SQL/MX, you can set the national character set. The national character set is associated with NCHAR and NATIONAL CHARACTER data types and with N string literals. If you do not specify a national character set, the default is UCS2.
character set or collating sequence associated with a column after the column is created. You can specify a different collating sequence for a sort or comparison that involves single-byte character values within the same character set, but you cannot automatically vary that collating sequence based on locale.

Double-byte Character Sets

SQL/MP supports two double-byte character sets: HP Kanji (KANJI) and HP Korean (KSC5601).
To compile an internationalized application in the TNS/R native environment, run the native c89 utility using this command:

    c89 options myapp.
Basic Testing Checklist

Use this checklist to verify that the application complies with the basic code preparation requirements for internationalization:

• Is the program source code data transparent?
• Are all hard-coded messages removed from the source code and stored in a separate file or files?
• Is all culturally dependent information removed from the source code? Examples of culturally dependent information include date, time, and mon
Testing the Application's Use of Locales

You can switch locale environment variables to verify that an application's behavior is consistent with the selected locale and that it accesses the appropriate message catalog. Test all supported locales to verify that the code handling the locale-sensitive aspects of the application is locale-independent, and that the proper set of internationalized functions has been used in development.
internationalized application are interconnected. In internationalized applications, the environment variables might affect the locales, which in turn affect character processing and the messaging system. As Figure 4-1 on page 4-18 shows, the application inherits the internationalization environment variables. After the application sets the locale, it calls the internationalization functions, which access the locale data.
• Are the code-set conversion function variables defined correctly? For example, if the LOCPATH environment variable is incorrectly defined, the code-set converter or code-set conversion table object cannot be opened.
A Software Supporting Multiple Character Sets

The HP products and commonly used third-party software listed in Table A-1 and Table A-2 on page A-3 have been evaluated by test labs local to the country where the character set is used. A Y indicates that the character set is supported, an N indicates that the character set is not supported, and a blank indicates that support has not been tested by a local test lab.
Software Supporting Multiple Character Sets Table A-1. Software Tested For Support of Unicode and Chinese Two-Byte Character Sets (page 2 of 2) Software Unicode UCS-2 UTF-16 Chinese Big 5 GB 2312 Spooler Y Y NonStop SQL/MP Y Y NonStop SQL/MX Release 1.n and later N SequeLink version 5.3 and later NonStop SQL/MX Release 2.
Table A-2. Software Tested For Support of Japanese and Korean Character Sets (page 1 of 2) Japanese Software Shift JIS Kanji (data type) Korean KS C 5601-1987 ISO 8859-1, when used by default HP Enterprise Toolkit -- NonStop Edition (ETK) version 1.n and later Y NonStop Server for Java version 3.1 and later Y Y Y NonStop JDBC server Y Y Y NonStop JDBC/MX version 2.0 and later Y Y Y NonStop Server for Java Message Service (JMS) version 2.
Table A-2. Software Tested For Support of Japanese and Korean Character Sets (page 2 of 2) Japanese Software NonStop TS/MP Shift JIS Kanji (data type) Y NonStop TUXEDO Korean KS C 5601-1987 ISO 8859-1, when used by default Y Y Visual Inspect N Extensible Markup Language (XML) parser version 3.0 and later Y NonStop XSLT version 1.
Glossary

ANSI. The American National Standards Institute.

Arabic-based writing system. A writing system with letters that are derived from the Arabic alphabet. Not all languages that use Arabic characters are related linguistically to Arabic.

ASCII. American Standard Code for Information Interchange. A single-byte code set that uses only 7 of the 8 bits in a byte to represent each character. The ASCII code set contains the uppercase and lowercase characters of the U.S.
character set. A finite set of characters (letters, digits, symbols, ideographs, or control functions) used for the organization, representation, or control of data. See also code set.

Chinese National Standard (CNS). Creates standard code sets for Traditional Chinese.

code set. Codes that map a unique numeric value to each character in a character set, using a designated number of bits to represent each character. Single-byte code sets use 7 or 8 bits to represent each character.
encoding method. A set of rules for combining two or more code sets into a single data stream.

environment variable. A variable that is associated with a specific area of a user's environment. For example, the LC_TIME locale environment variable enables the display of time formats according to local time conventions.

EUC. The Extended UNIX Codes. EUC is an encoding method most commonly used on Asian UNIX-based systems.
ideograph. A character or symbol representing a word or idea. Some writing systems, such as Japanese and Chinese, use thousands of ideographs.

IEEE. Institute of Electrical and Electronics Engineers. IEEE is a professional organization whose committees develop and propose computer standards that define the physical and data link protocols of entities such as communication networks. IEEE developed the POSIX standard.

internationalization.
language. A system of communication made up of words formed from combinations of patterns and symbols. A language can vary among the people of a particular country or among groups with a shared history or tradition.
Latin-based writing system. A writing system with letters that are derived from the Latin alphabet. Not all languages that use Latin characters are related linguistically to Latin.
locale. (1) The subset of a user's environment that depends on language and cultural conventions.
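The locale and LC_TIME entries can be illustrated with the standard C library: a program calls `setlocale()` to adopt the locale named by its environment, and `strftime()` then formats dates according to the LC_TIME category. In the sketch below, `setlocale()`, `strftime()`, and the `%x` conversion are standard C; the helper names `use_environment_time_locale`, `make_date`, and `format_date` are hypothetical. In the POSIX ("C") locale, `%x` is defined as `%m/%d/%y`.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: sets the LC_TIME category from the user's
   environment (LC_ALL, then LC_TIME, then LANG, in POSIX precedence
   order). Returns the locale name, or NULL if it cannot be set. */
const char *use_environment_time_locale(void)
{
    return setlocale(LC_TIME, "");
}

/* Hypothetical helper: builds a struct tm for a calendar date;
   month is 1-12, day is 1-31. Unused fields are zeroed. */
struct tm make_date(int year, int month, int day)
{
    struct tm tm;

    memset(&tm, 0, sizeof tm);
    tm.tm_year = year - 1900;   /* struct tm counts years from 1900 */
    tm.tm_mon = month - 1;      /* and months from 0 */
    tm.tm_mday = day;
    return tm;
}

/* Hypothetical helper: formats the date with the locale-dependent
   %x conversion, so the result follows the current LC_TIME category.
   Returns the number of characters written, or 0 if buf is too small. */
size_t format_date(char *buf, size_t size, const struct tm *tm)
{
    assert(buf != NULL && tm != NULL);
    return strftime(buf, size, "%x", tm);
}
```

A program would normally call `use_environment_time_locale()` once at startup; if LC_ALL, LC_TIME, or LANG names another installed locale, the same `format_date()` call then produces that locale's date format instead of the POSIX-locale form mm/dd/yy.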
PC. Personal Computer codes. PC codes are an encoding standard that is popular on East Asian personal computers.
POSIX. The Portable Operating System Interface, as defined by the Institute of Electrical and Electronics Engineers (IEEE) and the American National Standards Institute (ANSI). Each POSIX interface is separately defined in a numbered ANSI/IEEE standard or draft standard. The application program interface (API), known as POSIX.1, has become ISO/IEC IS 9945-1:1990.
POSIX locale.
Index
A Accent marks 2-1 Algorithmic code-set converters 3-13, 4-2 Applications using Guardian procedure calls 4-11 Arabic-based writing systems 2-2 ASCII 2-5
B Base characters 2-7 Basic Multilingual Plane (BMP) 2-7 Big 5 2-9 Big 5 character set A-1 Block-based writing systems 2-1
C c89 utility 4-1, 4-14 catclose() 3-10 catgets() 3-10 catopen() 3-10 Character 2-13 classification 2-13 combining 2-7 composite 2-7 data types 2-9, 2-11 encoding
Code-set conversion (continued) converters for the TNS/R native environment 4-2 generating conversion tables 3-12 table-driven converters 4-3 Collation character-encoded 2-14 character-set 2-15 don’t-care characters 2-16 ideographic character 2-15 multilevel 2-15 Combining characters 2-7 Compiling internationalized applications in the TNS environment 4-15 in the TNS/R native environment 4-14 Composite characters 2-7 Context-dependent writing systems 2-2 Conversion, code-set 3-11 accessing conversion
H Han 2-3 Hangul 2-4 Hanja 2-3 Hanzi 2-3 Hiragana 2-4 HP internationalization subsystem 1-5
I iconv() 3-13 iconv_close() 3-13 iconv_open() 3-13 Ideographic character collation 2-15 Ideographs 2-2 IEEE 1-5 Integer data types 2-11 International Organization for Standardization (ISO) 3-1 Internationalization 1-1 and character sets 2-4 and code sets 1-2, 2-5
Internationalization (continued) utilities (continued) genxlt 3-12 mkcatdefs 3-7, 3-9 runcat 3-8, 3-9 ISO 3-1 ISO 8859-1 character set 2-5, A-3
J
Locale (continued) in client/server applications 4-9 locale-sensitive functions 4-6 multibyte 4-1, 4-2 name 3-3 objects 3-3 single-byte 4-1, 4-2 supported locales 4-1, 4-2 testing 4-17 Locale environment variables LANG 3-4 LC_ALL 3-4 LC_COLLATE 3-4 LC_CTYPE 3-4 LC_MESSAGES 3-4 LC_MONETARY 3-4 LC_NUMERIC 3-4 LC_TIME 3-4 precedence rules 3-4 program environment 3-6 user environment 3-5 Locale-independent code 3-6 Locale-sensitive functions 4-6 Localization 1-3 LOCPATH 3-13
M Message catalog accessing
R Radix character 2-18 runcat utility 3-7, 3-9
S setlocale() 3-6 setlocale_from_msg() 4-9 Shift JIS character set 2-9, A-3 Shift-JIS 2-9, 3-13 Signed character data types 2-11 Simplified Chinese 2-3 Single-byte characters 2-10 code sets 2-5 locales 4-1, 4-2 SJIS 2-9, 3-13 SJIS_AJEC 3-13 Source code data transparency 4-4 isolating messages from 4-5 preparing for internationalization 4-4 removing character encoding assumptions from 4-6 removing culturally dependent information from 4-5 SQL/MP applicat
W wchar_t 2-10 Wide characters 2-10, 2-11, 2-12 Writing systems 2-1 Arabic-based 2-2 context-dependent 2-2 cursive 2-2 East Asian 2-2 ideographic 2-2 Latin-based 2-1
X READUPDATE 4-9 XPG 1-5, 3-1 XPG internationalization model 3-2 XPG4 4-1 XPG4 standards 4-4 X/Open 1-5, 3-1
Special Characters $RECEIVE 4-9