Software Internationalization Guide
Software Characteristics That Vary by Locale
Software Internationalization Guide—526225-002
be instances in which an uppercase character is followed by its lowercase counterpart 
instead of the next uppercase character. For example, instead of the traditional A, B, C 
…, a, b, c order, the appropriate collation scheme might be A, a, B, b, C, c, …, Z, z.
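As a minimal sketch of the interleaved order described above, the following Python fragment ranks each letter by its position in an explicitly defined order instead of by its encoded value. The names INTERLEAVED, RANK, and interleaved_key are hypothetical and chosen only for illustration; the sketch covers the 26 unaccented Latin letters and nothing else.

```python
import string

# Hypothetical collation order: every uppercase letter is immediately
# followed by its lowercase counterpart (A, a, B, b, ..., Z, z).
INTERLEAVED = "".join(
    upper + lower
    for upper, lower in zip(string.ascii_uppercase, string.ascii_lowercase)
)

# Position of each character in that order; this replaces the encoded value.
RANK = {ch: position for position, ch in enumerate(INTERLEAVED)}


def interleaved_key(text):
    """Return a sort key based on the interleaved order (ASCII letters only)."""
    return [RANK[ch] for ch in text]


letters = ["b", "A", "C", "a", "c", "B"]

# Sorting by encoded value puts all uppercase letters first.
by_encoded_value = sorted(letters)

# Sorting with the explicit order interleaves the two cases.
by_interleaved_order = sorted(letters, key=interleaved_key)

print(by_encoded_value)      # ['A', 'B', 'C', 'a', 'b', 'c']
print(by_interleaved_order)  # ['A', 'a', 'B', 'b', 'C', 'c']
```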
Character-Set Collation
Character-set collation schemes are based on the characters themselves instead of 
their encoded values, which resolves some of the problems of character-encoded 
collation. With character-set collation, all existing and new character sets can be 
collated appropriately, independent of encoded values.
With this approach, several different collation orders can be defined for a single 
character set. Character sets that are case-insensitive can have collation orders in 
which the uppercase and lowercase versions of a single character have the same sort 
value. Punctuation, symbols, and word hyphenators can be defined with the rest of the 
character set.
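The idea of several collation orders for one character set can be sketched in Python as follows. Each order is given as a list of groups, and every character in a group receives the same sort value. The helper names build_collation and collation_key are hypothetical, and placing the hyphen before all letters is an assumption made only for this illustration.

```python
import string

LETTERS = string.ascii_uppercase


def build_collation(groups):
    """Map each character to the index of its group (its sort value)."""
    return {ch: weight for weight, group in enumerate(groups) for ch in group}


def collation_key(text, table):
    """Build a sort key from a collation table; unknown characters raise KeyError."""
    return [table[ch] for ch in text]


# Order 1 (assumed): the hyphen sorts first, and the uppercase and lowercase
# forms of a letter share one sort value.
CASE_INSENSITIVE = build_collation(["-"] + [u + u.lower() for u in LETTERS])

# Order 2 (assumed): same character set, but each uppercase letter sorts
# just before its lowercase form.
CASE_SENSITIVE = build_collation(
    ["-"] + [ch for u in LETTERS for ch in (u, u.lower())]
)

words = ["coop", "Coop", "co-op"]

insensitive = sorted(words, key=lambda w: collation_key(w, CASE_INSENSITIVE))
sensitive = sorted(words, key=lambda w: collation_key(w, CASE_SENSITIVE))

print(insensitive)  # ['co-op', 'coop', 'Coop']
print(sensitive)    # ['Coop', 'co-op', 'coop']
```

In the case-insensitive order, coop and Coop have identical keys, so they keep their input order because Python's sort is stable.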
Multilevel Collation
With multilevel collation, several collation passes are made, each refining the order 
produced by the previous pass. Collation that involves case-sensitive characters and 
diacriticals often requires multiple passes. In Spanish, for example, characters with 
the same base character (with or without diacriticals) are weighted equally in the 
first collation pass. 
Table 2-5 shows the results of a multilevel collation based on the Spanish character-set 
collation scheme. In the first collation pass, characters are grouped according to the 
base character without consideration for the diacritical. The characters a and á are 
therefore weighted equally in the first pass, and the words mas and más collate the 
same. Because the two words collate the same, a second collation pass is made on 
them. The second pass recognizes the diacritical above the base character a, so that 
the character a sorts first, followed by the character á.
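The two passes described above can be approximated in Python with a two-part sort key. This is a sketch, not the collation scheme of any particular product: it uses Unicode decomposition (unicodedata.normalize with "NFD") to separate a base character from its diacritical, and the function name multilevel_key is hypothetical. It also treats every combining mark as a second-level difference, which is a simplification; in Spanish, for instance, ñ is normally a distinct letter and not a variant of n.

```python
import unicodedata


def multilevel_key(word):
    """Return (first-level key, second-level key) for a word.

    First level: base characters only, diacriticals ignored.
    Second level: one weight per base character, 0 when it carries no
    diacritical, otherwise the code point of the combining mark.
    """
    decomposed = unicodedata.normalize("NFD", word)
    base_chars = []
    accent_weights = []
    for ch in decomposed:
        if not unicodedata.combining(ch):
            base_chars.append(ch)
            accent_weights.append(0)
        elif accent_weights:
            # Combining mark: record it against the preceding base character.
            accent_weights[-1] = ord(ch)
    return ("".join(base_chars), tuple(accent_weights))


words = ["masacrar", "mas", "máscara", "más", "masa"]

# Tuples compare element by element, so the second level is consulted
# only when two words tie on the first level (as mas and más do).
collated = sorted(words, key=multilevel_key)

print(collated)  # ['mas', 'más', 'masa', 'masacrar', 'máscara']
```

The output matches the order shown in Table 2-5: mas and más tie on the first level, and the second level places the unaccented word first.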
Ideographic Character Collation
Ideographic writing systems are composed of several thousand characters. Collation 
methods in ideographic writing systems are more complex than those used for 
phonetic systems, and can be based on various factors. Generally, a collation scheme 
based on a combination of stroke count, radical base, and phonetics is used.
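A combined scheme of this kind can be sketched in Python as a multi-part sort key. The table below is hand-entered sample data for six characters, not a real collation table: it records only stroke count and an unaccented Mandarin reading, and it omits the radical entirely. The names SAMPLE_DATA and ideographic_key are hypothetical.

```python
# Illustrative sample data (an assumption for this sketch):
# character -> (stroke count, Mandarin reading without tone marks)
SAMPLE_DATA = {
    "一": (1, "yi"),
    "二": (2, "er"),
    "人": (2, "ren"),
    "三": (3, "san"),
    "山": (3, "shan"),
    "木": (4, "mu"),
}


def ideographic_key(ch):
    """Sort by stroke count first, then by reading to break ties."""
    strokes, reading = SAMPLE_DATA[ch]
    return (strokes, reading)


characters = ["木", "山", "人", "三", "一", "二"]

# Sorting by encoded value does not follow stroke count.
by_encoded_value = sorted(characters)

# Sorting with the combined key groups characters by stroke count.
by_strokes_then_reading = sorted(characters, key=ideographic_key)

print(by_encoded_value)         # ['一', '三', '二', '人', '山', '木']
print(by_strokes_then_reading)  # ['一', '二', '人', '三', '山', '木']
```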
Table 2-5. Multilevel Collation
Words to Collate    Result of Collation
masacrar            mas
mas                 más
máscara             masa
más                 masacrar
masa                máscara










