Software Internationalization Guide
Software Characteristics That Vary by Locale
Software Internationalization Guide—526225-002
2-16
Stroke Count
One approach to collating ideographic characters is based on the number of strokes
that make up the character. Characters containing fewer strokes sort first, followed by
characters with more strokes.
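A stroke-count ordering can be sketched as a sort on a lookup table. The example below is a minimal illustration, not a production collation routine: the STROKES table and the function names are hypothetical, the table covers only six characters, and breaking ties by code point is an assumption made for this sketch.

```python
# Hypothetical stroke-count table covering only a few characters.
# A real implementation would load counts from a full character database.
STROKES = {"一": 1, "二": 2, "人": 2, "三": 3, "山": 3, "木": 4}


def stroke_key(ch):
    # Primary key: number of strokes.
    # Secondary key (an assumption of this sketch): the code point,
    # used only to order characters that have the same stroke count.
    return (STROKES[ch], ord(ch))


def sort_by_strokes(chars):
    return sorted(chars, key=stroke_key)


print(sort_by_strokes(["木", "三", "一", "山", "人", "二"]))
# ['一', '二', '人', '三', '山', '木']
```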
Radical Base
Ideographic characters can also be collated using a scheme based on radicals, the
root components from which ideographs are built.
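A radical-based scheme might look like the following sketch. It assumes that each character is filed under one radical, identified here by its Kangxi radical number, and that characters sharing a radical are then ordered by their remaining strokes. Both the secondary ordering and the RADICAL_INFO table are assumptions made for illustration.

```python
# Hypothetical table: character -> (Kangxi radical number, residual strokes).
# Residual strokes are the strokes left after removing the radical.
RADICAL_INFO = {
    "休": (9, 4),   # radical 人
    "明": (72, 4),  # radical 日
    "林": (75, 4),  # radical 木
    "森": (75, 8),  # radical 木
    "河": (85, 5),  # radical 水
}


def radical_key(ch):
    radical, residual = RADICAL_INFO[ch]
    # Order by radical first, then by residual strokes; the code point
    # is a final tie-breaker so the ordering is fully determined.
    return (radical, residual, ord(ch))


def sort_by_radical(chars):
    return sorted(chars, key=radical_key)


print(sort_by_radical(["森", "河", "林", "休", "明"]))
# ['休', '明', '林', '森', '河']
```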
Phonetics
Pronunciation is another basis for collating ideographic characters. It is the most
difficult of the approaches, because it is hard to show how one element relates to
a neighboring element.
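The sketch below orders characters by a romanized reading. It is illustrative only: the READINGS table, the toneless pinyin spellings, and the preferred parameter are assumptions of this example. It also shows one practical complication: 行 can be read as xing or hang, so its position depends on which reading is chosen.

```python
# Hypothetical table of toneless pinyin readings. The first entry is
# treated as the default reading in this sketch.
READINGS = {
    "大": ["da"],
    "山": ["shan"],
    "行": ["xing", "hang"],
}


def phonetic_key(ch, preferred=None):
    # Use the caller's preferred reading if one is given for this
    # character; otherwise fall back to the first listed reading.
    reading = (preferred or {}).get(ch, READINGS[ch][0])
    return (reading, ord(ch))


def sort_by_reading(chars, preferred=None):
    return sorted(chars, key=lambda ch: phonetic_key(ch, preferred))


print(sort_by_reading(["行", "山", "大"]))
# ['大', '山', '行']
print(sort_by_reading(["行", "山", "大"], preferred={"行": "hang"}))
# ['大', '行', '山']
```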
Other Collation Considerations
When collating characters, you can define characters that are given no weight. These
characters are called “don’t-care characters.” If a hyphen is defined as a don’t-care
character, for example, the words re-creation and recreation collate to the same
position.
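As a minimal sketch of don't-care characters, the following key function drops every character in a configurable set before comparison. The names DONT_CARE and collation_key are hypothetical, and ordinary code-point comparison stands in for a real collation table.

```python
# Characters that carry no collation weight in this sketch.
DONT_CARE = {"-"}


def collation_key(word, dont_care=DONT_CARE):
    # Remove weightless characters, then compare what is left.
    return "".join(ch for ch in word if ch not in dont_care)


print(collation_key("re-creation") == collation_key("recreation"))
# True

# Python's sort is stable, so words with equal keys keep their input order.
print(sorted(["re-creation", "recital", "recreation"], key=collation_key))
# ['recital', 're-creation', 'recreation']
```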
In n-to-one character mappings, a string of characters is treated as a single collating
element. An example is the Spanish ch, which collates between c and d. There are also
one-to-n character mappings, where a single collating element is mapped to a string of
characters. For example, the German character ß collates as ss.
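Both kinds of mapping can be sketched as key functions. The example below is a simplified illustration that handles only unaccented lowercase letters; the function names and the (code point, rank) weights are assumptions of this sketch, not the weights of any real collation table.

```python
def spanish_key(word):
    """n-to-one: treat 'ch' as one element that sorts between c and d."""
    key = []
    i = 0
    word = word.lower()
    while i < len(word):
        if word.startswith("ch", i):
            # Two characters, one collating element. (ord("c"), 1) sorts
            # after (ord("c"), 0) and before (ord("d"), 0).
            key.append((ord("c"), 1))
            i += 2
        else:
            key.append((ord(word[i]), 0))
            i += 1
    return key


def german_key(word):
    """one-to-n: expand the single character ß into the string 'ss'."""
    return word.lower().replace("ß", "ss")


print(sorted(["dama", "chico", "cuna", "casa"], key=spanish_key))
# ['casa', 'cuna', 'chico', 'dama']
print(german_key("Straße") == german_key("Strasse"))
# True
```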
Numeric Representation
Date formats, time formats, and monetary figures are represented in many ways
around the world. Internationalized software must therefore be flexible: it must not
assume a fixed numeric representation for dates, times, or monetary values.
Date Formats
Date formats vary among countries and cultures. A date consists of the year, the
month, and the day, but the order in which these elements are presented varies.
Table 2-6 on page 2-17 shows how some languages represent dates.
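One way to avoid a fixed date representation is to keep the presentation pattern in data rather than in code. The sketch below is a minimal illustration: the DATE_PATTERNS table, its keys, and the separators are assumptions chosen for this example and are not taken from Table 2-6.

```python
from datetime import date

# Illustrative patterns only; a real application would obtain the
# pattern from the user's locale settings instead of a fixed table.
DATE_PATTERNS = {
    "month-day-year": "%m/%d/%Y",
    "day-month-year": "%d/%m/%Y",
    "year-month-day": "%Y-%m-%d",
}


def format_date(d, order):
    # The same date value is rendered differently depending on the
    # pattern selected; the stored date itself never changes.
    return d.strftime(DATE_PATTERNS[order])


d = date(2024, 3, 5)
print(format_date(d, "month-day-year"))  # 03/05/2024
print(format_date(d, "day-month-year"))  # 05/03/2024
print(format_date(d, "year-month-day"))  # 2024-03-05
```

The first two results show why the order matters: the same digits can denote March 5 or May 3, depending on the convention the reader expects.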