localedef.4 (2010 09)

localedef(4) localedef(4)
integer lists
Integer list operands consist of one or more decimal integers separated by semicolons.
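As a rough illustration, the following Python sketch validates and splits an integer-list operand. The function name parse_integer_list is hypothetical, and the sketch assumes the operand contains no blanks. It is not how localedef itself parses its input.

```python
import re

# Hypothetical helper: an integer list is one or more runs of decimal
# digits, separated by single semicolons (no blanks, no empty items).
_INTEGER_LIST = re.compile(r"[0-9]+(?:;[0-9]+)*")


def parse_integer_list(operand: str) -> list[int]:
    """Validate an integer-list operand and return its values."""
    text = operand.strip()
    if _INTEGER_LIST.fullmatch(text) is None:
        raise ValueError(f"not an integer list: {operand!r}")
    return [int(item) for item in text.split(";")]


print(parse_integer_list("3;3;0"))  # [3, 3, 0]
```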
shift
Shift operands follow the keywords toupper and tolower and consist of character pairs. Each pair is two character-code constants enclosed in parentheses and separated by a comma, and each pair is separated from the next by a semicolon. For tolower, the first constant represents an uppercase character and the second the corresponding lowercase character. For toupper, the first constant represents a lowercase character and the second the corresponding uppercase character.
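The pair syntax can be sketched in Python as follows. This assumes each character-code constant is written as a single-quoted character (for example 'a'); the helper parse_shift_operand is hypothetical and handles no other constant form. Because the first constant of each pair is always the source character, the same helper serves both toupper and tolower.

```python
import re

# Assumption: every character-code constant is a single-quoted character
# such as 'a'. Other constant forms (symbolic names, numeric codes) and a
# quoted ';' are not handled by this sketch.
_PAIR = re.compile(r"\(\s*'(.)'\s*,\s*'(.)'\s*\)")


def parse_shift_operand(operand: str) -> dict[str, str]:
    """Map the first constant of each pair to the second.

    For toupper the pairs read (lowercase, uppercase); for tolower they
    read (uppercase, lowercase), so one parser serves both keywords.
    """
    mapping = {}
    for chunk in operand.split(";"):
        match = _PAIR.fullmatch(chunk.strip())
        if match is None:
            raise ValueError(f"malformed shift pair: {chunk!r}")
        mapping[match.group(1)] = match.group(2)
    return mapping


toupper = parse_shift_operand("('a','A');('b','B')")
# Characters without a pair are left unchanged.
print("".join(toupper.get(ch, ch) for ch in "abc"))  # ABc
```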
collating element entry
The order_start keyword is followed by collating element entries, one per line, in
ascending order by collating position. The collating element entries have the form:
collation_element [weight[;weight]]
collation_element can be a character, a collating symbol enclosed in angle brackets representing a character or collating element, the special symbol UNDEFINED, or an ellipsis (...).
A character stands for itself. A collating symbol can be a symbolic name for a character that is interpreted by the charmap file, a multi-character collating element defined by the collating-element keyword, or a collating symbol defined by the collating-symbol keyword.
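A minimal Python sketch of splitting one collating element entry into its collation_element and weights. It assumes white space separates the element from its weights, that '#' begins a trailing comment, and that no quoted character is a blank or '#'; the function name parse_collating_entry is hypothetical.

```python
def parse_collating_entry(line: str) -> tuple[str, list[str]]:
    """Split one entry into its collating element and up to two weights.

    Assumptions: white space separates the element from its weights,
    ';' separates the weights, and '#' starts a trailing comment.
    """
    text = line.split("#", 1)[0]          # drop a trailing comment
    fields = text.split(None, 1)          # element, then the weight field
    if not fields:
        raise ValueError("empty collating element entry")
    element = fields[0]
    weights = [w.strip() for w in fields[1].split(";")] if len(fields) > 1 else []
    if len(weights) > 2:                  # the form allows at most two
        raise ValueError(f"more than two weights: {line!r}")
    return element, weights


print(parse_collating_entry("UNDEFINED <HIGH>"))
print(parse_collating_entry("'a' 'A';'a' # next element of class"))
```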
The special symbol UNDEFINED specifies the collating position of any characters not explicitly defined by collating element entries. For example, if some group of characters is to be omitted from the collation sequence and should simply collate after all defined characters, a collating symbol might be defined before the order_start keyword:
collating-symbol <HIGH>
Then somewhere in the list of collating element entries:
UNDEFINED <HIGH>
Notice that there is no second weight. This means that on the second pass, the undefined characters collate by their encoded values.
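The effect of UNDEFINED <HIGH> can be approximated with a two-part sort key. In this hypothetical table only a, b and c are listed, and Python's ord() stands in for the encoded value. It is a simplified model of the rule described above, not localedef's output.

```python
# Hypothetical table: 'a', 'b', 'c' are listed explicitly; the entry
# "UNDEFINED <HIGH>" comes last, so <HIGH> occupies the next position.
POSITION = {"a": 0, "b": 1, "c": 2}
HIGH = len(POSITION)  # first-pass weight shared by every undefined character


def sort_key(s: str) -> tuple[tuple[int, ...], tuple[int, ...]]:
    # First pass: defined characters use their list position; all
    # undefined characters share the <HIGH> weight.
    first = tuple(POSITION.get(ch, HIGH) for ch in s)
    # Second pass: defined characters keep their position, while undefined
    # characters fall back to their encoded value (ord() in this sketch).
    second = tuple(POSITION[ch] if ch in POSITION else ord(ch) for ch in s)
    return first, second


print(sorted(["Z", "c", "!", "a"], key=sort_key))  # ['a', 'c', '!', 'Z']
```

Although "!" has a lower encoded value than "a", it sorts after every defined character; only on the second pass does its encoded value separate it from "Z".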
An ellipsis is interpreted as a list of characters with an encoded value higher than that of the character on the preceding line and lower than that of the character on the following line. Because it is tied to the encoded values of characters, the ellipsis is inherently non-portable. If it is used, a warning is issued and no output is generated unless the -c option was given.
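To show why the ellipsis is non-portable, this Python sketch expands an ellipsis between two single-character neighbours under a chosen single-byte encoding. Python codec names (latin-1, and cp500 for an EBCDIC code page) stand in for codesets; the helper expand_ellipsis is hypothetical.

```python
def expand_ellipsis(before: str, after: str, encoding: str) -> list[str]:
    """Return the characters an ellipsis between `before` and `after` covers.

    Sketch only: assumes a single-byte encoding in which every byte value
    decodes to a character.
    """
    low = before.encode(encoding)
    high = after.encode(encoding)
    if len(low) != 1 or len(high) != 1 or low[0] >= high[0]:
        raise ValueError("neighbours must be single bytes in ascending order")
    # Every byte strictly between the two neighbours belongs to the ellipsis.
    return [bytes([b]).decode(encoding) for b in range(low[0] + 1, high[0])]


# The same two neighbours expand differently under different encodings:
print(expand_ellipsis("a", "e", "latin-1"))      # ['b', 'c', 'd']
print(len(expand_ellipsis("i", "j", "latin-1")))  # 0
print(len(expand_ellipsis("i", "j", "cp500")))    # 7 (EBCDIC gap after 'i')
```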
The weight operands provide information about how the collating element is to be collated on the first and subsequent passes. A weight can be a two-character string, the special symbol IGNORE, or a collating element of any of the forms specified for collation_element except UNDEFINED. If there are no weights, the character is collated strictly by its position in the list. If only one weight is given, the character sorts by its relative position in the list on the second collation pass.
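The weight rules can be modelled with a pair of sort keys, one per pass. The entries below are hypothetical, and only the no-weight and element-name-weight cases are handled; IGNORE and two-character string weights are left out of this sketch.

```python
# Hypothetical entries in collating order: (element, weights).
# 'c' has one weight, 'a', so it ties with 'a' on the first pass.
ENTRIES = [("a", []), ("b", []), ("c", ["a"])]


def build_weights(entries):
    """Map each element to its (first-pass, second-pass) weights.

    No weights: the list position is used on both passes. One weight: it
    sets the first pass and the list position is used on the second.
    Weights must name listed elements in this sketch.
    """
    position = {elem: i for i, (elem, _) in enumerate(entries)}
    table = {}
    for elem, weights in entries:
        first = position[weights[0]] if weights else position[elem]
        second = position[weights[1]] if len(weights) > 1 else position[elem]
        table[elem] = (first, second)
    return table


WEIGHTS = build_weights(ENTRIES)


def sort_key(s):
    # All first-pass weights are compared before any second-pass weight.
    return (tuple(WEIGHTS[ch][0] for ch in s), tuple(WEIGHTS[ch][1] for ch in s))


print(sorted(["b", "c", "a", "cb", "ab"], key=sort_key))  # ['a', 'c', 'ab', 'cb', 'b']
```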
An equivalence class is defined by a series of collating element entries that all have the same character or symbol in the first weight position. For example, in many locales all forms of the character 'A' collate equal on the first pass. This is represented in the collating element entries as:
'A' 'A';'A' # first element of equivalence class
'a' 'A';'a' # next element of class
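A small Python model of the equivalence class above: both 'A' and 'a' carry 'A' as their first weight, so they compare equal on the first pass and are separated only on the second. The 'B' and 'b' entries are hypothetical additions for the sake of the example.

```python
# The two entries above, plus a hypothetical pair for 'B' and 'b'.
# Each element maps to its (first weight, second weight).
ORDER = ["A", "a", "B", "b"]
WEIGHTS = {"A": ("A", "A"), "a": ("A", "a"), "B": ("B", "B"), "b": ("B", "b")}
POSITION = {elem: i for i, elem in enumerate(ORDER)}


def sort_key(s):
    first = tuple(POSITION[WEIGHTS[ch][0]] for ch in s)   # which class
    second = tuple(POSITION[WEIGHTS[ch][1]] for ch in s)  # member within class
    return first, second


print(sort_key("ab")[0] == sort_key("Ab")[0])  # True: equal on the first pass
print(sorted(["b", "ab", "B", "Ab", "a"], key=sort_key))  # ['a', 'Ab', 'ab', 'B', 'b']
```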
Two-to-one collating elements are specified by collating elements defined with the collating-element keyword before the order_start keyword. For example, the two-to-one collating element CH in Spanish would be defined before the order_start keyword as
collating-element <CH> from "CH"
It would then be used in a collating element entry as <CH>.
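How a two-to-one element such as <CH> changes ordering can be sketched by splitting strings into collating elements before comparing them. The placement of CH between C and D and the longest-match splitting rule are assumptions for illustration, not a description of localedef internals.

```python
# Hypothetical ordering with CH placed between C and D.
ORDER = list("ABC") + ["CH"] + list("DEFGHIJKLMNOPQRSTUVWXYZ")
POSITION = {elem: i for i, elem in enumerate(ORDER)}
MULTI = sorted((e for e in ORDER if len(e) > 1), key=len, reverse=True)


def collating_elements(s):
    """Split s into collating elements, taking the longest multi-character
    element that matches at each position (an assumed matching rule)."""
    out, i = [], 0
    while i < len(s):
        for elem in MULTI:
            if s.startswith(elem, i):
                out.append(elem)
                i += len(elem)
                break
        else:  # no multi-character element matched: take one character
            out.append(s[i])
            i += 1
    return out


def sort_key(s):
    return [POSITION[e] for e in collating_elements(s)]


print(collating_elements("CHICO"))  # ['CH', 'I', 'C', 'O']
print(sorted(["DATO", "CHICO", "CURA"], key=sort_key))  # ['CURA', 'CHICO', 'DATO']
```

A plain code-point sort would put "CHICO" before "CURA"; treating CH as one element moves it after every word beginning with a lone C.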
A one-to-two collating element is defined by having a two-character string in one of the weight positions. For example, if the character 'X' collates equal to the pair "AE", the collating element entry would be:
8 Hewlett-Packard Company 8 HP-UX 11i Version 3: September 2010