HP StorageWorks Reference Information Storage System V1.1 User Guide (February 2005)

Chapter 6: Query expression syntax and matching

Query syntax and matching

For the ISO 8859-1 (Latin-1) encoding, used for Western European languages, accented letters are included. Most ideographic characters, such as those used in Asian languages, are also considered letters.

Whatever the language and encoding used for a particular document (file or email message), the system maps encoded characters to the Unicode 2.0 standard. The Unicode 2.0 standard is then used to determine if a given character is a letter or a digit (or neither):

• letter is any Unicode character in one of these Unicode categories: Ll (lowercase letter), Lu (uppercase letter), Lt (titlecase letter), Lm (modifier letter), or Lo (other letter).

• digit is any Unicode character whose Unicode name contains the word DIGIT, provided it is not in the range \u2000 (en quad = en space) through \u2FFF (ideographic description – future).
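
The two rules above can be approximated with a short Python sketch. It is only an illustration, not the system's implementation: the function names is_letter, is_digit, and classify are hypothetical, and Python's unicodedata module follows a much newer Unicode version than Unicode 2.0, so results for characters added or reclassified since then may differ from what the system does.

```python
import unicodedata

# Unicode general categories that the guide treats as letters.
LETTER_CATEGORIES = {"Ll", "Lu", "Lt", "Lm", "Lo"}

# Range excluded from digits: \u2000 (en quad) through \u2FFF.
EXCLUDED_START = 0x2000
EXCLUDED_END = 0x2FFF


def is_letter(ch: str) -> bool:
    """Return True if ch falls in one of the letter categories."""
    return unicodedata.category(ch) in LETTER_CATEGORIES


def is_digit(ch: str) -> bool:
    """Return True if the Unicode name of ch contains the word DIGIT
    and ch lies outside the excluded \\u2000-\\u2FFF range."""
    if EXCLUDED_START <= ord(ch) <= EXCLUDED_END:
        return False
    # Unnamed characters yield an empty name and are never digits.
    return "DIGIT" in unicodedata.name(ch, "").split()


def classify(ch: str) -> str:
    """Return 'letter', 'digit', or 'neither' for a single character."""
    if is_letter(ch):
        return "letter"
    if is_digit(ch):
        return "digit"
    return "neither"
```

For example, classify("é") and classify("中") both give "letter", classify("7") gives "digit", and classify("\u2460") (CIRCLED DIGIT ONE) gives "neither", because that character lies inside the excluded range even though its name contains DIGIT.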

Letters and digits in files

Although all letters and digits are word characters, their treatment in files (including email message attachments) depends on the character encoding used. You can search for any words in email message bodies and headers, regardless of the encoding.

You can search for words in files (including email body, header, attachments, and indexed documents) provided the character encoding is one of the following:

• ISO-LATIN-1       • ISO-8859-1
• US-ASCII          • UTF-8
• ISO-2022-JP       • BIG5
• ISO-8859-2        • koi8-r
• EUC-KR            • WINDOWS-1252
• WINDOWS-1253      • WINDOWS-1254
• WINDOWS-1255      • WINDOWS-1256
• gb2312            • WINDOWS-1258
• KS_C_5601-1987    • EUC-JP
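
As a rough aid for checking a file's declared character set against the list above before relying on word search, the lookup can be sketched in Python. The names SEARCHABLE_FILE_ENCODINGS and is_searchable_file_encoding are hypothetical, and the case-insensitive comparison is an assumption: the guide lists the names in mixed case but does not say how the system compares them.

```python
# Encoding names exactly as listed in the guide, normalized to
# upper case so that the comparison ignores case (an assumption).
SEARCHABLE_FILE_ENCODINGS = frozenset(
    name.upper()
    for name in (
        "ISO-LATIN-1", "ISO-8859-1",
        "US-ASCII", "UTF-8",
        "ISO-2022-JP", "BIG5",
        "ISO-8859-2", "koi8-r",
        "EUC-KR", "WINDOWS-1252",
        "WINDOWS-1253", "WINDOWS-1254",
        "WINDOWS-1255", "WINDOWS-1256",
        "gb2312", "WINDOWS-1258",
        "KS_C_5601-1987", "EUC-JP",
    )
)


def is_searchable_file_encoding(charset: str) -> bool:
    """Return True if the declared charset name is in the guide's list."""
    return charset.strip().upper() in SEARCHABLE_FILE_ENCODINGS
```

Under these assumptions, is_searchable_file_encoding("utf-8") returns True, while is_searchable_file_encoding("WINDOWS-1251") returns False because that name does not appear in the list.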