HP StorageWorks Reference Information Storage System V1.0 User Guide (May 2004)

Chapter 5: Query Syntax and Matching
Query Expression Syntax and Matching
5-8 HP StorageWorks Reference Information Storage System User Guide, April 2004
You can use any number of wildcard characters (* or ?) in a query word, but you cannot use a wildcard at the beginning of a query word. (An error message will result.) For example, *ion is not a valid query.
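The placement rule above can be sketched as a small check. This is only an illustration, not the system's own validation code: the function name is_valid_query_word is hypothetical, and treating an empty word as invalid is an assumption.

```python
# Wildcard characters named in this section.
WILDCARDS = ("*", "?")


def is_valid_query_word(word: str) -> bool:
    """Return True if the word does not begin with a wildcard.

    Wildcards elsewhere in the word are allowed in any number.
    Assumption: an empty word is treated as invalid.
    """
    return bool(word) and not word.startswith(WILDCARDS)


print(is_valid_query_word("*ion"))     # False: wildcard at the beginning
print(is_valid_query_word("defin*"))   # True
print(is_valid_query_word("d?fin*e"))  # True: several wildcards, none leading
```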
See Also
Query Expression Examples, on page 5-13, for examples of query expressions with literal words and words with wildcards.
Matching Similar Words
Fuzzy Words
You can search for document words that are textually similar to a given literal query word (that is, one that contains no wildcards). You do this by appending a tilde (~) character to the word, creating a fuzzy word. For example, the fuzzy word define~ will match the similar words defined and definite (but not defining, definition, indefinite, or pine). It will also match define itself.
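As a rough illustration of the syntax only, the following sketch separates a fuzzy word into its literal word and a flag. The function name parse_query_word is hypothetical. Rejecting a tilde after a word that contains wildcards is an assumption based on the definition above; how the system itself responds in that case is not stated here.

```python
from typing import Tuple


def parse_query_word(word: str) -> Tuple[str, bool]:
    """Return (base_word, is_fuzzy) for a single query word."""
    if word.endswith("~"):
        base = word[:-1]
        # Assumption: a fuzzy word must be built from a literal word,
        # that is, one that contains no wildcards.
        if "*" in base or "?" in base:
            raise ValueError("fuzzy word must not contain wildcards: %r" % word)
        return base, True
    return word, False


print(parse_query_word("define~"))  # ('define', True)
print(parse_query_word("define"))   # ('define', False)
```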
See Also
Query Expression Examples, on page 5-13, for examples of query expressions with fuzzy words.
How Word Similarity Is Measured
Note:
This section provides an in-depth explanation of how word similarity is measured. In most cases, you do not need to be concerned with just how similar two words must be in order to match. However, when interpreting the results of complex queries, this information can help you better understand why you obtain the results you do.
The edit distance (also called Levenshtein distance) between two words is the
number of single-character operations needed to change one into the other,
where an operation is a deletion, replacement, or insertion.
For example, the edit distance between define and pine is three: two deletions (de) and one replacement (f by p). The distance between define and definite is two: i and t are inserted before the final e.
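The edit distance defined above can be computed with the standard dynamic-programming method. The sketch below is a generic textbook implementation, not the system's own code, and the function name edit_distance is chosen here for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character deletions, replacements,
    or insertions needed to change a into b (Levenshtein distance)."""
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # changing a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # replace ca by cb, or keep if equal
            ))
        prev = curr
    return prev[-1]


print(edit_distance("define", "pine"))      # 3
print(edit_distance("define", "definite"))  # 2
```

Because the computation always finds the smallest number of operations, it reports 2 for define and definite (i and t inserted before the final e).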