HP StorageWorks Reference Information Storage System V1.4 User Guide (T3559-96028, December 2005)

A ? matches any single character in a document word. For example, b??t matches beat, beet,
boat, blot, best, bust, bout, and so on.
An * matches any sequence of characters in a document word, including a sequence of no
characters. For example, f*t matches the document words foot, feet, fit, fault, and ft; and f*
matches any document word beginning with f.
You can use any number of wildcard characters (* or ?) in a query word, but not at the beginning of
one: a query word that starts with a wildcard results in an error message. For example, *ion is not
a valid query.
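The wildcard rules above can be sketched in Python by translating a query word into a regular expression. This is only an illustration, not the search engine's implementation: the function names wildcard_to_regex and wildcard_matches are hypothetical, and the sketch assumes case-sensitive matching against a single document word.

```python
import re


def wildcard_to_regex(query_word: str) -> re.Pattern:
    """Translate a query word containing ? and * into a compiled regex.

    Hypothetical helper: ? becomes "any single character", * becomes
    "any sequence of characters, possibly empty". A leading wildcard is
    rejected, mirroring the error described in the text.
    """
    if query_word[:1] in ("*", "?"):
        raise ValueError("a query word cannot begin with a wildcard")
    parts = []
    for ch in query_word:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            # Escape literal characters so they are never read as regex syntax.
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def wildcard_matches(query_word: str, doc_word: str) -> bool:
    """Return True if the whole document word matches the query word."""
    return wildcard_to_regex(query_word).fullmatch(doc_word) is not None


print(wildcard_matches("b??t", "boat"))   # True
print(wildcard_matches("f*t", "ft"))      # True
print(wildcard_matches("b??t", "beast"))  # False
```

Calling wildcard_to_regex("*ion") raises ValueError in this sketch, which stands in for the error message the text describes.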
Matching similar words
Topics include:
• Fuzzy words, page 74
• Measuring word similarity, page 74
Fuzzy words
You can search for document words that are textually similar to a given literal query word (that is, one
containing no wildcards). To do this, append a tilde (~) character to the word, creating a fuzzy word.
For example, the fuzzy word define~ matches the similar words defined and definite, but does not
match defining, definition, indefinite, or pine. It also matches define itself.
Measuring word similarity
The edit distance (also called Levenshtein distance) between two words is the number of single-character
operations (deletion, replacement, or insertion) required to change one word into the other word.
For example, the edit distance between define and pine is three: two deletions (d and e) and one
replacement (f by p). The distance between define and definite is also three (e replaced by i; te inserted).
The search engine considers define more similar to definite than to pine, even though the edit distances
are the same (three), because the edit distance (number of character changes) is compared to the word
length (of the shorter of the query and document words). Two words are closer, for querying purposes, if
it takes less to change one word into the other word relative to their lengths.
The similarity ratio used by the search engine is d/min(query, doc), where d is the edit distance, min is a
function that returns the lesser of its arguments, and query and doc are the lengths of the query word and
document word, respectively. A fuzzy word matches a document word if this ratio is no more than 0.5.
Examples:
Words Compared      Similarity Ratio             Match?
define, definite    3/min(6, 8) = 3/6 = 0.5      yes
define, pine        3/min(6, 4) = 3/4 = 0.75     no (0.75 > 0.5)
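The ratio test from the examples can be reproduced with a short Python sketch. It is not the search engine's code: the names similarity_ratio and fuzzy_matches are hypothetical, and the edit distance is passed in as stated in the examples rather than computed.

```python
def similarity_ratio(edit_distance: int, query_word: str, doc_word: str) -> float:
    """Return d / min(query, doc), using the lengths of the two words."""
    shorter = min(len(query_word), len(doc_word))
    if shorter == 0:
        raise ValueError("both words must be non-empty")
    return edit_distance / shorter


def fuzzy_matches(edit_distance: int, query_word: str, doc_word: str,
                  threshold: float = 0.5) -> bool:
    """A fuzzy word matches if the ratio is no more than the threshold (0.5)."""
    return similarity_ratio(edit_distance, query_word, doc_word) <= threshold


# Edit distances below are the values given in the examples (assumed inputs).
print(similarity_ratio(3, "define", "definite"))  # 0.5
print(fuzzy_matches(3, "define", "definite"))     # True
print(similarity_ratio(3, "define", "pine"))      # 0.75
print(fuzzy_matches(3, "define", "pine"))         # False
```

Dividing by the length of the shorter word is what separates the two cases: the same distance of three is half of a six-letter word but three quarters of a four-letter word.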
Matching word sequences
You can use word sequences to find documents with words that occur in a specified order and are
separated by a specified maximum distance.
Topics include:
• Simple word sequences, page 75
• Proximity word sequences, page 75
74
Query expression syntax and matching