HP StorageWorks Reference Information Storage System V1.5 User Guide (T3559-96045, November 2008)

A ? matches any single character in a document word. For example, b??t matches beat, beet,
boat, blot, best, bust, bout, and so on.
An * matches any sequence of characters in a document word, including a sequence of no
characters. For example, f*t matches the document words foot, feet, fit, fault, and ft; and f*
matches any document word beginning with f.
You can use any number of wildcard characters (* or ?) in a query word, but you cannot use a wildcard
at the beginning of a query word; doing so results in an error message. For example, *ion is not a
valid query.
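The wildcard rules above can be sketched in Python by translating a query word into a regular expression. This is an illustration only, not the search engine's implementation: the function names wildcard_to_regex and matches are hypothetical, and the sketch assumes that a wildcard may stand for any character and that matching is case-sensitive, neither of which is stated in this guide.

```python
import re


def wildcard_to_regex(query_word: str) -> re.Pattern:
    """Translate a query word containing ? and * into a compiled regex.

    Hypothetical helper for illustration; not part of the product.
    """
    if not query_word:
        raise ValueError("empty query word")
    if query_word[0] in "*?":
        # The guide states that a wildcard at the beginning of a
        # query word produces an error.
        raise ValueError(f"leading wildcard not allowed: {query_word!r}")
    parts = []
    for ch in query_word:
        if ch == "?":
            parts.append(".")       # exactly one character
        elif ch == "*":
            parts.append(".*")      # any sequence, including none
        else:
            parts.append(re.escape(ch))  # literal character
    return re.compile("".join(parts))


def matches(query_word: str, document_word: str) -> bool:
    """Return True if the whole document word matches the query word."""
    return wildcard_to_regex(query_word).fullmatch(document_word) is not None


if __name__ == "__main__":
    for word in ["beat", "blot", "bt", "boost"]:
        print("b??t", word, matches("b??t", word))
    for word in ["foot", "fault", "ft", "fa"]:
        print("f*t", word, matches("f*t", word))
```

Because the whole word must match (fullmatch), b??t accepts only four-letter words, while f*t accepts ft, where the * stands for no characters.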
Matching similar words
Topics include:
• Fuzzy words, page 50
• Measuring word similarity, page 50
Fuzzy words
You can search for document words that are textually similar to a given literal query word (that is, one
containing no wildcards). To do this, append a tilde (~) character to the word, creating a fuzzy word.
For example, the fuzzy word define~ matches the similar words defined and definite, but does not
match defining, definition, indefinite, or pine. It also matches define itself.
Measuring word similarity
The edit distance (also called Levenshtein distance) between two words is the number of single-character
operations (deletion, replacement, or insertion) required to change one word into the other word.
For example, the edit distance between define and pine is three: two deletions (d and e) and one
replacement (f by p). The distance between define and definite is also three (e replaced by i; te inserted).
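As an illustration of counting single-character operations, the following Python sketch computes a Levenshtein distance with the textbook dynamic-programming method. The function name edit_distance is hypothetical, and this guide does not say how the search engine itself performs the computation. The sketch returns the minimum number of operations, which can be smaller than the count obtained from one particular hand-worked sequence of edits.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character deletions, insertions, and
    replacements needed to change a into b (textbook Levenshtein).

    Hypothetical helper for illustration; not the product's code.
    """
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # changing a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # replace ca by cb, or keep it
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("define", "pine"))     # 3
    print(edit_distance("define", "defined"))  # 1
```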
The search engine considers define more similar to definite than to pine, even though the edit distances
are the same (three), because the edit distance (number of character changes) is compared to the word
length (of the shorter of the query and document words). Two words are closer, for querying purposes, if
it takes less to change one word into the other word relative to their lengths.
The similarity ratio used by the search engine is d/min(query, doc), where d is the edit distance, min is a
function that returns the lesser of its arguments, and query and doc are the lengths of the query word and
document word, respectively. A fuzzy word matches a document word if this ratio is no more than 0.5.
Examples:
Words Compared      Similarity Ratio             Match?
define, definite    3/min(6,8) = 3/6 = 0.5       yes
define, pine        3/min(6,4) = 3/4 = 0.75      no (0.75 > 0.5)
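The ratio test can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the names similarity_ratio and fuzzy_match are hypothetical, the edit distance d is passed in by the caller using the values given in this guide, and only the formula d/min(query, doc) and the 0.5 threshold come from the text.

```python
def similarity_ratio(d: int, query_word: str, doc_word: str) -> float:
    """Return d / min(query, doc), where query and doc are word lengths.

    d is the edit distance, supplied by the caller. Hypothetical helper.
    """
    shorter = min(len(query_word), len(doc_word))
    if shorter == 0:
        raise ValueError("both words must be non-empty")
    return d / shorter


def fuzzy_match(d: int, query_word: str, doc_word: str,
                threshold: float = 0.5) -> bool:
    """A fuzzy word matches if the ratio is no more than the threshold."""
    return similarity_ratio(d, query_word, doc_word) <= threshold


if __name__ == "__main__":
    # Distances taken from the examples in this guide.
    for d, doc_word in [(3, "definite"), (3, "pine")]:
        ratio = similarity_ratio(d, "define", doc_word)
        print(doc_word, ratio, fuzzy_match(d, "define", doc_word))
```

Dividing by the length of the shorter word means that the same number of edits counts for more when one of the words is short, which is why pine is rejected while definite is accepted.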
Matching word sequences
You can use word sequences to find documents with words that occur in a specified order and are
separated by a specified maximum distance.
Topics include:
• Simple word sequences, page 51
• Proximity word sequences, page 51
• Matching word sequences in attachments, page 51