HP Reference Information Storage System Version 1.6 User Guide revision 2 (T3559-90810, August 2007)

Fuzzy words
You can search for document words that are textually similar to a given literal query word (that is, one
containing no wildcards). To do this, append a tilde (~) character to the word, creating a fuzzy word.
For example, the fuzzy word define~ matches the similar words defined and definite, but does not
match defining, definition, indefinite, or pine. It also matches define itself.
Measuring word similarity
The edit distance (also called Levenshtein distance) between two words is the number of single-character
operations (deletion, replacement, or insertion) required to change one word into the other word.
For example, the edit distance between define and pine is three: two deletions (d and e) and one
replacement (f by p). The distance between define and definite is also three (e replaced by i; te inserted).
The search engine considers define more similar to definite than to pine, even though the edit distances
are the same (three), because the edit distance (number of character changes) is compared to the word
length (of the shorter of the query and document words). Two words are closer, for querying purposes, if
it takes less to change one word into the other word relative to their lengths.
The similarity ratio used by the search engine is d/min(query, doc), where d is the edit distance, min is a
function that returns the lesser of its arguments, and query and doc are the lengths of the query word and
document word, respectively. A fuzzy word matches a document word if this ratio is no more than 0.5.
Examples:
Words Compared      Similarity Ratio            Match?
define, definite    3/min(6, 8) = 3/6 = 0.5     yes
define, pine        3/min(6, 4) = 3/4 = 0.75    no (0.75 > 0.5)
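The ratio test described above can be sketched in Python. This is an illustrative sketch only, not the search engine's implementation: the function names edit_distance, similarity_ratio, and fuzzy_matches are hypothetical, and the sketch assumes the engine counts operations as the textbook Levenshtein recurrence does. That recurrence always finds the minimum number of operations, so it can report a smaller distance than a hand count; for define and definite it finds 2 (insert i and t), which still gives a ratio within the 0.5 limit.

```python
def edit_distance(a: str, b: str) -> int:
    """Textbook Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # replacement (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def similarity_ratio(query: str, doc: str) -> float:
    """d / min(query, doc), using word lengths; assumes non-empty words."""
    return edit_distance(query, doc) / min(len(query), len(doc))


def fuzzy_matches(query: str, doc: str, threshold: float = 0.5) -> bool:
    """True if the ratio is no more than the threshold (0.5 in the guide)."""
    return similarity_ratio(query, doc) <= threshold


print(edit_distance("define", "pine"))     # 3
print(similarity_ratio("define", "pine"))  # 0.75
print(fuzzy_matches("define", "pine"))     # False
print(fuzzy_matches("define", "defined"))  # True
```

Dividing by the shorter word length means a fixed number of edits counts for more when either word is short, which is why pine falls outside the limit.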
Matching word sequences
You can use word sequences to find documents with words that occur in a specified order and are
separated by a specified maximum distance.
Topics include:
Simple word sequences
Proximity word sequences
Simple word sequences
To search for an ordered sequence of words, use a simple word sequence, which is a list of literal
query words (no wildcards) separated by spaces (or other separators) and enclosed in quotes ("). A
document matches a simple word sequence if all words occur in the document in the same order, with
no intervening words.
For example, the sequence "like a rolling stone" does not match a document with the text like a
large rolling stone because of the intervening word large.
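The matching rule for simple word sequences can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the product's tokenizer or matcher: the names tokenize and matches_simple_sequence are hypothetical, words are assumed to be runs of letters and digits, every other character is treated as a separator, and case handling is not modeled.

```python
import re


def tokenize(text: str) -> list:
    # Assumption: a word is a run of letters or digits; anything else separates words.
    return re.findall(r"[A-Za-z0-9]+", text)


def matches_simple_sequence(sequence: str, document: str) -> bool:
    """True if the sequence's words appear in the document in the same
    order with no intervening words (separators between them are allowed)."""
    words = tokenize(sequence)
    doc_words = tokenize(document)
    n = len(words)
    if n == 0:
        return False
    # Slide a window of n words across the document and compare each window.
    return any(doc_words[i:i + n] == words
               for i in range(len(doc_words) - n + 1))


print(matches_simple_sequence("like a rolling stone",
                              "like a large rolling stone"))  # False
print(matches_simple_sequence("like a rolling stone",
                              "just like a rolling stone."))  # True
```

Comparing whole windows of adjacent words is what rules out the document containing large: the four query words are present and in order, but they are not contiguous.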
Proximity word sequences
You can use simple word sequences to search for words separated by separators but not by other
words. To search for document words that are in an ordered sequence, but might be separated by other
words, use a proximity word sequence.
To write a proximity word sequence, use the same syntax as a simple word sequence, but append a tilde
(~) character to the second quote, and follow that with a numeric proximity value. The proximity value
represents the maximum number of other document words that can occur between any two successive