HP IAP Version 2.1 User Guide, March 2011

word. For example, the fuzzy word define~ matches the similar words defined and definite, but
does not match defining, definition, indefinite, or pine. It also matches define itself.
NOTE:
Do not use wildcards in fuzzy searches. For example, foo*~ and foo?~ are not valid queries.
Measuring word similarity
The edit distance (also called Levenshtein distance) between two words is the number of single-character
operations (deletion, replacement, or insertion) required to change one word into the other word.
For example, the edit distance between define and pine is three: two deletions (d and e) and one
replacement (f by p). The distance between define and definite is also three (e replaced by i; te
inserted).
The search engine considers define more similar to definite than to pine, even though the edit distances
are the same (three), because it compares the edit distance to the length of the shorter of the two
words (the query word and the document word). For querying purposes, two words are closer when
fewer changes, relative to their lengths, are needed to turn one word into the other.
The similarity ratio used by the search engine is d/min(query, doc), where d is the edit distance, min
is a function that returns the lesser of its arguments, and query and doc are the lengths of the query
word and document word, respectively. A fuzzy word matches a document word if this ratio is no
more than 0.5.
Examples:
Words compared      Similarity ratio              Match?
define, definite    3/min(6, 8) = 3/6 = 0.5       yes
define, pine        3/min(6, 4) = 3/4 = 0.75      no (0.75 > 0.5)
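The ratio test described above can be sketched in Python. This is an illustration only, not the search engine's actual implementation: the function names edit_distance, similarity_ratio, and fuzzy_match are hypothetical, and the distance is computed with the standard dynamic-programming form of Levenshtein distance, which always returns the minimum number of operations. For define and definite the minimum is 2 (insert i and t before the final e) rather than the 3 counted in the example, so the sketch reports a ratio of 2/6; the match result is the same.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or replacements needed to turn a into b (Levenshtein distance)."""
    # prev[j] is the distance between the processed prefix of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning i characters into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # replace ca by cb, or keep
        prev = curr
    return prev[-1]


def similarity_ratio(query: str, doc: str) -> float:
    """d / min(query, doc), where the arguments of min are word lengths."""
    if not query or not doc:
        # Assumption: empty words are rejected to avoid dividing by zero.
        raise ValueError("both words must be non-empty")
    return edit_distance(query, doc) / min(len(query), len(doc))


def fuzzy_match(query: str, doc: str, threshold: float = 0.5) -> bool:
    """A fuzzy word matches if the ratio is no more than the threshold."""
    return similarity_ratio(query, doc) <= threshold


if __name__ == "__main__":
    for word in ("definite", "pine", "define"):
        print(word, similarity_ratio("define", word), fuzzy_match("define", word))
```

The threshold is a parameter only for experimentation; the guide states a fixed limit of 0.5.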
Matching word sequences
You can use word sequences to find documents with words that occur in a specified order and are
separated by a specified maximum distance.
Topics include:
Simple word sequences
Proximity word sequences
Simple word sequences
To search for an ordered sequence of words, use a simple word sequence, which is a list of literal
query words (no wildcards) separated by spaces (or other separators) and enclosed in quotes ("). A
document matches a simple word sequence if all words occur in the document in the same order,
with no intervening words.
For example, the sequence "like a rolling stone" does not match a document with the text
like a large rolling stone because of the intervening word large.
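The matching rule for simple word sequences can be illustrated with a short Python sketch. It is not the product's implementation: the names tokenize and matches_simple_sequence are hypothetical, and the sketch assumes that words are runs of letters, digits, or apostrophes and that case is ignored, neither of which is stated in this section.

```python
import re


def tokenize(text: str) -> list[str]:
    # Assumption: a word is a run of letters, digits, or apostrophes,
    # and comparison is case-insensitive.
    return re.findall(r"[a-z0-9']+", text.lower())


def matches_simple_sequence(sequence: str, document: str) -> bool:
    """True if every word of the sequence occurs in the document,
    in the same order, with no intervening words."""
    query = tokenize(sequence)
    words = tokenize(document)
    if not query:
        return False  # an empty sequence matches nothing in this sketch
    n = len(query)
    # Slide a window of n words over the document and compare it to the query.
    return any(words[i:i + n] == query for i in range(len(words) - n + 1))


if __name__ == "__main__":
    print(matches_simple_sequence("like a rolling stone",
                                  "like a large rolling stone"))
    print(matches_simple_sequence("like a rolling stone",
                                  "she felt like a rolling stone"))
```

The first call fails to match because of the intervening word large; the second matches because the four words appear consecutively and in order.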