HP IAP Version 2.1 User Guide, March 2011

Proximity word sequences
You can use simple word sequences to search for words separated by separators but not by other
words. To search for document words that are in an ordered sequence, but might be separated by
other words, use a proximity word sequence.
To write a proximity word sequence, use the same syntax as a simple word sequence, but append a
tilde (~) character to the second quote, and follow that with a numeric proximity value. The proximity
value represents the maximum number of other document words that can occur between any two
successive words of the sequence. A document matches a proximity word sequence if all words occur
in the document in the same order, with at most N intervening words, where N is the proximity value.
For example, the sequence "bird garden stone"~3 matches any document that has these three
words in this order, with bird and garden separated by no more than three words, and garden and
stone separated by no more than three words. This sequence matches a document with the text:
a bird in the rose garden is near a stone
because exactly three words fall between each pair of successive sequence words (in the rose
between bird and garden, and is near a between garden and stone). This sequence also matches:
a bird garden with a stone
because bird and garden are adjacent, and only two words separate garden and stone.
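The matching rule described above can be illustrated with a short Python sketch. This is not IAP's implementation. The names parse_sequence, tokenize, and matches are hypothetical, the tokenizer assumes that a word is any run of letters, digits, or underscores, and words are compared exactly as written because case handling is not covered here.

```python
import re

# Hypothetical pattern: a double-quoted word list, optionally followed by ~N.
SEQUENCE = re.compile(r'^"([^"]*)"(?:~(\d+))?$')


def parse_sequence(query):
    """Split a query such as '"bird garden stone"~3' into (words, n)."""
    m = SEQUENCE.match(query.strip())
    if m is None:
        raise ValueError(f"not a word sequence: {query!r}")
    # A sequence without a ~N suffix is treated as proximity value 0.
    n = int(m.group(2)) if m.group(2) is not None else 0
    return m.group(1).split(), n


def tokenize(text):
    # Assumption: everything that is not a word character is a separator.
    return re.findall(r"\w+", text)


def matches(query, text):
    """Return True if the query words occur in the text in order, with at
    most n other words between each pair of successive query words."""
    words, n = parse_sequence(query)
    doc = tokenize(text)
    if not words:
        return False
    # Positions in the document where the first query word occurs.
    reachable = {i for i, w in enumerate(doc) if w == words[0]}
    for word in words[1:]:
        # Keep every position that holds the next query word and lies no
        # more than n intervening words after a reachable position.
        reachable = {
            j
            for i in reachable
            for j in range(i + 1, min(i + n + 2, len(doc)))
            if doc[j] == word
        }
        if not reachable:
            return False
    return True


query = '"bird garden stone"~3'
print(matches(query, "a bird in the rose garden is near a stone"))
print(matches(query, "a bird garden with a stone"))
```

The sketch tracks every candidate position for each query word rather than only the first occurrence, so a later occurrence of a word can still complete the match when an earlier one cannot.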
Simple word sequences are a special case of proximity word sequences: ". . ." is the same as
". . ."~0. Any documents found by ". . ."~N are also found by ". . ."~M, when M > N.
Matching word sequences in files and email attachments
IAP renders files and email attachments (such as spreadsheets and PDF files) into text words. When
IAP renders a document, it follows the document application's internal representation of the file.
Certain file types, for example spreadsheets, look very different internally than they do externally.
This means that the word sequence in the external representation, which the end user sees,
may differ from the word sequence in the internal representation. IAP query matching uses the
internal representation.
Separators are ignored
IAP renders text into words. Remaining characters such as periods, commas, spaces, and newlines
are considered separators and are ignored. Phrase queries ignore all formatting elements and non-word
characters. For example, the original plain text:
This was news to Mr. Smith. Johnson, however, knew better.
matches the phrase query:
"Smith Johnson"
This is because, internally, the two sentences are represented as one continuous string of
words: This was news to Mr Smith Johnson however knew better.
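The effect of ignoring separators can be reproduced with a minimal Python sketch. The functions to_words and contains_phrase are hypothetical names, and the rule that a word is a run of ASCII letters and digits is an assumption made for illustration, not IAP's actual rendering logic.

```python
import re


def to_words(text):
    # Assumption: words are runs of ASCII letters and digits; periods,
    # commas, spaces, and newlines are separators and are dropped.
    return re.findall(r"[A-Za-z0-9]+", text)


def contains_phrase(words, phrase):
    """Return True if phrase occurs in words as a contiguous run."""
    k = len(phrase)
    return k > 0 and any(
        words[i:i + k] == phrase for i in range(len(words) - k + 1)
    )


text = "This was news to Mr. Smith. Johnson, however, knew better."
words = to_words(text)
print(words)
print(contains_phrase(words, ["Smith", "Johnson"]))
```

Because the period after Smith is discarded, Smith and Johnson become adjacent words even though they belong to different sentences in the original text.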
Sequence is not intuitive
Within the file's original application, a large multi-page document or a single-page spreadsheet
is stored internally as one long text sequence. Text might not appear in the same order internally
as it appears externally. Also, in certain file types, multiple instances of the same text are
represented internally as a single instance.