HP Software Reference Information Storage System v1.6 User Guide (T3559-90810, November 2008)

represents the maximum number of other document words that can occur between any two successive
words of the sequence. A document matches a proximity word sequence if all words occur in the
document in the same order, with at most N intervening words, where N is the proximity value.
For example, the sequence "bird garden stone"~3 matches any document that has these three
words in this order, with bird and garden separated by no more than three words, and garden and stone
separated by no more than three words. This sequence matches a document with the text “a bird in the
rose garden is near a stone” because there are at most three words between successive sequence words.
This sequence also matches “a bird garden with a stone” for the same reason.
Simple word sequences are a special case of proximity word sequences: "..." is the same as
"..."~0. Any documents found by "..."~N are also found by "..."~M, when M > N.
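The matching rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this guide, not RISS code: the function names tokenize and matches_proximity are hypothetical, splitting text with the regular expression \w+ is an assumption, and case handling is not modeled.

```python
import re

def tokenize(text):
    # Assumption: a word is a run of letters, digits, or underscores.
    return re.findall(r"\w+", text)

def matches_proximity(text, sequence, n):
    """Return True if the words of `sequence` occur in `text` in order,
    with at most `n` other words between each successive pair."""
    doc = tokenize(text)
    seq = tokenize(sequence)
    if not seq:
        return False
    # Positions in the document where a partial match could currently end.
    ends = {i for i, word in enumerate(doc) if word == seq[0]}
    for target in seq[1:]:
        # A position p extends a match ending at q if p comes after q and
        # no more than n words lie between them (p - q - 1 <= n).
        ends = {
            p for p, word in enumerate(doc)
            if word == target and any(0 < p - q <= n + 1 for q in ends)
        }
        if not ends:
            return False
    return True

doc_a = "a bird in the rose garden is near a stone"
doc_b = "a bird garden with a stone"
query = "bird garden stone"

print(matches_proximity(doc_a, query, 3))   # True
print(matches_proximity(doc_b, query, 3))   # True
print(matches_proximity(doc_a, query, 2))   # False
```

In this sketch a proximity value of 0 behaves like a simple word sequence, and raising the proximity value can only add matches, which mirrors the two properties stated above.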
Matching word sequences in attachments
This section discusses word matching in attachments. As with other documents, RISS renders attachment
documents (such as spreadsheets and PDF files) into text words. When RISS renders a document, it follows
the document application's internal representation of the file.
Certain file types, for example spreadsheets, look very different internally than they do externally.
This means that the word sequence in the external application representation, which the end user sees,
may differ from the internal application representation. RISS query matching uses the internal
application representation. The following examples illustrate this.
Example 1. Separators are ignored
RISS renders text into words. Remaining characters such as periods, commas, spaces, and newlines are
considered separators and are ignored. Phrase queries ignore all formatting elements and non-word
characters. The following original plain text:
“This was news to Mr. Smith.
Johnson, however, knew better.”
matches the phrase query:
“Smith Johnson”
This is because internally, the two plain text sentences are represented as one long string of continuous
words: “This was news to Mr Smith Johnson however knew better”.
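The effect of ignoring separators can be reproduced with a minimal sketch. The helper names to_words and matches_phrase are hypothetical, and treating every character other than a letter or digit as a separator is an assumption made for this example; the guide does not list the exact separator set RISS uses.

```python
import re

def to_words(text):
    # Assumption: anything other than a letter or digit is a separator.
    return re.findall(r"[A-Za-z0-9]+", text)

def matches_phrase(text, phrase):
    """Return True if the words of `phrase` appear consecutively in `text`
    once all separators have been dropped."""
    words = to_words(text)
    target = to_words(phrase)
    if not target:
        return False
    last_start = len(words) - len(target)
    return any(words[i:i + len(target)] == target
               for i in range(last_start + 1))

original = "This was news to Mr. Smith.\nJohnson, however, knew better."

print(" ".join(to_words(original)))
# This was news to Mr Smith Johnson however knew better
print(matches_phrase(original, "Smith Johnson"))   # True
```

The period and the newline between the two sentences disappear during rendering, so Smith and Johnson become adjacent words.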
Example 2. Sequence is not intuitive
Internally, in an attachment’s original application, a large multi-page document or a single-page
spreadsheet equates to a long text sequence. Text may not appear in the same sequence internally as
it appears externally. Also, multiple instances of the same text in certain file types are represented
as a single instance.
Excel spreadsheets
Look at the external representation of the following Excel spreadsheet.
Table 10 Excel spreadsheet

United States Presidents named John
John Adams                 1797-1801
John Quincy Adams          1825-1829
John Fitzgerald Kennedy    1961-1963
John Tyler                 1841-1845
The specific order in which the text in the cells is stored internally depends on:
The version of Excel used to generate the spreadsheet
The insertion order for the spreadsheet text
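To make the dependence on insertion order concrete, the following toy model stores each distinct cell text once, in the order it was first typed. It is a simplified assumption for illustration only, not the actual Excel file layout or the RISS rendering logic; the function name internal_strings, the cell references, and the typing order are all hypothetical.

```python
def internal_strings(entries):
    """Return the distinct cell texts in first-insertion order.

    `entries` is a list of (cell_reference, text) pairs in the order the
    text was typed. This input format is an assumption of the sketch.
    """
    table = []
    for _cell, text in entries:
        if text not in table:   # repeated text is kept as a single instance
            table.append(text)
    return table

# Hypothetical typing order: the title, then every name, then every date.
typed = [
    ("A1", "United States Presidents named John"),
    ("A2", "John Adams"),
    ("A3", "John Quincy Adams"),
    ("A4", "John Fitzgerald Kennedy"),
    ("A5", "John Tyler"),
    ("B2", "1797-1801"),
    ("B3", "1825-1829"),
    ("B4", "1961-1963"),
    ("B5", "1841-1845"),
]

stored = internal_strings(typed)
for text in stored:
    print(text)
```

Under this assumed typing order, “John Adams” is followed internally by “John Quincy Adams” rather than by “1797-1801”, so a word sequence that follows a visible row of the spreadsheet might not be adjacent in the internal text.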