HP StorageWorks Reference Information Storage System V1.5 User Guide (T3559-96045, November 2008)

Simple word sequences
To search for an ordered sequence of words, use a simple word sequence, which is a list of literal
query words (no wildcards) separated by spaces (or other separators) and enclosed in quotes ("). A
document matches a simple word sequence if all words occur in the document in the same order, with
no intervening words.
For example, the sequence "like a rolling stone" does not match a document with the text like a
large rolling stone because of the intervening word large.
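The matching rule above can be sketched in a few lines of Python. This is only an illustration, not RISS's implementation: the function names, the regular expression used to split text into words, and the case-insensitive comparison are all assumptions.

```python
import re

def words(text):
    # Assumption: a word is a run of letters, digits, or underscores;
    # everything else is a separator. Lowercasing is also an assumption.
    return re.findall(r"\w+", text.lower())

def matches_simple_sequence(query, document):
    """True if the query words occur in the document in the same order
    with no intervening words."""
    q = words(query)
    d = words(document)
    n = len(q)
    return n > 0 and any(d[i:i + n] == q for i in range(len(d) - n + 1))

# The guide's example: the intervening word "large" prevents a match.
print(matches_simple_sequence("like a rolling stone",
                              "like a large rolling stone"))   # False
# Separators such as punctuation do not prevent a match.
print(matches_simple_sequence("like a rolling stone",
                              "Just like a rolling stone."))   # True
```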
Proximity word sequences
You can use simple word sequences to search for words separated by separators but not by other
words. To search for document words that are in an ordered sequence, but might be separated by other
words, use a proximity word sequence.
To write a proximity word sequence, use the same syntax as a simple word sequence, but append a tilde
(~) character to the second quote, and follow that with a numeric proximity value. The proximity value
represents the maximum number of other document words that can occur between any two successive
words of the sequence. A document matches a proximity word sequence if all words occur in the
document in the same order, with at most N intervening words, where N is the proximity value.
For example, the sequence "bird garden stone"~3 matches any document that has these three
words in this order, with bird and garden separated by no more than three words, and garden and stone
separated by no more than three words. This sequence matches a document with the text a bird in the
rose garden is near a stone because there are at most three words between successive sequence words.
This sequence also matches a bird garden with a stone for the same reason.
Simple word sequences are a special case of proximity word sequences: "..." is the same as
"..."~0. Any documents found by "..."~N are also found by "..."~M, when M > N.
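As a rough illustration of the proximity rule, the following Python sketch checks whether a list of query words occurs in order with at most N other words between successive words. It is not RISS's implementation: the function names, the word-splitting regular expression, and the case-insensitive comparison are assumptions made for the example.

```python
import re

def words(text):
    # Assumption: words are runs of letters, digits, or underscores,
    # compared case-insensitively; all other characters are separators.
    return re.findall(r"\w+", text.lower())

def matches_proximity_sequence(query, n, document):
    """True if the query words occur in order in the document with at
    most n other words between any two successive query words."""
    q = words(query)
    d = words(document)
    if not q:
        return False
    # Positions at which a match of the query words seen so far can end.
    ends = [i for i, w in enumerate(d) if w == q[0]]
    for target in q[1:]:
        ends = [i for i, w in enumerate(d)
                if w == target and any(0 < i - e <= n + 1 for e in ends)]
    return bool(ends)

seq = "bird garden stone"
print(matches_proximity_sequence(seq, 3, "a bird in the rose garden is near a stone"))  # True
print(matches_proximity_sequence(seq, 3, "a bird garden with a stone"))                 # True

# A proximity value of 0 behaves like a simple word sequence.
print(matches_proximity_sequence("like a rolling stone", 0, "like a large rolling stone"))  # False
# Raising the proximity value can only add matches.
print(matches_proximity_sequence("like a rolling stone", 1, "like a large rolling stone"))  # True
```

The sketch tracks every position where a partial match can end, rather than only the first occurrence of each word, so a document is not rejected just because an early occurrence of a word is too far from the next one.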
Matching word sequences in attachments
This section discusses word matching in attachments. As with other documents, RISS renders attachment
documents (such as spreadsheets and PDF files) into text words. When RISS renders a document, it follows
the document application’s internal representation of the file.
Certain file types, for example spreadsheets, look very different internally than they do externally. This
means that the word sequence in the external application representation, which the end user sees, may
differ from the internal application representation. RISS query matching uses the internal application
representation. The following two examples illustrate this.
Example 1. Separators are ignored
RISS renders text into words. Remaining characters such as periods, commas, spaces, and newlines are
considered separators and are ignored. Phrase queries ignore all formatting elements and non-word
characters. The following original plain text:
“This was news to Mr. Smith.
Johnson, however, knew better.”
matches the phrase query:
“Smith Johnson”
This is because internally, the two plain text sentences are represented as one long string of continuous
words: “This was news to Mr Smith Johnson however knew better”.
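The effect can be reproduced with a small Python sketch. The function name render_words and the regular expression are assumptions chosen for illustration; RISS's actual rendering rules may differ in detail.

```python
import re

def render_words(text):
    # Assumption: keep runs of letters, digits, or underscores as words and
    # discard periods, commas, spaces, and newlines as separators.
    return re.findall(r"\w+", text)

text = "This was news to Mr. Smith.\nJohnson, however, knew better."
tokens = render_words(text)
print(" ".join(tokens))
# This was news to Mr Smith Johnson however knew better

# "Smith" and "Johnson" are adjacent words once separators are dropped,
# so the phrase query matches across the sentence boundary.
phrase = render_words("Smith Johnson")
found = any(tokens[i:i + len(phrase)] == phrase
            for i in range(len(tokens) - len(phrase) + 1))
print(found)  # True
```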
Example 2. Sequence is not intuitive
Internally in an attachment’s original application, a large multi-page document or a single page
spreadsheet equates to a long text sequence. Text may not appear in the same sequence internally as
it appears externally. Also, multiple instances of the same text in certain file types are represented
as a single instance.