HP Software Reference Information Storage System v1.6 User Guide (T3559-90810, November 2008)

For the spreadsheet above, assuming the cell text for the names was entered in displayed order from top left
to bottom right (John Adams was entered first) and the title and dates were entered after all the names
were entered, most versions of Excel store the text internally as follows:
John
Adams
John Quincy
John Fitzgerald
Kennedy
Tyler
United States Presidents named John
1797–1801
1825–1829
1961–1963
1841–1845
Note the following features of the internal representation:
Text sharing: Where certain text appears in more than one cell in the spreadsheet, the text may
appear only once in the internal representation. In this example, this is the case with the text “John”
and “Adams”. (Note that not all versions of Excel consistently share text in exactly this way.)
This text sharing only occurs at the level of the entire text of a cell, and never occurs within cells. Thus,
“John Quincy” and “John Fitzgerald” remain whole and independent.
Even accounting for text sharing, the specific ordering of various cell text in the internal representation
does not necessarily follow presentation order, and instead often follows insertion order.
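The text sharing and insertion ordering described above can be illustrated with a minimal Python sketch. It is a simplified model, not Excel's actual storage code: the function name build_shared_strings is hypothetical, and the entry order in cell_entries is an assumption reconstructed from the description above (names row by row, then the title, then the dates).

```python
def build_shared_strings(cell_entries):
    """Return each distinct cell text once, in first-insertion order.

    Sharing is decided on the entire text of a cell, so "John" and
    "John Quincy" are treated as two unrelated strings.
    """
    table = []
    seen = set()
    for text in cell_entries:
        if text not in seen:
            seen.add(text)
            table.append(text)
    return table


# Assumed entry order: names first, then the title, then the dates.
cell_entries = [
    "John", "Adams",
    "John Quincy", "Adams",
    "John Fitzgerald", "Kennedy",
    "John", "Tyler",
    "United States Presidents named John",
    "1797–1801", "1825–1829", "1961–1963", "1841–1845",
]

shared = build_shared_strings(cell_entries)
for text in shared:
    print(text)
```

Under these assumptions the printed list has the same order as the internal representation listed above: the second “Adams” and the second “John” add no new entries, which is why “Tyler” directly follows “Kennedy”.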
Because of these factors, text sequence matches in an Excel spreadsheet are only consistent with the
spreadsheet as viewed in Excel if the matched text appears wholly within a cell. However, it is possible
for sequences to match in inconsistent ways across cell text depending on the precise version and editing
history of that spreadsheet.
For the spreadsheet and order of insertion shown above, the following queries would match:
"John Adams"
"Adams John"
"Quincy John"
"John Fitzgerald Kennedy"
"Presidents named John"
And the following queries would not match:
"John Tyler"
"Quincy Adams"
"John Quincy Adams"
"John Adams 1797–1801"
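The two lists above can be reproduced with a short Python sketch that treats the internal representation as one continuous stream of words and checks whether the words of a query appear consecutively in it. This is only an illustrative model under stated assumptions: whitespace tokenization, case-sensitive comparison, and the names INTERNAL_TEXT, word_stream, and sequence_matches are all hypothetical and do not describe the product's actual matching code.

```python
# Internal text order as listed earlier in this section.
INTERNAL_TEXT = [
    "John", "Adams", "John Quincy", "John Fitzgerald", "Kennedy", "Tyler",
    "United States Presidents named John",
    "1797–1801", "1825–1829", "1961–1963", "1841–1845",
]


def word_stream(strings):
    """Flatten the stored strings into one list of words (assumed tokenizer)."""
    return [word for text in strings for word in text.split()]


def sequence_matches(query, strings):
    """Return True if the query's words occur consecutively in the stream."""
    stream = word_stream(strings)
    wanted = query.split()
    if not wanted:
        return False
    n = len(wanted)
    return any(stream[i:i + n] == wanted for i in range(len(stream) - n + 1))


queries = [
    "John Adams", "Adams John", "Quincy John",
    "John Fitzgerald Kennedy", "Presidents named John",
    "John Tyler", "Quincy Adams", "John Quincy Adams",
    "John Adams 1797–1801",
]
for q in queries:
    print(q, "->", sequence_matches(q, INTERNAL_TEXT))
```

In this model, "Adams John" matches only because the shared “Adams” entry happens to be stored immediately before “John Quincy”, while "John Tyler" fails because the shared “John” entry is stored far from “Tyler”.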
PDF documents
PDF documents are another case where the internal text representation can vary widely from the visible
presentation in PDF readers. Some issues that can arise:
Text sequences can appear out of order on the same page depending on how the page was
composed.
Text can appear doubled or can have spacing inserted into or removed from the internal
representation to assist some specific visual presentation.
In general, PDF documents generated via print drivers are far more susceptible to these issues than PDF
documents generated directly using Acrobat and other such composing tools. However, because of the
nature of PDF itself, even they are not immune.
Query expression syntax and matching