2018.1

Tip
This how-to describes in detail how to extract an item description that spans a variable number
of lines: How to extract multiline items.
Extracting data of variable length
In PDF and Text files, transactional data is not structured uniformly, as it is in a CSV, database or
XML file. Data can be located anywhere on a page. Therefore, data is extracted from a
certain region on the page. However, the data can be spread over multiple lines and multiple
pages:
• Line items may continue on the next page, separated from the line items on the first page
  by a line break, a number of empty lines and a letterhead.
• Data may vary in length: a product description, for example, may or may not fit on one line.
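To make the second case concrete, here is a minimal sketch in Python of how a description that wraps onto extra lines could be folded back into one item. It is not how the product itself extracts data. The line layout (a quantity followed by a description), the pattern ITEM_START and the function group_items are all assumptions made for this illustration.

```python
import re

# Assumed layout (hypothetical): an item line starts with a quantity,
# followed by the description. Any other non-blank line is treated as a
# continuation of the previous item's description.
ITEM_START = re.compile(r"^\s*(\d+)\s+(.*\S)\s*$")


def group_items(lines):
    """Fold continuation lines into the item they belong to."""
    items = []
    for line in lines:
        if not line.strip():
            continue  # skip empty lines, e.g. around a page break
        match = ITEM_START.match(line)
        if match:
            items.append({
                "quantity": int(match.group(1)),
                "description": match.group(2),
            })
        elif items:
            # Not an item start: append to the last description.
            items[-1]["description"] += " " + line.strip()
    return items


sample = [
    "2  Garden hose, 25 m, reinforced,",
    "   with brass couplings",
    "",
    "1  Hose reel",
]
items = group_items(sample)
for item in items:
    print(item["quantity"], item["description"])
```

The sketch only skips empty lines. A letterhead repeated on the next page would be appended to the preceding description, and a continuation line that happens to start with a number would be mistaken for a new item, so real data needs a stricter rule for recognizing the start of an item.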
How to exclude lines from an extraction is explained in another topic: "Extracting transactional
data" on page 175 (see From a PDF or Text file).
This topic explains a few ways to extract a variable number of lines.