1.4

The Input Data Section
For a CSV File
• Field separator: Defines which character separates the fields in the file.
• Text delimiter: Defines which character surrounds text fields in the file, so that a
Field separator character appearing between text delimiters is not interpreted as a separator.
• Comment delimiter: Defines which character starts a comment line.
• Encoding: Defines which encoding is used to read the Data Source (US-ASCII,
ISO-8859-1, UTF-8, UTF-16, UTF-16BE or UTF-16LE).
• Lines to skip: Defines the number of lines at the start of the CSV that are skipped and
not used as Source Records.
• Set tabs as a field separator: Overrides the Field separator option and uses the Tab
character instead, for tab-delimited files.
• First row contains field names: Uses the first line of the CSV as headers, which
automatically names all extracted fields.
• Ignore unparseable lines: Ignores any line that cannot be parsed with the settings
above.
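To make the interaction of these options concrete, the following is a minimal sketch in Python using the standard csv module. It is not the product's implementation: the function read_records, its parameter names, the generated names Field1, Field2, ..., the choice to skip lines before filtering comments, and the rule that a line is "unparseable" when its field count differs from the header are all assumptions made for illustration. It also splits on line breaks first, so quoted fields containing line breaks are not handled.

```python
import csv


def read_records(raw, *, field_separator=",", text_delimiter='"',
                 comment_delimiter="#", encoding="utf-8", lines_to_skip=0,
                 tabs_as_separator=False, first_row_has_names=True,
                 ignore_unparseable=True):
    """Hypothetical reader that mirrors the CSV options described above."""
    if tabs_as_separator:
        field_separator = "\t"  # overrides the Field separator option

    # Encoding: decode the raw bytes of the data source.
    lines = raw.decode(encoding).splitlines()

    # Lines to skip: drop the first N lines (assumed to happen first).
    lines = lines[lines_to_skip:]

    # Comment delimiter: drop empty lines and lines starting with the delimiter.
    lines = [ln for ln in lines
             if ln.strip()
             and not (comment_delimiter and ln.startswith(comment_delimiter))]

    # Field separator and Text delimiter map onto delimiter and quotechar.
    rows = list(csv.reader(lines, delimiter=field_separator,
                           quotechar=text_delimiter))
    if not rows:
        return []

    # First row contains field names: otherwise invent placeholder names.
    if first_row_has_names:
        names, rows = rows[0], rows[1:]
    else:
        names = [f"Field{i + 1}" for i in range(len(rows[0]))]

    records = []
    for row in rows:
        if len(row) != len(names):  # assumed meaning of "unparseable"
            if ignore_unparseable:
                continue
            raise ValueError(f"unparseable line: {row!r}")
        records.append(dict(zip(names, row)))
    return records
```

For example, with a semicolon as the Field separator and one line to skip, a quoted value such as "Smith; John" stays in a single field, while a line with too few fields is dropped:

```python
raw = b'junk line\n# a comment\nname;note\n"Smith; John";ok\nbroken\nDoe;"fine"\n'
records = read_records(raw, field_separator=";", lines_to_skip=1)
```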
For a PDF File
Note
PDF Files have a natural, static delimiter in the form of Pages, so the options here are interpretation
settings for text in the PDF file. Each value represents a fraction of the average font size of text in a
data selection, meaning ".3" represents 30% of the height or width.
• Word spacing: Determines the spacing between words. Because text in a PDF is
typically placed by position rather than separated by actual space characters, the position
of the text is used to detect new words. This option determines what fraction of the average
width of a single character must be empty for a new word to be considered started. The
default value is .3, meaning a space is assumed when there is a blank area of at least 30%
of the width of the average character in the font.
• Line spacing: Determines the spacing between lines of text. The default value is 1,
meaning the space between lines must be at least equal to the average character height.
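The threshold logic behind these two settings can be sketched as follows. This is an illustrative Python example only, not the product's algorithm: the function split_on_gaps and the (text, start, size) tuples are hypothetical, and it assumes that the gap is measured from the end of one item to the start of the next and compared against the average size of the items passed in.

```python
from statistics import mean


def split_on_gaps(items, fraction, joiner=""):
    """Group (text, start, size) items along one axis.

    A new group starts when the empty space before an item is at least
    `fraction` times the average item size (width for words, height for lines).
    """
    if not items:
        return []
    avg_size = mean(size for _, _, size in items)
    groups = [items[0][0]]
    prev_end = items[0][1] + items[0][2]
    for text, start, size in items[1:]:
        gap = start - prev_end
        if gap >= fraction * avg_size:
            groups.append(text)             # gap is wide enough: new word or line
        else:
            groups[-1] += joiner + text     # gap too small: same word or line
        prev_end = start + size
    return groups


# Characters as (char, x position, width); the average width is 10,
# so a Word spacing of .3 means a gap of 3 or more starts a new word.
glyphs = [("H", 0, 10), ("i", 10.5, 10), ("y", 24, 10), ("o", 34, 10)]
words = split_on_gaps(glyphs, 0.3)
```

With the default Word spacing of .3, the gap of 3.5 units between "i" and "y" splits the text into two words; raising the value to .5 would keep all four characters in a single word. Under the same assumptions, the function could be applied vertically with a fraction of 1 to illustrate Line spacing.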