The Input Data Section

For a CSV File

Field separator: Defines which character separates the fields in the file.

Text delimiter: Defines which character surrounds text fields in the file, so that a Field separator appearing between the text delimiters is not interpreted as a separator.

Comment delimiter: Defines what character starts a comment line.

Encoding: Defines what encoding is used to read the Data Source (US-ASCII, ISO-8859-1, UTF-8, UTF-16, UTF-16BE or UTF-16LE).

Lines to skip: Defines the number of lines in the CSV file that are skipped and not used as Source Records.

Set tabs as a field separator: Overrides the Field separator option and uses the Tab character instead, for tab-delimited files.

First row contains field names: Uses the first line of the CSV as headers, which automatically names all extracted fields.

Ignore unparseable lines: Ignores any line that does not correspond to the settings above.
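
As an illustration only, the following Python sketch shows how options like these typically interact when a CSV file is read. It is not the application's own parser: the function read_records and all of its parameter names are hypothetical, and it assumes that an "unparseable" line is one whose field count differs from the header and that text fields do not contain line breaks.

```python
import csv


def read_records(raw_bytes, field_separator=",", text_delimiter='"',
                 comment_delimiter="#", encoding="UTF-8", lines_to_skip=0,
                 tabs_as_separator=False, first_row_has_names=True,
                 ignore_unparseable=True):
    """Hypothetical helper mirroring the CSV options described above."""
    if tabs_as_separator:
        # "Set tabs as a field separator" overrides the Field separator option.
        field_separator = "\t"

    # "Encoding" decides how the raw bytes become text.
    lines = raw_bytes.decode(encoding).splitlines()

    # "Lines to skip" drops the first N lines before anything else happens.
    lines = lines[lines_to_skip:]

    # "Comment delimiter": lines starting with this character are ignored.
    if comment_delimiter:
        lines = [ln for ln in lines if not ln.startswith(comment_delimiter)]

    # "Field separator" and "Text delimiter" map onto the csv module's
    # delimiter and quotechar arguments.
    rows = list(csv.reader(lines, delimiter=field_separator,
                           quotechar=text_delimiter))
    if not rows:
        return []

    if first_row_has_names:
        names, rows = rows[0], rows[1:]
    else:
        names = [f"Field{i + 1}" for i in range(len(rows[0]))]

    records = []
    for row in rows:
        if len(row) != len(names):
            # Assumption: a line with the wrong field count is "unparseable".
            if ignore_unparseable:
                continue
            raise ValueError(f"Unparseable line: {row!r}")
        records.append(dict(zip(names, row)))
    return records


sample = (b'skip me\n'
          b'name,city\n'
          b'# a comment\n'
          b'"Doe, Jane",Montreal\n'
          b'broken line\n'
          b'Bob,Paris\n')
records = read_records(sample, lines_to_skip=1)
```

In this sample the first line is skipped, the comment line and the malformed line are dropped, and the comma inside "Doe, Jane" is kept as part of the field because it sits between text delimiters.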

For a PDF File

Note

PDF files have a natural, static delimiter in the form of pages, so the options here control how text in the PDF file is interpreted. Each value represents a fraction of the average font size of text in a data selection, meaning ".3" represents 30% of the height or width.

Word spacing: Determines the spacing between words. Because text in a PDF is often placed by position rather than separated by actual space characters, the position of the text is used to detect where a new word starts. This option determines what fraction of the average width of a single character must be empty for a new word to be considered to have started. The default value is .3, meaning a space is assumed wherever there is a blank area at least 30% as wide as the average character in the font.

Line spacing: Determines the spacing between lines of text. The default value is 1, meaning the space between lines must be at least equal to the average character height.
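
The two thresholds can be pictured with a small Python sketch. This is a simplified illustration under stated assumptions, not the application's actual extraction logic: the Glyph class and the functions group_into_words and is_new_line are hypothetical, glyphs are assumed to be sorted left to right on a single line, and the y coordinate is assumed to increase down the page.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str     # the character drawn
    x: float      # left edge of the character
    width: float  # horizontal space the character occupies


def group_into_words(glyphs, word_spacing=0.3):
    """Split positioned glyphs into words using a relative gap threshold."""
    if not glyphs:
        return []
    avg_width = sum(g.width for g in glyphs) / len(glyphs)
    threshold = word_spacing * avg_width  # e.g. 30% of the average width

    words = [glyphs[0].char]
    prev = glyphs[0]
    for g in glyphs[1:]:
        gap = g.x - (prev.x + prev.width)  # blank area between two glyphs
        if gap >= threshold:
            words.append(g.char)           # gap is wide enough: new word
        else:
            words[-1] += g.char            # same word continues
        prev = g
    return words


def is_new_line(prev_bottom, next_top, avg_char_height, line_spacing=1.0):
    """Assumption: the vertical gap is compared to the average height."""
    return (next_top - prev_bottom) >= line_spacing * avg_char_height


glyphs = [Glyph("H", 0, 10), Glyph("i", 10, 10),
          Glyph("y", 24, 10), Glyph("o", 34, 10)]
words_default = group_into_words(glyphs)        # threshold is 3 units
words_strict = group_into_words(glyphs, 0.5)    # threshold is 5 units
```

With the default of .3, the 4-unit gap between "i" and "y" exceeds the 3-unit threshold, so two words are found. Raising the value to .5 raises the threshold to 5 units and the same glyphs are read as one word.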
