Settings pane
Settings for the data source, along with a list of the Data Samples and JavaScript files used in
the current data mapping configuration, can be found on the Settings tab at the left. The
available options depend on the type of data sample that is loaded.
The Input Data settings (especially Delimiters) and Boundaries are essential to obtain the data
and, eventually, the output that you need. For more explanation, see "Data source settings" on
page 115.
Input Data
The Input Data settings specify how the input data must be interpreted. These settings are
different for each data type. For a CSV file, for example, it is important to specify the delimiter
that separates data fields. PDF files are already delimited naturally by pages, so the input data
settings for PDF files are interpretation settings for text in the file.
CSV file Input Data settings
In a CSV file, data is read line by line, where each line can contain multiple fields. The input
data settings specify to the DataMapper module how the fields are separated.
• Field separator: Defines which character separates the fields in the file. Even though
  CSV stands for comma-separated values, a CSV file can use any character as the
  separator, including commas, tabs, semicolons, and pipes.
• Text delimiter: Defines which character surrounds text in the file, so that a Field
  separator occurring between those text delimiters is not interpreted as a separator.
  This ensures that, for example, the field "Smith; John" is not interpreted as two fields,
  even if the field separator is the semicolon.
• Comment delimiter: Defines which character starts a comment line.
• Encoding: Defines which encoding is used to read the Data Source (US-ASCII,
  ISO-8859-1, UTF-8, UTF-16, UTF-16BE or UTF-16LE).
• Lines to skip: Defines the number of lines in the CSV that are skipped and not used as
  records.
• Set tabs as a field separator: Overrides the Field separator option and uses the Tab
  character instead, for tab-delimited files.
• First row contains field names: Uses the first line of the CSV as headers, which
  automatically names all extracted fields.
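To make the effect of these settings concrete, the following Python sketch mimics them with the
standard csv module. It is an illustration only, not the DataMapper's implementation: the
function name read_csv_records, its parameter names, the Field1, Field2, ... fallback names, and
the order in which skipped lines and comment lines are handled are all assumptions made for this
example.

```python
import csv


def read_csv_records(data, encoding="utf-8", field_separator=";",
                     text_delimiter='"', comment_delimiter="#",
                     lines_to_skip=0, first_row_contains_field_names=True):
    """Parse raw CSV bytes into a list of dicts, one per record.

    Hypothetical helper: each parameter mirrors one setting described above.
    """
    # Encoding: decode the raw bytes before anything else.
    lines = data.decode(encoding).splitlines()

    # Lines to skip: drop the first N lines (assumed to be applied before
    # comment handling). Note that splitlines() means this sketch does not
    # support quoted fields that span several lines.
    lines = lines[lines_to_skip:]

    # Comment delimiter: ignore lines that start with the comment character.
    if comment_delimiter:
        lines = [ln for ln in lines if not ln.startswith(comment_delimiter)]

    # Field separator and Text delimiter map onto csv's delimiter/quotechar.
    reader = csv.reader(lines, delimiter=field_separator,
                        quotechar=text_delimiter)
    rows = [row for row in reader if row]  # ignore empty lines
    if not rows:
        return []

    # First row contains field names: use it as headers, otherwise fall back
    # to generated names (the Field1, Field2, ... scheme is an assumption).
    if first_row_contains_field_names:
        names, rows = rows[0], rows[1:]
    else:
        names = [f"Field{i + 1}" for i in range(len(rows[0]))]

    return [dict(zip(names, row)) for row in rows]


sample = (
    "Exported by billing system\n"   # skipped via lines_to_skip=1
    "# this line is a comment\n"     # dropped by the comment delimiter
    "ID;Name;City\n"
    '1;"Smith; John";Montreal\n'     # quoted field keeps its semicolon
    "2;Jane Doe;Toronto\n"
).encode("utf-8")

records = read_csv_records(sample, lines_to_skip=1)
print(records[0]["Name"])  # Smith; John
```

Passing field_separator="\t" corresponds to the Set tabs as a field separator option. Without
the text delimiter, the first record above would be split into four fields instead of three.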