ABBYY FlexiCapture 10 Project Setup Guide © 2011 ABBYY. All rights reserved.
Dear User,
This guide describes the actions performed during project setup. If you need more information, try the following:
• Use other help files. You can open them from the program menu, by pressing F1, or by clicking Start > Programs > ABBYY FlexiCapture 10 Stations > Help.
• System Administrator’s Guide: Start > Programs > ABBYY FlexiCapture 10 Servers > Guides > System Administrator’s Guide.
• Project Setup Guide: Start > Programs > ABBYY FlexiCapture 10 Stations > Guides > Project Setup Guide.
Contents
1. Introduction
1.1. The Purpose of Data Capture
1.2. Data Capture Automation
1.3. Documents You Can Process in ABBYY FlexiCapture 10
…
5.6. Workflow
5.6.1. Standalone
5.6.2. Distributed
5.7. .NET Assemblies
1.3. Documents You Can Process in ABBYY FlexiCapture 10
ABBYY FlexiCapture 10 is a data capture application that supports different document types. The following document types can be processed in ABBYY FlexiCapture 10:
• Structured documents. Documents whose data fields remain constant in number, position, and formatting across all copies of the document are called structured. Such forms are often printed and then filled in by hand.
Work Batches are used for document processing, and Test Batches are used for debugging Document Definitions. The difference between these batch types is that Test Batches use the local (unpublished) Document Definition, while Work Batches use the published Document Definition. You can access the list of Test Batches directly from the Document Definition Editor window.
A Document consists of the images of one or more pages (i.e. a document can be single-page or multi-page) and the data captured from them.
To create a new definition, click Projects > Document Definitions... and, in the dialog box that opens, click New…. The Document Definition Creation Wizard opens. In the Create New Document Definition window, specify the main properties of the definition: its name, comment, language, and style. Then set the text type by selecting either ICR (for hand-printed text) or OCR (for machine-printed text) from the drop-down list.
Fields:
- text entry field
- checkmark
- checkmark group
- barcode
- picture
- table
- group (of fields)
Static elements:
- anchor
- separator
- static text
- barcode
A barcode can be either a recognizable field or a static element. Choose the marking mode according to the barcode’s purpose: if information is to be captured from the barcode, mark its region in Field Regions Mode; if the barcode is used to identify and match the Document Definition, mark it in Static Element Mode.
3.1.1.2. Fields with no marking
A field does not have to have a corresponding region on the image. Fields with no marking are shown with a red asterisk in the list. Such fields can be used to store the results of calculations performed on the values of recognized fields. Fields with no marking have all the properties of their type: they can be sent to an Operator for Verification, their format can be checked, and their values can be exported. To create a field with no marking, do one of the following: 1.
To create a field with multiple regions, create one of the field regions, select it, right-click it, and click Continue Region… on the shortcut menu. Then select where the region should continue. Repeat this procedure as many times as required.
3.1.1.5.
Figure 2. Excluding an unrecognizable region
3.1.1.7. Deleting fields
To delete a field, select it and press Delete or click Delete on the shortcut menu. To delete only the marking and keep the field in the document structure, press Shift+Delete or click Delete Region on the shortcut menu of the field.
3.1.2. Static Elements
Static elements are objects that do not provide information for capture. They are used for Document Definition matching and identification. Anchors are one type of static element.
If a barcode is used for data capture, it is a field, so create it while you are working with fields. The Properties dialog box of such a barcode has all the tabs of the Field Properties dialog box: General, Data Type, Recognition, Verification, and Rules. The value of such a barcode is recognized and, if the settings provide for it, sent to verification and export.
3.1.3.
Figure 3. The Field Properties dialog box, General tab
3.1.3.2. Data Type
The data type defines the set of possible field values and the allowed field format. If the value entered in the field does not correspond to the specified data type, the operator receives a verification error message. Data types for text fields usually describe values with a simple structure, such as a date, time, address, taxpayer identification number (INN), or amount.
Select a category from the Content list. The Details field displays the description of the data type selected within that category (either the default one or the one you specified earlier). If the Process value as text option is selected, field values are processed and exported as text regardless of their content, and no field format check is carried out.
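The format check can be pictured as validating each recognized value against a pattern for its data type. The following Python sketch is a hypothetical model, not FlexiCapture’s API: the data type names `date` and `amount`, and the formats assumed for them (DD.MM.YYYY, and digits with an optional two-digit decimal part), are illustrative assumptions. The `process_as_text` flag mirrors the Process value as text option by skipping the check.

```python
import re
from datetime import datetime
from typing import Callable, Dict


def _is_date(value: str) -> bool:
    # Assumed date format DD.MM.YYYY; strptime also rejects impossible dates.
    try:
        datetime.strptime(value, "%d.%m.%Y")
        return True
    except ValueError:
        return False


# Assumed amount format: digits with an optional two-digit decimal part.
_AMOUNT = re.compile(r"\d+(\.\d{2})?")

# Hypothetical data types: name -> predicate returning True for a valid value.
DATA_TYPES: Dict[str, Callable[[str], bool]] = {
    "date": _is_date,
    "amount": lambda v: _AMOUNT.fullmatch(v) is not None,
}


def check_format(value: str, data_type: str, process_as_text: bool = False) -> bool:
    """Return True if the value passes the format check for its data type.

    When process_as_text is True, the value is treated as plain text and
    no format check is performed, so every value passes.
    """
    if process_as_text:
        return True
    return DATA_TYPES[data_type](value)


if __name__ == "__main__":
    print(check_format("31.12.2011", "date"))  # True
    print(check_format("31.02.2011", "date"))  # False: there is no 31 February
    print(check_format("12.5", "amount"))  # False: only one decimal digit
    print(check_format("12.5", "amount", process_as_text=True))  # True
```

In the terms of this guide, a value for which such a check returns False is the case in which the operator receives a verification error message.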
Figure 4. Field Properties dialog box, Data Type tab (text entry field)
Automatic processing of the recognized value can be carried out for any data type: unnecessary spaces are deleted, and capitalization and spelling are corrected. To configure automatic processing, click the Edit… button to the right of the Replace characters field and, in the dialog box that opens, specify the necessary text processing parameters (Figure 5).
Figure 5.
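The kind of cleanup described above can be sketched as a small normalization function. This is a hypothetical illustration, not the program’s implementation: the function name `normalize`, the order of the steps, and the replacement table passed to it are assumptions, and spelling correction is not modeled.

```python
import re
from typing import Dict, Optional


def normalize(value: str, replacements: Optional[Dict[str, str]] = None,
              capitalize: bool = True) -> str:
    """Hypothetical post-processing of a recognized field value.

    Spelling correction, which the guide also mentions, is not modeled.
    """
    # 1. Character replacement, e.g. letter "O" -> digit "0" in a numeric field.
    for wrong, right in (replacements or {}).items():
        value = value.replace(wrong, right)
    # 2. Delete unnecessary spaces: collapse whitespace runs and trim both ends.
    value = re.sub(r"\s+", " ", value).strip()
    # 3. Capitalization: make the first character uppercase.
    if capitalize and value:
        value = value[0].upper() + value[1:]
    return value


if __name__ == "__main__":
    print(repr(normalize("  main   street ")))  # 'Main street'
    print(normalize("1O5", {"O": "0"}, capitalize=False))  # 105
```

A replacement such as "O" to "0" only makes sense for fields that should contain digits, which is why the sketch takes the replacement table per call rather than applying it globally.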
period, for text fields, if the value is valid and if it has the required format, etc.). To specify value restrictions for a field, click the Edit… button to the right of the Value Check field (Figure 6).
Figure 6. Value Check Settings dialog box
3.1.3.2.2. Data types for checkmarks
For checkmarks, you can specify the values that the field takes when the checkmark is selected or cleared. This is done on the Data Type tab (Figure 7).
Figure 7. Field Properties dialog box, Data Type tab (checkmark not in the group)
3.1.3.2.3. Data types for checkmark groups
In the Checkmark Group Properties dialog box, the Data Type tab displays the list of names of the checkmarks in the group (Figure 8). Clear the Allow empty selection option if at least one checkmark in the group must be selected. If more than one checkmark in the group can be selected, select Allow multiple selection.
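The two group options combine into a simple validity rule. The Python sketch below is a hypothetical model of that rule, not FlexiCapture code; the function name `check_group`, the dictionary representation of checkmark states, and the default option values are assumptions.

```python
from typing import Dict, List


def check_group(states: Dict[str, bool], allow_empty: bool = True,
                allow_multiple: bool = False) -> List[str]:
    """Validate a checkmark group.

    states maps each checkmark name in the group to True (selected) or
    False (cleared). Returns a list of error messages; empty means valid.
    """
    selected = [name for name, is_selected in states.items() if is_selected]
    errors = []
    # Allow empty selection cleared: at least one checkmark is required.
    if not selected and not allow_empty:
        errors.append("at least one checkmark must be selected")
    # Allow multiple selection cleared: at most one checkmark is allowed.
    if len(selected) > 1 and not allow_multiple:
        errors.append("only one checkmark may be selected: " + ", ".join(selected))
    return errors


if __name__ == "__main__":
    print(check_group({"Yes": False, "No": False}, allow_empty=False))
    # ['at least one checkmark must be selected']
    print(check_group({"Cash": True, "Card": True}, allow_multiple=True))  # []
```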
Figure 8. Field Properties dialog box, Data Type tab (checkmark groups)
3.1.3.3. Field recognition properties
ABBYY FlexiCapture 10 allows you to specify recognition settings for each field. Correctly defined properties on the Recognition tab of the Field Properties dialog box improve recognition quality and reduce the likelihood of errors. The available properties depend on the field type and are described below.
3.1.3.3.1.
automatically). The Simple type is used for fields with no marking, usually for typographically printed text. Use the Letter case setting to restrict recognition to letters of a particular case; if the field can contain both lowercase and uppercase letters, choose Auto. Set the text Orientation to either horizontal or vertical. For a single-line field, select One line. For a field that always contains a single word (i.e. no spaces), select One word. Then specify the image preprocessing properties.
You can allow corrections for certain checkmarks: if a checkmark was made by mistake, the person filling in the form can shade the whole checkmark, and completely shaded checkmarks are treated as void. However, corrections cannot be allowed if you selected the Auto type. Image preprocessing can be configured for checkmarks in the same way as for text fields. If checkmarks are grouped, they share the same properties: recognition properties are defined in the same way, but for the whole checkmark group.
Figure 10.
3.1.3.4. Verification settings
Verification is the checking of recognized data by an operator. When creating a Document Definition, you can configure the verification settings on the Verification tab of the Field Properties dialog box (Figure 11). Uncertainly recognized characters are highlighted by the program and sent to an Operator for checking.
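How a character becomes uncertain is decided internally by the recognition engine; this guide does not describe the mechanism. The sketch below only illustrates the idea under an explicit assumption: each recognized character carries a hypothetical confidence score from 0 to 100, and characters below an assumed threshold are the ones an Operator would be asked to check. `RecognizedChar`, `uncertain_positions`, and the threshold of 60 are all invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecognizedChar:
    char: str
    confidence: int  # hypothetical score, 0 (no confidence) to 100 (certain)


def uncertain_positions(chars: List[RecognizedChar], threshold: int = 60) -> List[int]:
    """Return the indices of characters whose confidence is below threshold."""
    return [i for i, c in enumerate(chars) if c.confidence < threshold]


def needs_verification(chars: List[RecognizedChar], threshold: int = 60) -> bool:
    """In this model, a field goes to an Operator if any character is uncertain."""
    return bool(uncertain_positions(chars, threshold))


if __name__ == "__main__":
    field = [RecognizedChar("1", 95), RecognizedChar("7", 40), RecognizedChar("0", 88)]
    print(uncertain_positions(field))  # [1]
    print(needs_verification(field))  # True
```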
3.1.3.5. Image Export Parameters
In the Image Field Properties, you can specify export parameters such as the exported file type, quality, color type, and resolution. To do so, go to the Export tab of the Field Properties dialog box (on the field’s shortcut menu, click Properties…). You can configure the following:
• File type (TIFF, JPEG, BMP, JPEG2000, PCX packbits, PNG).
• Quality. For TIFF, JPEG, and JPEG2000 files, you can choose the exported file quality (best, high, normal, or low).
be available from two Script rules at a time. It can be accessed from any number of rules only in read-only mode. Rules are specified on the Rules tab of the Field Properties dialog box (Figure 12). A rule can affect the values of one or more fields. You can set the rule severity to either error or warning: the rule is flagged red if an error occurs and yellow if a warning is issued.
Figure 12. Field Properties dialog box, Rules tab
3.1.3.7.
Figure 13. Field Properties dialog box, Custom Action tab
3.1.4. Creating a Document Definition for Multi-Page Documents
ABBYY FlexiCapture 10 allows you to create multi-page Document Definitions. A definition can consist of any number of sections, each containing one or more pages. For multi-page Document Definitions, you specify the order and number of sections and the document assembly rules.
to create a separate section for each page, and then set the document structure, i.e. the section order and number of repetitions. A more complex case is a definition containing several sections, each of which includes more than one page. For example, this can be a Document Definition consisting of a non-flexible section and a multi-page flexible section, or a Document Definition describing documents that consist of a double-sided page repeated a certain number of times.
Figure 14. Document structure, Thumbnails view mode
You can also specify the order and number of section repetitions by clicking Document Definition > Document Definition Properties… in the Document Definition Editor window. On the Assembly tab (Figure 15), specify the minimum and maximum number of section repetitions in the document (the default is 1). If you want the key field values to be checked, select Check equality of key fields and then select a key field for each section.
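The Assembly tab settings amount to a sequence check. The Python sketch below is a simplified, hypothetical model, not the program’s assembly algorithm: each section of the definition has a minimum and maximum repetition count, the document is represented as an ordered list of section instances, and, when key checking is on, all key field values must be equal. Names such as `SectionRule` and `assembly_errors` are invented for illustration, and annex pages are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SectionRule:
    name: str
    min_count: int = 1  # assumed default; the guide states the default number is 1
    max_count: int = 1


@dataclass
class SectionInstance:
    name: str
    key: Optional[str] = None  # value of the section's key field, if any


def assembly_errors(rules: List[SectionRule], instances: List[SectionInstance],
                    check_keys: bool = False) -> List[str]:
    """Check section order, repetition counts, and (optionally) key equality."""
    errors = []
    i = 0
    for rule in rules:
        # Consume consecutive instances of this section, in definition order.
        count = 0
        while i < len(instances) and instances[i].name == rule.name:
            count += 1
            i += 1
        if not rule.min_count <= count <= rule.max_count:
            errors.append(f"section {rule.name!r}: {count} repetition(s), "
                          f"expected {rule.min_count}..{rule.max_count}")
    if i < len(instances):
        errors.append(f"unexpected section {instances[i].name!r} at position {i}")
    # Check equality of key fields: every section must carry the same key value.
    if check_keys and len({s.key for s in instances}) > 1:
        errors.append("key field values differ across sections")
    return errors


if __name__ == "__main__":
    rules = [SectionRule("Header"), SectionRule("Items", 1, 3)]
    doc = [SectionInstance("Header", "A-17"), SectionInstance("Items", "A-17"),
           SectionInstance("Items", "A-17")]
    print(assembly_errors(rules, doc, check_keys=True))  # []
```

The greedy, in-order consumption keeps the sketch short; it is one plausible way to picture "section order and number of repetitions", not a description of how ABBYY FlexiCapture 10 actually assembles documents.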
Figure 15. Document Definition properties, Assembly tab
3.1.5. Creating a Document Definition with Annex Pages
ABBYY FlexiCapture 10 allows you to create Document Definitions for documents with annexes. Annex pages can accompany any document. No fields need to be located on these pages, so Document Definitions do not have to be matched to them. However, annex pages are taken into account during document assembly.
3.1.6. Export Settings
To save the data obtained during document processing, you need to configure export for each Document Definition. There are four export types: to a file of the specified format, to an ODBC-compatible database, to a Microsoft SharePoint library, and custom export (using a script). Export is configured in the Export Settings dialog box (Document Definition > Export Settings). To add a new export destination, click Add….
Select Overwrite existing files if you want newly exported files to overwrite existing files with the same name. Next, specify the naming options for the exported files: click File Naming Options… and select the necessary options. Click Next to proceed. Select the file type (CSV, DBF, TXT, XLS, or XML) and adjust additional properties, such as the text encoding. Click Next to proceed.
Enter a destination name and click Finish.
3.1.6.3. Exporting to SharePoint
ABBYY FlexiCapture 10 allows you to export documents to a Microsoft™ SharePoint library. In the library, each document will have matching columns filled with values from the document fields. These values can be used for document search and indexing.
Notes.
1. To configure export to SharePoint, you must have Administrator rights. To perform the export itself, Contributor rights are sufficient.
2.
If you need to change the original image resolution to reduce file size, select Change resolution to and enter a resolution. If you select the PDF format and the Create searchable PDF option, the document image will be recognized in full-text mode, and the recognized text will be saved in the PDF file along with the image, making the file searchable.
If rule errors occur during testing, or if invalid field property values are found, edit the Document Definition to correct them. When all errors are corrected, you can publish the Document Definition and proceed to mass document input.
3.1.9. Editing and Publishing a Document Definition
After creating a Document Definition and successfully testing it on several images, publish it to make it available for recognizing Work Batches.
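The relationship between the local and the published Document Definition can be modeled as an editable copy plus a frozen snapshot. The Python sketch below is hypothetical: the class and method names, and the representation of a definition as a mapping from field names to data types, are assumptions. In this model, Test Batches see the local copy and Work Batches see only the last published snapshot, so edits made after publishing do not reach Work Batches until the definition is published again.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DocumentDefinition:
    name: str
    local: Dict[str, str] = field(default_factory=dict)  # editable: field -> data type
    published: Optional[Dict[str, str]] = None  # snapshot used by Work Batches

    def publish(self) -> None:
        # Freeze a copy, so later local edits don't leak into Work Batches.
        self.published = copy.deepcopy(self.local)

    def definition_for(self, batch_kind: str) -> Dict[str, str]:
        if batch_kind == "test":
            return self.local
        if batch_kind == "work":
            if self.published is None:
                raise RuntimeError(f"{self.name!r} has not been published yet")
            return self.published
        raise ValueError(f"unknown batch kind: {batch_kind!r}")


if __name__ == "__main__":
    invoice = DocumentDefinition("Invoice", {"Total": "amount"})
    invoice.publish()
    invoice.local["Date"] = "date"  # edit made after publishing
    print(invoice.definition_for("test"))  # {'Total': 'amount', 'Date': 'date'}
    print(invoice.definition_for("work"))  # {'Total': 'amount'}
```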
Next, specify the field properties as you would for a structured document: configure the recognition, verification, and export properties, as well as the rules.
3.2.1. Classifiers
A Classifier is a special ABBYY FlexiLayout Studio project designed for preliminary identification of individual pages and for labeling them as a certain type (for example, by the FlexiLayout or FlexiLayout variant used).
As a rule, key fields (for example, a contract number) are located using a FlexiLayout created in ABBYY FlexiLayout Studio. If the key fields cannot be found automatically, their values can be entered by an Operator. To allow this, create a Document Definition with one field (or several, if necessary) and, in the Recognition Properties of the field(s), select Don’t recognize (Key from Image field – will be entered manually).
acceptable flexible layout. In this case, once a FlexiLayout is matched, the remaining ones are not considered. This option may be useful when you have many different FlexiLayouts, as it reduces the overall recognition time. A Document Definition can be temporarily disabled. To do so, clear the Enabled option in the list of Document Definitions. To enable the Document Definition again, select this option.
5.
5.2. Image Preprocessing
The following document creation and image processing parameters can be configured for a batch type:
• Create new document. A new document can be created automatically when a Document Definition is matched, for every image file, or for image files separated by blank pages or by pages with a barcode (of a certain type or value). If the separator pages do not need to be processed, you can delete them by selecting Delete separator pages.
• Image Processing.
documents (i.e. those for which no matching Document Definitions have been found), select the Export unrecognized documents option and specify the export parameters. You can also set up the program to delete documents after export by specifying how long documents should be stored after they are exported.
5.6. Workflow
5.6.1. Standalone
The following workflow parameters can be configured for the Standalone version:
• Automatic batch export, if there are only a few uncertainly recognized characters.
6. Configuring Image Import
In the Distributed version, images are imported using a special application, the Scanning Station. However, Verification Operators and Senior Verification Operators can also add images to batches. Images can also be imported automatically from Hot Folders. Import profiles allow you to specify the import and image processing parameters.
Figure 16. Image Import Profiles dialog box
Automatic Hot Folder checking is enabled as follows:
• In the Standalone version, by selecting Automatically check Hot Folders in the Image Import Profiles window.
• In the Distributed version, for individual projects, in the Hot Folders section of the Processing Server Monitor.
7. Uploading a Project to the Server
In the Standalone version of the system, Operators can start working with a project as soon as it is configured.
Load Images… | Ctrl+O
Scan Images… | Ctrl+K
Import Images | Ctrl+I
Export | Ctrl+U
Export Data to Files… | Alt+Shift+S
Export to Database… | Alt+Shift+D
Undo | Ctrl+Z
Redo | Ctrl+Y
Cut | Ctrl+X
Copy | Ctrl+C
Paste | Ctrl+V
Delete | Del
Select All | Ctrl+A
Find… | Ctrl+F
Find Next | F3
Go to Next Document | Ctrl+D
Go to Previous Document | Ctrl+Shift+D
Despeckle Image | Ctrl+Alt+K
Invert Image | Ctrl+Alt+V
Rotate Image 90° clockwise | Ctrl+W
Rotate Image 90° counterclockwise | Ctrl+Shift+W
Test Batches | Ctrl
Analyze | Ctrl+E
Match Document Definition… | Alt+Shift+E
Recognize | Ctrl+R
Run Verification | F7
Next Item to Verify | F4
Previous Item to Verify | Shift+F4
Next Assembly Error | F9
Previous Assembly Error | Shift+F9
Next Uncertain Character | F8
Previous Uncertain Character | Shift+F8
Next Rule Error | F6
Previous Rule Error | Shift+F6
Document Definitions… | Ctrl+T
Batch Types… | Ctrl+Shift+T
Image Import Profiles | Ctrl+Shift+I
Update to Latest Version | Alt+Shift+U
Re-analyze | Ctrl+Alt+E
Re-recognize
Delete | Del
Delete Region | Shift+Del
Select All | Ctrl+A
Select by Type | Ctrl+Shift+A
Group | Ctrl+G
Ungroup | Ctrl+Shift+G
Copy Text from Image | Ctrl+Alt+C
Create Field: Text | Alt+Shift+T
Create Field: Checkmark | Alt+Shift+C
Create Field: Checkmark Group | Alt+Shift+M
Create Field: Barcode | Alt+Shift+B
Create Field: Picture | Alt+Shift+P
Create Field: Table | Alt+Shift+L
Create Field: Group | Alt+Shift+G
Despeckle Image | Ctrl+Alt+K
Invert Image | Ctrl+Alt+V
Rotate Image 90° Clockwise | Ctrl+W
Rotat
8.3. Group Verification Window
Confirm All | Enter
Postpone All | Ctrl+Enter
Toggle | Space
Next Page | Page Down
Previous Page | Page Up
Undo | Ctrl+Z
Redo | Ctrl+Y
Select All | Ctrl+A
Show Character Image | F2
Full Screen | F11
Show Field Image | Ctrl+I
Show Field Image: On Top | Alt+1
Show Field Image: On Bottom | Alt+2
Image Scale: Zoom In | Ctrl+Num+
Image Scale: Zoom Out | Ctrl+Num–
Help Topics | F1
Exit | Alt+F4
8.4.
Delete All | Alt+Del
Select All | Ctrl+A
Insert Line Break | Shift+Enter
Merge Characters | Ctrl+M
Analogous Fields | Alt+F3
Show Character Image | F2
Full Screen | F11
Recognized Text | Alt+F1
Character Image Cutting | Alt+F2
Show Field Image | Ctrl+I
Show Field Image: On Top | Alt+1
Show Field Image: On Bottom | Alt+2
Image Scale: Zoom In | Ctrl+Num+
Image Scale: Zoom Out | Ctrl+Num–
Help Topics | F1
Exit | Alt+F4