ABBYY FlexiCapture 8.0 Professional User’s Guide © 2009 ABBYY. All rights reserved.
Dear User, This guide explains the core principles behind ABBYY FlexiCapture 8.0 Professional. Please read this guide carefully before using the program. For more information, please consult the following documentation: • Online help can be accessed from the menu or by pressing F1. The help file can also be accessed from Start > Programs > ABBYY FlexiCapture 8.0 Professional > Helps. • The ABBYY FlexiCapture 8.
Table of Contents
1. Introduction
1.1. The aim of document and data capture
1.2. Automated document and data capture
1. Introduction 1.1. The aim of document and data capture All sorts of different documents are now used everywhere, by businesses, industries, and services alike. Applications, surveys, and invoices are an important part of the work of every enterprise or institution. The current standard of information technology makes it impossible to operate with paper documents only: most data must be converted into electronic format for storage, analysis, and further processing.
• No training courses are required to work with the program. Users can master the program easily with the help of our technical documentation and reference materials. Scalability. Unlike a manual input system, ABBYY FlexiCapture 8.0 Professional is easily scalable. To increase output, you need only install the system at an additional workplace. 2. Administrator and Operator Functions To capture data using ABBYY FlexiCapture 8.0 Professional, you need to set up the system to process a certain document type.
Documents of various types can be processed in a single batch, and you can set up the system for processing documents of a mixed type. A document type only affects the method by which the document template is created. The type of processed documents does not affect the work of an operator. Let us analyze the various document types that can be processed using ABBYY FlexiCapture 8.0 Professional. • Structured documents.
4. Setting Up the System for Capturing Fixed Forms ABBYY FlexiCapture 8.0 Professional allows you to capture and process fixed forms quickly and efficiently.
4.1.1. Form elements Let us analyze the main form elements. (Figure 1)
Figure 1. Sample of a machine-readable form containing the main elements • Data fields. All forms designed for information gathering contain data fields. These fields are usually accompanied by an explanatory text. Data fields can be of the following type: Text fields used to enter text information. Such fields are groups of character cells for entry of characters. The design of a text field prompts the person filling out the form to use separate characters.
background color and saturation so that the program can easily remove the background during scanning. Ideally, only anchors and filled-out data fields are retained after scanning and despeckling: all other elements must be removed. For this type of processing, you must use a monochrome scanner with a color lamp (red or green), or a color scanner that has a setting to allow the background color to be removed. • Raster forms. To draw character cells, raster forms use dots spaced an equal distance apart.
4.2. Creating a project A project contains all of the necessary document capture settings: document templates, image import profiles, program settings, and processed documents. Documents are grouped into batches. The number of batches depends on your processing approach: you can process all documents in one batch, or you can sort documents into batches according to their date of import or scanning date. Documents are processed in work batches. Only work batches are accessible in operator’s mode.
4.3. Creating a document template The most important step in setting up a project is the creation of a template. The quality of data received after forms have been processed depends on the correctness of the template. To create a template, you must specify the: • Static elements on the image: anchors, separators, static text, and barcodes. Select which of these elements are to be used for template matching and document identification. Anchors are detected and marked automatically.
1. In the program main window, select Project > Document Templates… Click New… in the Document Templates dialog box. 2. In the dialog box that opens, specify the parameters for the template, its name and description. In the Language(locale) field, specify the language in which the form will be filled out. In the Writing style field, select the relevant country. This is because the shape of certain characters, for example, digits, may differ between countries. Select the text type: ICR (hand-printed).
4.3.2. Using elements to mark objects on the form Once the Template Creating Wizard has finished, the loaded image will be displayed in the Template Editor window. Anchors and data fields of the types you selected during the previous step of template creation will already be marked. You can automatically mark objects later on by selecting the tool and clicking on the area of the element to be marked. The program will automatically detect the type and location of the element.
You can copy elements (even to other document sections), delete or move them, or change their sizes. If you copy fields, numbers are automatically added to the names of the fields. To select several elements simultaneously, use Ctrl-Click. The action performed will then be applied to all selected elements. To select elements, use the tool. 1. Text entry fields are already automatically marked on the form.
2. The second method is by removing the marking of a standard field. Select the necessary fields on the image or in the list and from the context menu select Delete Region. The marking will be removed and the name of the field will be marked with a red asterisk. To create a region on the image for a field without marking, select the tool from the toolbar and frame the necessary region.
4.3.2.5. Fields with multiple instances Your documents may contain repetitive objects – fields or field groups that occur several times in a document and describe similar information, for example similar detail about employees, children, or invoices. To process such objects, you can create fields with multiple instances. Any field can have multiple instances, and these instances can be spaced any distance apart, even on different pages. Field instances possess identical properties.
4.3.2.7. Deleting fields To delete a field, select this field and press the Delete key or select Delete from the context menu. If you want to delete only the region of the field but to retain the field in the document structure, press Shift+Delete or select Delete Region from the context menu for the field. 4.3.3. Static elements Static elements mark objects from which data are not extracted. Such elements are used for template matching and document identification only.
Five anchors and one barcode are sufficient to unambiguously identify and match the template, unless you plan to process other documents with an identical arrangement of anchors in the same stream. 4.3.3.1. Peculiarities of barcodes If a barcode is used as an identifier, then it is an anchor barcode and therefore a static element. Such barcodes must be created in Static elements mode.
• Index field. Select this option if you plan to use this field for document indexing. If you do so, the value of the field in each document in the list will be indexed, and the operator will be able to use the value of this field for sorting and searching documents. Figure 6. The General tab of the Properties dialog box (for a text entry field) When you create fields, they are automatically assigned names corresponding to the explanatory text. Please ensure that you name the fields you created correctly.
4.3.4.2.1. Data types of a text entry field It is very important that the data type for text fields is specified correctly. Specifying the data type tells the program what kind of data is expected in the field: digits, or letters of a certain alphabet, or characters from a certain set, a date, etc. The program has a flexible mechanism for specifying data types. The user is provided with a ready-made set of data types that includes the most common types.
Figure 7. The Data Type tab of the Properties dialog box (for a text entry field) For any data type, the program can automatically process the entered values: remove excess spaces, change letters to uppercase or lowercase, or automatically replace specified characters or text fragments. To enable automatic processing of values, click the Edit... button next to the AutoCorrect options field. In the dialog box that opens, select the required automatic processing options (Figure 8). Figure 8.
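The automatic processing options described above can be pictured with a minimal Python sketch. The function name and its parameters are hypothetical illustrations and are not part of ABBYY FlexiCapture; the order in which the program applies the options is also an assumption.

```python
import re


def autocorrect(value, *, collapse_spaces=True, case=None, replacements=None):
    """Hypothetical stand-in for the AutoCorrect options (not ABBYY's API)."""
    if collapse_spaces:
        # Squeeze runs of whitespace to one space and trim both ends.
        value = re.sub(r"\s+", " ", value).strip()
    # Replace the specified characters or text fragments, in the order given.
    for old, new in (replacements or {}).items():
        value = value.replace(old, new)
    # Optionally change letters to uppercase or lowercase.
    if case == "upper":
        value = value.upper()
    elif case == "lower":
        value = value.lower()
    return value
```

For example, `autocorrect("  john   smith ", case="upper")` yields `"JOHN SMITH"`.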
You can also set up a checking procedure for recognized values to check if they belong to a certain interval. To specify an interval, click Edit … next to the Validation field (Figure 9). Figure 9. Value checks dialog box Specify a data type for each text field. For the First and last name field, you must select the Name type and specify the correct language. For the Processing volume field where the number of pages is indicated, select the Number type (the format is an integer).
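As a rough illustration of an interval check on a field of the Number type (integer format), the sketch below validates a recognized value against a minimum and a maximum. The function name, the returned messages, and the inclusive bounds are assumptions made for this example; the program itself is configured through the Value checks dialog box.

```python
def check_interval(raw_value, minimum, maximum):
    """Return (ok, message) for a recognized integer value (illustrative only)."""
    try:
        number = int(raw_value.strip())
    except ValueError:
        # The recognized text does not match the integer format at all.
        return False, f"'{raw_value}' is not an integer"
    if not minimum <= number <= maximum:
        # Bounds are treated as inclusive here; this is an assumption.
        return False, f"{number} is outside {minimum}..{maximum}"
    return True, ""
```

A value that fails such a check would be a natural candidate for operator verification.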
Figure 10. The Data Type tab of the Properties dialog box (for a checkmark which does not belong to a group) 4.3.4.2.3. Data types for a checkmark group On the Data Type tab of the Properties dialog box for a checkmark group, you can see a list of all checkmarks included in this group (Figure 11). Clear the Allow empty selection option if at least one checkmark field in the group must be checked. If multiple checkmarks in the group can be selected, you must select the Allow multiple selection option.
Figure 11. The Data Type tab of the Properties dialog box (for a checkmark group) Specify properties for checkmarks and checkmark groups. Select the /Empty method of checkmark values conversion. For the “Types of documents to be processed” checkmark group, select the Allow empty selection and Allow multiple selection options. Specify the value to be exported if no checkmark field is selected, for example, “none selected”.
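The interplay of the two group options and the value exported for an empty group can be sketched as follows. The function, its parameter names, and the "; " separator used to join several checked values are hypothetical choices for this example, not the program's documented behavior.

```python
def export_checkmark_group(checked, *, allow_empty, allow_multiple,
                           empty_value="none selected"):
    """Return the exported value for a checkmark group (illustrative only).

    `checked` is the list of names of the checkmarks found selected.
    """
    if not checked:
        if not allow_empty:
            raise ValueError("at least one checkmark must be selected")
        # Nothing is selected: export the configured placeholder value.
        return empty_value
    if len(checked) > 1 and not allow_multiple:
        raise ValueError("only one checkmark may be selected")
    # The separator is an assumption made for this sketch.
    return "; ".join(checked)
```

With both options enabled, an empty group exports "none selected" and a group with several checked items exports all of them.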
to specify any other recognition properties because the field will not be recognized and the verification operator will be prompted to enter the value of this field manually. Select the text type: ICR (hand-printed) or OCR (printed). For printed text, select the print type (typographic, matrix printer, typewriter, etc.). To specify multiple text types or use a template, select the Advanced option and click Modify… Select the marking type using the marking samples from the drop-down list.
Figure 12. The Recognition tab of the Properties dialog box (for a text entry field) 1. Specify recognition properties for all text fields on your form. The text type for all fields must be ICR (hand-printed), marking type – Char box series, number of cells must be detected automatically, and the text orientation must be Horizontal. There are no multi-line fields on the form, so the One line option must be selected for all fields.
comparing the image of the checkmark on a processed document against the image of the blank form used to create the template. You can allow corrections for certain checkmarks. If the person filling out the questionnaire has selected a checkmark by mistake, he or she can just erase this checkmark. Checkmarks that are completely erased will be considered not selected. If, however, you selected the Auto type for a checkmark, no corrections are allowed.
4.3.4.3.3. Barcode recognition properties The properties used to recognize a field barcode are similar to those for a static barcode. For a field barcode, you must specify the barcode type, orientation, and image despeckling options. The only difference is that the operator can enter the value of the field manually. To do so, select the Don't recognize (Key From Image field - will be entered manually) option. 4.3.4.3.4.
Figure 14. The Verification tab of the Properties dialog box 1. Set up verification options for the data fields. The “Verify uncertainly recognized characters” value is selected by default, which means that uncertainly recognized characters in a field will be sent for verification. Keep this value for all fields except, for example, the First and last name field and select All values for this field.
To specify a resolution value for a picture, select Change resolution to and then select the desired value from the list. 4.3.4.6. Rule-based checks Rules are used to check recognition results automatically. Rules, like data types, allow you to specify data constraints, i.e., the requirements that values of certain fields must meet. If the values in filled-out documents do not meet these requirements, the affected pages are marked with a flag and a corresponding message.
Figure 15. The Rules tab of the Properties dialog box Let us analyze a sample rule. We will describe the fill-in date of the questionnaire as the merging of the Day, Month, and Year fields. To do this, perform the following: 1. Delete the region of the date field on the form image but do not delete the field itself. To do this, select Delete Region on the context menu of the Fill-in date field.
9. Add the Day, Month, and Year fields to the Fields list by clicking Add. In the Result field box, select the Fill-in date field. Use dots as separators. We set up a check of whether the date in the Fill-in date field lies within the specified time interval. The check will now be performed on the value obtained by merging the values of the three fields. This completes the step for creating fields and static elements on the form and for specifying their properties.
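The merge rule and the interval check from this example can be sketched in Python. The function names are hypothetical, and the day.month.year order is assumed from the order in which the fields are added; the actual rule is configured in the Rules tab, not in code.

```python
from datetime import date


def merge_fill_in_date(day, month, year, separator="."):
    """Merge three recognized field values into one value, dot-separated."""
    return separator.join([day.strip(), month.strip(), year.strip()])


def date_in_interval(merged, earliest, latest, separator="."):
    """Check that the merged value is a real date within [earliest, latest]."""
    try:
        day, month, year = (int(part) for part in merged.split(separator))
        value = date(year, month, day)
    except ValueError:
        # Wrong number of parts, non-numeric text, or an impossible date.
        return False
    return earliest <= value <= latest
```

A value such as "31.02.2009" fails the check because it is not a valid calendar date, which is the kind of case a rule would flag for the operator.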
during scanning. If this is the case, you only need to change the order of pages and the requirement of the assembly rule will be met. To add a page to a section, from the Document Template Editor menu select Template > Add Page…, or from the context menu of the image, select Add Page… . Next, load the image of the new blank page and select types of objects that must be detected automatically on the page.
You can also specify the sequence and number of repetitions of the sections by selecting Template > Document Template Properties… in the Template Editor window. On the Assembly tab (Figure 17), specify the minimum and maximum number of repetitions of each section in the document (1 by default). If you wish to check the values of key fields, select the Check equality of key fields option and specify the key field on each of the pages.
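To make the assembly settings more concrete, here is a minimal sketch of a check on section counts and key field equality. The page representation (a dictionary with "section" and "fields" entries) and the function name are assumptions for illustration only; they do not reflect the program's internal data model.

```python
def check_assembly(pages, section_limits, key_field=None):
    """Return a list of assembly errors for one document (illustrative only).

    pages: list of {"section": name, "fields": {field name: value}}
    section_limits: {section name: (minimum, maximum)} occurrence limits
    """
    errors = []
    counts = {}
    for page in pages:
        counts[page["section"]] = counts.get(page["section"], 0) + 1
    for section, (low, high) in section_limits.items():
        found = counts.get(section, 0)
        if not low <= found <= high:
            errors.append(
                f"section {section!r} occurs {found} times, expected {low}..{high}"
            )
    if key_field is not None:
        # All pages of one document should carry the same key field value.
        keys = {page["fields"].get(key_field) for page in pages}
        if len(keys) > 1:
            errors.append(f"key field {key_field!r} differs between pages")
    return errors
```

An empty result means that the document satisfies both the occurrence limits and the key field check.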
Annex pages are additional pages that may be included in any document. They do not contain any recognition fields and you do not have to match a template to them. However, they are taken into account when assembling documents. For example, an application for credit is a fixed form. A certificate from the workplace written in a free-form style is attached to the application. This certificate may be processed as an annex page.
the selected format. In this case you can specify a recognition language: either keep the language specified in the template or select one or more languages from the list (Select button). If you wish to change the resolution of initial images, for example to reduce the size of stored data, select the Change resolution to option and enter a new resolution. Figure 18. Setting up export to files in the Export Setting dialog box 4.3.7.2.
Figure 19. Setting up export to a database in the Export Settings dialog box Now specify to which tables and table columns of the database the field values of the document are to be exported. To do so, click Setup Fields Mapping... The left-hand part of the Field Mapping dialog box (Figure 20) contains a list of document sections and fields. In the right-hand section, specify the corresponding tables and fields of the database.
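Conceptually, a field mapping pairs each document field with a database table and column. The sketch below illustrates that idea with Python's built-in sqlite3 module; the function name, the mapping structure, and the table and column names are hypothetical, and the program's own export is set up in the dialog box rather than in code.

```python
import sqlite3


def export_document(connection, fields, mapping):
    """Write field values to the tables and columns named in `mapping`.

    fields:  {document field name: recognized value}
    mapping: {document field name: (table name, column name)}
    """
    rows = {}
    for field_name, (table, column) in mapping.items():
        rows.setdefault(table, {})[column] = fields.get(field_name)
    for table, columns in rows.items():
        # Table and column names come from trusted configuration, so they are
        # quoted and interpolated; the values are passed as parameters.
        names = ", ".join(f'"{name}"' for name in columns)
        marks = ", ".join("?" for _ in columns)
        connection.execute(
            f'INSERT INTO "{table}" ({names}) VALUES ({marks})',
            list(columns.values()),
        )
    connection.commit()
```

Each exported document produces one row per mapped table in this sketch.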
Figure 20. Setting up links between document fields and database fields during export Parameters concerning the saving of images can be specified on the Images tab of the Export Settings dialog box. You can save images to a database or to a file (in which case you will need to specify the relevant folder). Select the format in which images are to be stored.
2. The SharePoint columns into which data are to be written must be of type Single line of text or Multiple lines of text. To set up export to SharePoint, select Export to SharePoint in the Export type field (Figure 21). On the SharePoint Connection tab, type the address of the server (server URL) where your SharePoint libraries are located. Use the Connection settings… button to set up authentication parameters (Windows logon parameters are used by default) and select Proxy settings if required.
4.3.7.4. Custom export This export type allows you to set up advanced export procedures using tools that are not available in the program interface. If you wish to set up script-based export, select Custom export (script) in the Export type field (Figure 22). Next, select the scripting language (JScript® or VBScript) and enter the script text in the editor window that opens when you click Edit Script… (For a detailed description and samples of using scripts, please see the program Help file.) Figure 22.
4.3.8. Setting up the recognized data view Once the data are recognized, the user will see them in the document window. By default, the data are sorted by order and the labels correspond to the template field names. You can, however, change the view of the recognized data to make it more convenient, for example by changing the order of data presentation or by adding explanatory text. The recognized data view can be changed in the bottom right-hand side of the Document Template Editor window.
4.4. Setting up image import The operator’s first job is to add new images to the project. These images can be paper documents (these must be scanned) or electronic images. If images are regularly received from one and the same source, you can automate the image-adding procedure so that all necessary actions are performed automatically with just a single click.
4. The Import Profile Creation Wizard now prompts you to set up options for purging the Hot Folder following import. Images that were successfully imported and images whose processing produced an error can be deleted or moved to another folder. 5. Finally, change the name assigned by default to the import profile and enter a description. Figure 23. The Image Import Profiles dialog box 1. Set up an import profile for your images. To do so, select Project > Image Import Profiles in the ABBYY FlexiCapture 8.
You can attach a flexible description at the stage of document template creation. To do this, add the document image during the second stage of template creation. Next, select the Load FlexiLayout option and enter the path to the AFL file containing the flexible description. You can also attach a flexible description in the Document Template Editor window. To do so, use the Properties dialog box of a section.
enable the option Images separated by and select the value blank pages or pages with barcode from the dropdown list, depending on which pages are to be used as separators. Pages are assembled into documents automatically: pages will be added to the current document until the next separator page. 7. Working with a Set-Up Project Once the administrator has set up a template and specified all the necessary settings, you can start processing documents.
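Before moving on, the separator-based assembly described at the end of the previous section can be sketched as a simple grouping loop. The function name and the is_separator callback are hypothetical, and the sketch assumes that separator pages are dropped rather than kept in the document, which the text does not state for barcode pages.

```python
def assemble_documents(pages, is_separator):
    """Group a page stream into documents, splitting at separator pages."""
    documents, current = [], []
    for page in pages:
        if is_separator(page):
            # A separator closes the current document, if it has any pages.
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents
```

Consecutive separators do not produce empty documents in this sketch.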
When importing images from multi-page files, multiple pages will be added to the batch. 2. Scan images. To add images from a scanner, from the menu select Scan Images... You will be prompted to select a scanner and scan the images. 3. Import images with the help of an image import profile already created by the administrator (see Setting up image import). If import profiles have been set up, their names will be displayed in the menu of the import button.
appear for the page and for the document to which the page belongs, and field recognition is performed. If none of the templates of the project can be matched with the page, the page remains unprocessed. In most cases a correctly created template matches with pages automatically. However, sometimes you may need to select a template manually. To match a template, select the necessary page or document and select Match Template… from the drop-down menu of the recognition button.
Figure 24. The program main window, page thumbnails mode To begin verifying the recognized data, click Run Verification… Group verification means grouping character images which have been recognized as having an identical value and displaying them on the verification screen in order to confirm correctly recognized characters and leave for the next stage only those characters which are either incorrect or uncertain (Figure 25).
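The idea behind group verification can be illustrated with a short sketch that groups character images by their recognized value and then separates the characters the operator confirmed from those left for the next stage. The data layout (pairs of an image identifier and a recognized value) and both function names are assumptions made for this example.

```python
from collections import defaultdict


def group_for_verification(characters):
    """Group character image ids by the value they were recognized as."""
    groups = defaultdict(list)
    for image_id, value in characters:
        groups[value].append(image_id)
    return dict(groups)


def split_after_review(groups, rejected_ids):
    """Separate confirmed characters from those left for the next stage."""
    confirmed, remaining = {}, []
    for value, image_ids in groups.items():
        confirmed[value] = [i for i in image_ids if i not in rejected_ids]
        remaining.extend(i for i in image_ids if i in rejected_ids)
    return confirmed, remaining
```

Here the operator reviews one screen per recognized value and marks only the images that do not belong to it.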
During template creation, you set up verification options when specifying the field properties. Group verification is performed for characters from fields for which you have selected the Include in group verification option on the Verification tab of the Properties dialog box. Figure 25. Group verification of digits Context verification is a verification mode used to correct the format of fields whose value range is known or easily identified.
Figure 26. The field verification window The document window also allows you to check that recognition was correct and to correct erroneous characters (Figure 27). The document window opens when you double-click on the name of the page. This window consists of the data area, page image, and rule errors area (if there are such errors). You can set up the arrangement of windows with the help of the Layout button.
Figure 27. Document window Rules check. Rules whose requirements are not met are marked with either a yellow flag (warning) or a red flag (error). If a rule relates to one of the fields, such a field must be sent to the verification operator during context verification. Rule errors are displayed in a separate window of the document editor, and documents which do not meet rule requirements are indicated by red flags.
8. Conclusion This simple example covered all the stages of program set-up and processing structured documents. The capabilities of the program, however, are much greater. It can help you process simple and complex multi-page documents of various types: semi-structured, non-structured, and mixed-type documents. If you have any questions, please refer to the program Help files and to the Installation Guide.