Getting Started with Ascent Xtrata Pro Version 3.
Copyright © 2002-2006 LCI GmbH. All rights reserved. Printed in USA. Portions, Copyright 2006 Kofax Image Products, Inc. All Rights Reserved. The information contained in this document is the property of LCI GmbH. Neither receipt nor possession hereof confers or transfers any right to reproduce or disclose any part of the contents hereof, without the prior written consent of LCI GmbH. No patent liability is assumed, however, with respect to the use of the information contained herein.
An attempt has been made to state all allowable values where applicable throughout this document. Any values or parameters used beyond those stated may have unpredictable results.
How to Use This Guide Introduction This guide contains information about using Ascent Xtrata Pro. It is provided for system administrators, operators, project developers, and other personnel who are setting up and using Ascent Xtrata Pro components for use with Ascent Capture. This guide assumes that you have a thorough understanding of Windows standards and interfaces, and Ascent Capture.
• Chapter 7 – Setting Up a Batch Class in Ascent Capture explains how to add Ascent Xtrata Pro components to Ascent Capture batch classes and use the Synchronization tool to synchronize the project classes and fields with Ascent Capture. • Chapter 8 – Processing Batches describes the general operation of Ascent Xtrata Pro Server and provides information about its user interface. • Chapter 9 – Ascent Xtrata Pro Validation describes the general operation of the Ascent Xtrata Pro Validation module.
Scripting Online Help Information about scripting is available from the Help menu of any Project Builder interface that allows you to write or access scripts. Select Help and then the desired help component. Ascent Xtrata Pro Release Notes Late-breaking product information is available from the release notes. You should read the release notes carefully, as they contain information that may not be included in other Ascent Xtrata Pro documentation.
• Scanner engine (board) type • Special/custom configuration or integration information xviii Ascent Xtrata Pro User's Guide
Chapter 1 Overview Introduction This chapter introduces the components installed with Ascent Xtrata Pro, as well as their key features. The rest of this guide describes these components in more detail, and explains how to incorporate Ascent Xtrata Pro into your Ascent Capture processing flow. Ascent Xtrata Pro Ascent Xtrata Pro is a complete system for processing structured, semi-structured, and unstructured documents within the Ascent Capture framework.
• Ascent Xtrata Pro Server processes batches in the Ascent Capture workflow by performing document classification and data extraction. The Server module uses the definitions stored in a project and executes them when processing batches for a linked batch class. • Ascent Xtrata Pro Validation provides enhanced validation functionality. It allows for validating and manually correcting documents that contain invalid classification and/or extraction results.
Ascent Xtrata Pro Server Figure 1-1. Typical Capture Workflow with Ascent Xtrata Pro Server and Validation First, documents are prepared for scanning. There is no need to sort the documents, but the pages must be smoothed and all staples and/or clips removed. Then, using a professional scanner with VRS, batches of documents are scanned into Ascent Capture. Ascent Xtrata Pro Server processes the documents and provides the classification and recognition results.
A project created with Project Builder is stored in its own project folder. The folder includes the project file and a number of additional files that contain everything needed to manage and execute the project. This project folder is portable; if desired, it can be copied to another location and used from there. Project Builder supports robust features for interactively testing project settings during configuration and maintenance.
Figure 1-2. Classification Result Matrix for a News Group Project of Nine Classes Ascent Xtrata Pro Synchronization Once classes and fields are defined in the Ascent Xtrata Pro project, they must be mapped to Ascent Capture document classes, form types, and index fields. Ascent Capture document classes, form types, and index fields can be set up in Ascent Capture as usual.
Ascent Xtrata Pro Knowledge Base Administration Once a project is set up, the Knowledge Base Administration module is used to train the project, as well as manage training sets and knowledge bases. For complete information on this application, refer to the Using the Ascent Xtrata Pro Knowledge Base Administration Module guide that is included with your product. Ascent Xtrata Pro Server Ascent Xtrata Pro Server is a custom module that performs document classification, OCR, and data extraction.
The Server collects statistical data on all documents as they are processed and saves this information in the XDocument (XDoc). A release script retrieves the data from the XDoc and stores it in a database. The statistics are also updated based on changes that occur during validation. The Server collects the following statistics: • Number of pages/documents per day/month. • Recognition rates (correct, reject, error) per field and per document. • Processing time per page.
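The recognition-rate statistic above can be illustrated with a short Python sketch. It assumes a hypothetical `FieldOutcome` record (a field name plus one of the three outcomes); the real figures are stored in the XDoc and written to a database by a release script, whose formats are not described here. The sketch only shows the arithmetic: each rate is the share of a field's outcomes that fall into that category.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical record: one extraction outcome for one field on one document.
# The real statistics live in the XDoc; this structure is illustrative only.
@dataclass
class FieldOutcome:
    field: str
    status: str  # "correct", "reject", or "error"

STATUSES = ("correct", "reject", "error")

def recognition_rates(outcomes):
    """Return {field: {status: fraction}} for the three outcome categories."""
    counts = defaultdict(lambda: dict.fromkeys(STATUSES, 0))
    for outcome in outcomes:
        if outcome.status not in STATUSES:
            raise ValueError(f"unknown status {outcome.status!r}")
        counts[outcome.field][outcome.status] += 1
    rates = {}
    for field, per_status in counts.items():
        total = sum(per_status.values())
        rates[field] = {s: n / total for s, n in per_status.items()}
    return rates
```

Per-document rates can be computed the same way by grouping on a document identifier instead of a field name.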
Classification Classification is the process of determining the category (class) of a document by identifying its relevant characteristics. The features used for classifying a document can be geometrical or textual. The Ascent Xtrata Pro classification engine can use either of these characteristics to make the best determination. Classification Hierarchy In most organizations, the manual classification of documents follows a hierarchical scheme.
Instruction Classification Instruction classification uses explicit rules about a document to classify it. These rules consist of words and phrases that can be combined using Boolean operations. Negative instructions can be used to inhibit placing a document into a class. When used in conjunction with the AFC, these explicit instructions can be used to handle exceptions. Document Separation Ascent Xtrata Pro is capable of separating multi-page .
Evaluators In addition to the locators, various evaluators are available. Evaluators work on the results of locators and do not directly retrieve data from the document. Online Learning The New Samples working mode is available within Project Builder. This working mode shows documents that have been returned from validation. These documents can be added to either a classification or an extraction training set, where they can help optimize table extraction and the invoice header locators.
Script Integration A VBA-compatible script engine is built into Ascent Xtrata Pro. This engine can be used to extend the capabilities of the classification, extraction, and validation methods. The script is called when specific events occur before and after classification. In the scripting environment, the complete Ascent Xtrata Pro object document model is available to the script programmer.
Optionally, custom validation forms can be designed for the Ascent Xtrata Pro Validation module. For more information, see Setting up Validation. Validation Methods and Rules Validation methods include the implementation of automatic check functions, which can be predefined standard methods or customer-specific methods developed with the integrated scripting feature. Validation rules are used to assign validation methods to one or more fields.
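The relationship between validation methods and validation rules can be modeled as follows. This is a minimal Python sketch, not the Xtrata Pro scripting API: `ValidationRule`, `required`, `net_plus_tax_equals_total`, and `invalid_fields` are hypothetical names, and the assumption that a failing rule marks all of its assigned fields invalid is made for illustration only.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Callable, Dict, List, Set

# Hypothetical: a validation method receives the values of the fields it is
# assigned to and returns True if they are valid.
ValidationMethod = Callable[[List[str]], bool]

@dataclass
class ValidationRule:
    method: ValidationMethod
    fields: List[str]  # one or more fields the method is assigned to

def required(values: List[str]) -> bool:
    # Every assigned field must contain non-blank text.
    return all(v.strip() for v in values)

def net_plus_tax_equals_total(values: List[str]) -> bool:
    # Cross-field check over exactly three fields: net, tax, total.
    try:
        net, tax, total = (Decimal(v) for v in values)
    except InvalidOperation:
        return False  # a value is not a number
    return net + tax == total

def invalid_fields(document: Dict[str, str], rules: List[ValidationRule]) -> Set[str]:
    """Apply every rule; all fields of a failing rule are marked invalid."""
    invalid: Set[str] = set()
    for rule in rules:
        values = [document.get(name, "") for name in rule.fields]
        if not rule.method(values):
            invalid.update(rule.fields)
    return invalid
```

Keeping the method separate from the rule mirrors the text: the same check function can be reused by several rules, each assigning it to different fields.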
• Additional fees and tolls These fields are read by a pre-trained system that can already recognize a certain percentage of invoices. Since additional information is created during the data extraction process, this information can be used to improve the recognition of invoice data through additional training. In addition to the preconfigured items, fields can be added to an invoice project specifically for the extraction of additional information.
information is stored in the knowledge base, and the training document contents are not available and cannot be displayed from the knowledge base. Knowledge bases can be created either with Project Builder or with the Knowledge Base Administration module.
Ascent Xtrata Pro is designed to read semi-structured invoices. Therefore, every project has a set of predefined fields for the most common items found on all types of invoices. These fields are almost always logically arranged on the invoice, and each field has one of the group locators assigned to it. Each group locator takes advantage of existing knowledge about the geometry of these groups and uses that knowledge to improve data extraction.
Chapter 2 Project Builder Introduction Project Builder lets you set up, store, and test projects for Ascent Xtrata Pro that contain all the necessary information for processing documents. In Ascent Xtrata Pro there are three main aspects to setting up a project: classification, extraction, and validation. You may define projects that contain only classification, with no extraction or validation. However, projects that contain validation must also contain classification and extraction.
License Activation The Ascent Xtrata Pro setup will install the Project Builder with a demo license. The demo license is valid for three days from the date of installation. The Project Builder can be used without any restrictions until the license expires. After the expiration date, the Project Builder will not work except for the license activation component. Until activation is complete, Project Builder will display a dialog box asking the user to activate the license.
Activating a License To activate a license, the user has to activate either a time/volume restricted Ascent Product Suite Evaluation license or an unrestricted Ascent Xtrata Pro license on the local machine. License activation is performed within a simple dialog box, as described below. 1 During the demo period, the user is asked to activate the license each time the application starts. You can continue starting Project Builder without activating the license by clicking No.
Figure 2-1. License Activation The Activate License dialog box has two panels: Current License, which shows the information for the currently activated license, and New License, which lets you enter the name and company for the new license and shows the dates of the attached hardware key. Both panels provide the following fields: • Name - for the current license, the name of the licensee is displayed; for a new license activation, the name of the licensee must be entered.
The following buttons are provided: • Read Hardware Key – Reads the hardware key information from the attached hardware key. • Activate License – Click Activate License to check if an Ascent Hardware key is attached to the local computer by calling the Ascent Capture licensing functions. If not, the user is prompted to attach the hardware key. • Cancel – Click Cancel to close the License Activation dialog box. • Help – Click Help to open the online Help topics.
Layout Classifier The Layout Classifier analyzes the graphical representation of the document image and automatically creates classes of similar documents. Training documents are needed to enable layout classification for a class. The representations of these training documents are used to train the classifier. For detailed information, see Layout Classifier on page 43.
For invoice projects, there are special field group locators for predefined invoice fields, which only need to be trained with sample documents. These locators can also be combined with the normal “rule-based” locators. Extraction Benchmark You can test the extraction results for the current project settings against a reference set. The reference set has to be created first, for example by processing the documents with Ascent Xtrata Pro Server and Validation.
Creating a New Project There are two ways to create a new project: • Create a project from a directory: With this method, you specify folder(s) during the project creation process that contain image and/or text files to use as classification training sets. (You must set up the training set folders before you create the project.) Any subfolders within the specified folder(s) are used for creating classes and training sets.
Figure 2-3. New Project Dialog Box – Project Folder Tab 3 If the root folder already exists and you want to overwrite it, select “Delete existing files” to delete all previously existing files and folders in the selected folder when the project is created. This might be useful for reusing an existing folder for which you do not need any of the existing files or folders. Note Review the contents of an existing folder before deleting its contents.
Figure 2-4. New Project Dialog Box – Content Classification Tab 5 If you want to use an existing set of files for content classification, select “Import existing training set for content classification.” Then, specify the folder that contains the text files and subfolders to be used for the creation of classes and training documents. You can enter the path in the Path field or browse for the folder. 6 Click Next to continue to the next tab. Figure 2-5.
7 If you want to use an existing set of files for layout classification, select “Import existing training set for layout classification.” Then, specify the folder that contains the image files and subfolders to be used for the creation of classes and training documents. You can enter the path in the Path field or browse for the folder. 8 Click Finish to create the project and close the dialog box.
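The folder-based import in steps 5 and 7 can be pictured with a small Python sketch that lists the classes and training documents a folder tree would yield. The mapping it encodes (each subfolder becomes a class, nested subfolders become child classes, and the files inside a folder become its training documents) is an assumption based on the description of creating a project from a directory; the function name is hypothetical.

```python
from pathlib import Path
from typing import Dict, List

def classes_from_training_folder(root: str) -> Dict[str, List[str]]:
    """Map every subfolder of root (as a relative class path) to its files.

    Assumption for illustration: each subfolder becomes a class, nested
    subfolders become child classes ("Parent/Child"), and the files inside
    a folder become that class's training documents.
    """
    base = Path(root)
    classes: Dict[str, List[str]] = {}
    for folder in sorted(p for p in base.rglob("*") if p.is_dir()):
        class_path = folder.relative_to(base).as_posix()
        classes[class_path] = sorted(f.name for f in folder.iterdir() if f.is_file())
    return classes
```

Running a check like this before creating the project is one way to confirm that the training set folders are set up as intended, since they must exist beforehand.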
8 Set up validation. For more information, see Setting Up Validation on page 48. 9 Save the project. Loading an Existing Project When you load an existing project, it is automatically validated. If necessary, a warning message describes any issues that were found during the project validation process. This warning may also be displayed if you select File | Validate Project from the main menu. If no problems are detected, “No problems are found in this project” is displayed.
Figure 2-6. Save Project As Dialog Box To change the name of the project file, select the Name text box and enter a name for the new project file. Click the folder icon to navigate to a different folder and click OK. The text at the bottom of the dialog box shows the new project file name and the complete path to its new location. Project Properties The Project Properties dialog box allows you to insert a project description or assign read and/or write protection to the project file.
password, the project will open in full edit mode once you provide the read protection password. Figure 2-7. Open Read Protected Project File If the project file is write protected, you have to enter the write protection password and click OK to open the project file for editing. If you click Cancel, the project will open in read only mode. Figure 2-8. Open Write Protected Project File The following table shows the relationship between the read and write passwords.
Read protection password | Read password entered | Write protection password | Write password entered | Result
Set | Correct | Not set | N/A | Opens in full edit mode
Set | Wrong | Not set | N/A | Does not open
Set | Cancel | Not set | N/A | Does not open
Set | Correct | Set | Correct | Opens in full edit mode
Set | Wrong | Set | N/A | Does not open
Set | Cancel | Set | N/A | Does not open
Set | Correct | Set | Wrong | Does not open
Set | Correct | Set | Cancel | Opens in read only mode
Project Settings Project-level settings are set from the Project Settings dialog box, which includes the tabs described below.
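The password table can also be read as a decision procedure. The Python sketch below encodes its rows; the function `open_mode` is hypothetical, and the behavior for projects without read protection (write password only) follows the preceding paragraph, with a wrong write password assumed to behave as it does in the table.

```python
from typing import Optional

def open_mode(read_protected: bool, write_protected: bool,
              read_answer: Optional[str] = None,
              write_answer: Optional[str] = None) -> Optional[str]:
    """Return "full edit", "read only", or None when the project does not open.

    read_answer / write_answer: "correct", "wrong", or "cancel" (the user's
    response to each password prompt); ignored when that protection is not set.
    """
    if read_protected and read_answer != "correct":
        return None                # wrong read password or Cancel
    if not write_protected:
        return "full edit"
    if write_answer == "correct":
        return "full edit"
    if write_answer == "cancel":
        return "read only"         # Cancel on the write prompt
    return None                    # wrong write password
```

The read password is checked first, so a wrong or cancelled read prompt ends the procedure before the write password matters, which matches the N/A entries in the table.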
Profiles Use this tab to define the OCR or OMR Bar code profiles, to import or export profiles, and to change profile settings. In general, three types of profiles can be created: • Page • Zone OCR • Zone OMR Each profile has properties for defining languages, as well as settings for orientation, background removal, separation characters, and printer types. For details, see Project Builder User Interface - Project Settings Dialog Box – OCR Tab.
Knowledge Base This tab is used to manage knowledge bases: to create new knowledge bases and to import, export, and encrypt a knowledge base. For details, see Project Builder User Interface - Project Settings Dialog Box – Knowledge Base Tab. Testing and Optimizing a Project When you test or optimize a project, you have to distinguish between standard and invoice projects.
it must be corrected by the user to conform to the new settings for the current table locator. • Misc. Warnings - shows malfunctions or missing definitions, for example, a locator that uses a dictionary that is not available. Check Licensed Features with Current Project You can check the project against the current license by selecting File | License Utility from the main menu. A dialog box summarizing the licensing status for the project is displayed.
Optimize Project To optimize a project, you can: • Test classification for a selected document using one of the following methods: - Select Process | Classify Document from the main menu. - Click Classify Document from the main toolbar. - Press F5. • Test classification for the selected test folder using one of the following methods: - Select Process | Classify Folder from the main menu. - Click Classify Folder from the toolbar. - Press Ctrl + F5.
• Add locators or change properties for locators for a class. For more information about adding and working with locators, see Extraction. • Test the fields and locators and their settings. If you make changes to the training set, you must retrain the project using one of the following methods: - Select Process | Train Project from the main menu. - Click Train Project from the toolbar. Invoice Projects The following sections describe the steps required to create a new invoice project.
Figure 2-11. New Invoice Project Dialog Box – Project Folder Tab 3 From the Tax Model tab, select the type of tax model that you want to use. If you are using a European VAT model, you can enter individual tax rates. Figure 2-12.
Default Settings By default, a set of formatters and validation rules is added to an invoice project. If you select the “Show project details” option, the Setup Invoice Project dialog box is displayed so you can change the settings for the date and amount formatters and the existing validation rules, or import knowledge bases. The Setup Invoice Project dialog box shows the default settings on three tabs.
Figure 2-14. Setup Invoice Project Dialog Box – Validation Tab Knowledge Base Select the Knowledge Base tab and click Import to open the Import Knowledge Base dialog box in order to import knowledge bases. To import knowledge bases later, open the Project Settings – Knowledge Base tab and click Import. Figure 2-15.
Upgrading an Invoice Project from an Earlier Version To upgrade an invoice project from an earlier version, open it with the normal “Open Project” menu command. The updated project must be saved in a different folder. A dialog box prompts you to specify a new location for saving the project. The original project will not be modified. While loading, the project is validated and automatically upgraded to the new version.
5 Train all the fields on the document by selecting a field on the left and then selecting the corresponding data on the document image. 6 Click Add to Training Folder. The document will be saved to the default training folder and will appear in the list of files in the Training Set (Extraction) panel. If desired, you can add additional training folders to better organize your training sets.
2 Click the Templates label on the navigation panel to switch to the template working mode. 3 From the menu, select Project | Add Template. A new template is created in the list of templates. 4 Use the context menu of the new template to rename it. To be able to classify documents using this template, you have to specify one or more sample documents. To add sample documents to a template 1 Select the appropriate template in the list of templates.
Note You cannot train the “Credits” or “Currency” properties in the Amount Group locator. For problem invoices, you can define templates. Templates are not needed for the extraction process, but can help improve extraction quality for difficult or unusual invoice layouts. For all fields on the document that work correctly, you can use the definitions from the training set. For fields that have failed, you can change the field settings or define additional fields in the template.
c. Select the desired documents and drag them to the class in the hierarchy in the Project panel. Note When you train Layout Classifier for invoice projects, you cannot use the drag-and-drop method; instead, select the document and click “Add to Training Set of Selected Class” from the toolbar.
Tip If you are adding samples from the Test Folder, you can select the desired document and click the “Add to Training Set of Selected Class” button from the toolbar, rather than using the drag-and-drop method. d. Select “Use for Content Classification” from the context menu. 3 Train the project by selecting Process | Train Project from the main menu or clicking Train Project from the main toolbar.
Setting Up Extraction The following section describes the general steps for setting up extraction. For details about fields and locators, see Extraction. Adding Fields and Locators You can define fields at the project level, for which extraction is performed at the beginning of classification. The extraction results for these fields may be used to classify a document.
c. Select the desired properties for the locator by right-clicking the locator and selecting Locator Properties. For more details about locators, see Extraction. Note You can create fields and locators in any order, but you must create the locator before you can assign it to a field. 5 Assign a locator to a field. First, select a field from the list of fields. Then, expand the drop-down list of locators and select one.
Testing Document Separation To test document separation, open a folder containing test documents and click Test Document Separation from the main toolbar. After processing, a dialog box displays the document separation results based on the project settings and the class properties. Figure 2-16. Document Separation Results Setting Up Validation The following procedure describes the general steps for setting up validation. For details, see Setting up Validation.
2 In the field properties dialog box, edit the options for the defined fields. Validation thresholds for valid fields must be set, and if necessary, the “Require manual field confirmation” option enabled. 3 Create validation methods. a. Select Project | Project Settings from the main menu bar to display the Project Settings dialog box. b. Select the Validation tab. c. Click Add to display the New Validation Method dialog box. d.
a. In the Project panel, right-click a class and select Validation Form. The Validation design dialog box is displayed, showing the new default validation form for the selected class. b. Customize the form as desired by adding or removing elements. c. Test the validation form for different screen resolutions to check whether the fields fit. For example, select Size | 800 x 600 to display the form for that resolution. d. Define the desired script events.
Chapter 3 Classification Introduction Ascent Xtrata Pro automatically classifies documents based on format, content, and the subsequent extraction of items. Classification is performed in the first processing step, separately from extraction. However, the classification results may subsequently be changed based on the extraction results. Ascent Xtrata Pro features a full framework of classification technologies that can be used together in a flat structure or in a hierarchy.
A typical document may contain a brief letter (one or two pages) describing the reason for sending the document, plus an arbitrary number of additional attachments. For such documents, it is usually sufficient to classify only the letter since the attachments may not contain the information required to detect the correct class. The classification algorithm used by Ascent Xtrata Pro makes this assumption by default. It is also possible to define different classification behaviors.
Classification Engines and Learning by Example The classification algorithms in Ascent Xtrata Pro can be used as classification engines. This means that they are implemented in such a way that they can easily be replaced, and depending on the license, an engine may or may not be available. The following classification engines are available: • Layout Classifier: Performs image-based classification on the image using only graphical elements.
classification. The key to setting up a project with sample documents is to select the appropriate samples and design an appropriate classification scheme. Additionally, the ability of a project to learn by example makes it much easier to maintain. The primary maintenance task becomes adding new sample documents or removing unsatisfactory ones.
To insert a new child class 1 Right-click the desired parent class in the hierarchy to display a context menu for the class. 2 From the context menu, select Add Class to add a new class beneath the parent. A default class name is added in edit mode, allowing you to easily rename the class. 3 Change the class name to something meaningful and press Enter. The new child class is placed into the class hierarchy in alphabetical order. Note Class names must be unique inside the project.
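The two rules in this procedure, project-wide unique class names and alphabetical ordering of child classes, are sketched below in Python. `ClassTree` is a hypothetical model for illustration, not an Ascent Xtrata Pro object, and the case-insensitive sort is an assumption.

```python
from typing import Dict, List

class ClassTree:
    """Hypothetical model of a project's class hierarchy (not the product API)."""

    def __init__(self, root: str = "Project") -> None:
        self.root = root
        self.children: Dict[str, List[str]] = {root: []}

    def add_class(self, parent: str, name: str) -> None:
        if parent not in self.children:
            raise KeyError(f"unknown parent class {parent!r}")
        if name in self.children:
            # Names must be unique across the whole project, not just among siblings.
            raise ValueError(f"class name {name!r} already exists")
        self.children[name] = []
        siblings = self.children[parent]
        siblings.append(name)
        siblings.sort(key=str.casefold)  # keep children in alphabetical order
```

Because uniqueness is checked against every class in the tree, the same name is rejected even under a different parent.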
Class icon shown when a class has just been added to the class hierarchy. Class icon shown when a class is not a valid classification result. Default class icon. Class icon shown when this class redirects all documents to another defined class. Class icon shown when subtree classification is enabled for the class. Class Properties The following properties are available for a selected class.
Figure 3-3. Class Properties Dialog Box General The general options are used to specify that a class can serve as a classification result, to make the class visible in the Ascent Xtrata Pro Validation form, and to specify that the class can be processed by the Ascent Capture Recognition Server.
Prohibiting the class from becoming the classification result might be useful for classes that are inserted as base classes solely to define common fields and common extraction methods. If a class meets the classification criteria but is prohibited from becoming the classification result, its parent (if there is one) is used as the classification result. If there is no parent, the document is not classified.
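The fallback described above can be sketched in Python as follows. This is an illustration only: the class names and the PARENT_OF and NOT_A_VALID_RESULT tables are hypothetical, and the sketch assumes that when the parent is itself prohibited, the search continues further up the hierarchy, which the text does not state.

```python
from typing import Dict, Optional

# Hypothetical class tree: class name -> parent class name (None for a top-level class).
PARENT_OF: Dict[str, Optional[str]] = {
    "Documents": None,
    "Invoice": "Documents",
    "InvoiceBase": None,
    "VendorInvoice": "InvoiceBase",
}

# Hypothetical set of classes whose "Valid classification result" option is cleared.
NOT_A_VALID_RESULT = {"InvoiceBase", "Invoice"}


def resolve_result(cls: Optional[str]) -> Optional[str]:
    """Replace a prohibited class with its nearest permitted ancestor.

    Returns None when no permitted ancestor exists, i.e. the document
    stays unclassified.
    """
    while cls is not None and cls in NOT_A_VALID_RESULT:
        cls = PARENT_OF[cls]  # fall back to the parent class (assumed to repeat upward)
    return cls


print(resolve_result("Invoice"))      # Documents
print(resolve_result("InvoiceBase"))  # None
```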
For the purposes of subtree classification, you can set different confidence and distance values, which makes it possible to obtain more finely differentiated classification results than a single classification step allows. Typically, you would use either adaptive feature classification or layout classification for the first classification step. Instruction classification is normally the best choice for subtree classification.
Batches may contain single-page documents, multipage documents, a combination of both, or loose pages. If necessary, document separation splits multipage documents into separate documents according to the settings. If document separation is activated, all loose pages of a batch are added to one multipage document, which is then processed by document separation.
classifying them, regardless of whether they would belong to another class. After the third page is added, the current document is closed; it now contains three pages. Processing continues with the next page of the multipage document until all of its pages have been processed. If the value is set to zero and a page of a processed multipage document is classified to this class, a new document is created and the page is added to it.
OCR
You can select a different OCR profile for each class. By default, the default profile is selected. Click the OCR Profiles button to open the – Profiles tab of the Project Settings dialog box. Click Profile Settings to display or edit the settings of the currently selected profile.
Classification Options
Multipage Evaluation
For documents containing more than one page, it is important to specify how the individual pages are processed within a document.
Figure 3-4. Project Settings Dialog Box – Classification Tab
Classification Settings
Default classification result
This option specifies the class to be used if a classification result cannot be determined. Select the desired default class from the list.
Automatic evaluation
This is the default option. The specified values for confidence and distance are used to evaluate the classification result.
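To show how the default classification result and automatic evaluation fit together, here is a minimal Python sketch. It assumes that confidences are fractions between 0.0 and 1.0 and that "distance" means the gap between the best and the second-best class confidence. The function name and parameters are hypothetical and are not part of the product.

```python
from typing import Dict, Optional


def evaluate(confidences: Dict[str, float], min_confidence: float,
             min_distance: float, default_class: Optional[str] = None) -> Optional[str]:
    """Automatic evaluation sketch with a default-class fallback.

    Assumes "distance" is the gap between the best and second-best confidence.
    """
    ranked = sorted(confidences.values(), reverse=True)
    if not ranked:
        return default_class
    best_class = max(confidences, key=confidences.get)
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    if ranked[0] >= min_confidence and ranked[0] - runner_up >= min_distance:
        return best_class
    # No reliable result: use the default class (None means "unclassified").
    return default_class
```

For example, evaluate({"Invoice": 0.6, "Letter": 0.55}, 0.5, 0.1, "Misc") returns "Misc". Invoice passes the confidence threshold, but its 5% lead over Letter is below the 10% distance.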
Content Classifier
Classify only first page
When this option is enabled, only the first page of a document is classified.
Classify each page
When this option is enabled, every page of a document is classified.
Classify all pages at once
If this option is checked, the text of all pages is merged and classified.
Do not use content classification
If this option is checked, the Content Classifier is not used.
Hierarchical Evaluation and Other Classification Rules
The evaluation of classification results is primarily based on the minimum confidence and distance defined in the project settings. However, if the class hierarchy contains hierarchical elements, a set of hierarchical evaluation rules is automatically applied to the classification result. As a result, the final classification may not be the class with the highest confidence.
Extraction design and validation rules are available when the project item in the class tree is selected.
Single Child Wins Over Parent
This rule is applied if a parent and only a single one of its children have a confidence higher than the global threshold. In this special case, the child is preferred over the parent, regardless of which one has the higher confidence.
The figure above shows an example of this rule. Politik is the parent of Energiepolitik. Both have a classification confidence higher than the global threshold of 50%, and the parent has the highest confidence. Because of the “Single child wins over parent” rule, Energiepolitik becomes the final classification result.
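The rule can be approximated by the following Python sketch. It is a simplified interpretation: the rule is checked only for the best-scoring class. The sibling class Sozialpolitik and all confidence values are hypothetical, because the text only states that both classes exceed the 50% threshold and that the parent scores higher.

```python
from typing import Dict, Optional


def single_child_wins(confidences: Dict[str, float],
                      parent_of: Dict[str, Optional[str]],
                      threshold: float) -> str:
    """Simplified "Single child wins over parent" rule.

    If the best-scoring class is a parent and exactly one of its children is
    above the global threshold, that child replaces the parent.
    """
    best = max(confidences, key=confidences.get)
    children_above = [cls for cls, conf in confidences.items()
                      if parent_of.get(cls) == best and conf > threshold]
    if confidences[best] > threshold and len(children_above) == 1:
        return children_above[0]
    return best


# Hypothetical tree and confidences mirroring the Politik example.
TREE = {"Politik": None, "Energiepolitik": "Politik", "Sozialpolitik": "Politik"}
print(single_child_wins({"Politik": 0.7, "Energiepolitik": 0.6, "Sozialpolitik": 0.2},
                        TREE, 0.5))  # Energiepolitik
```

If a second child also passed the threshold, the rule would not apply and the parent would remain the result.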
parent, becomes the classification result and is given the maximum confidence from among the children.
Note You can avoid invoking this evaluation rule by clearing “Valid classification result” in the Class Properties dialog box for Politik. In that case, the document will be unclassified, because Politik is prevented from becoming a classification result.
Local Not-Flag
The Local Not-Flag is a special result of the Instruction Classifier.
Propagated Not-Flag
This rule is similar to the Local Not-Flag, but the flag setting propagates to the child classes. If instructions are found on a document and the sum of their relevancies is less than -50% (negative instructions), the class is excluded from the classification results, and all of its child classes are also excluded. This means that you can disable the classification of an entire subtree by defining negative instructions at the root of that branch.
Figure 3-8.
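The propagation of the flag can be sketched in Python as follows. The sketch assumes that relevances are given as fractions, so -50% becomes -0.5, and that each class is listed in a parent_of table. All function and class names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def propagated_exclusions(relevances: Dict[str, List[float]],
                          parent_of: Dict[str, Optional[str]],
                          limit: float = -0.5) -> Set[str]:
    """Return every class excluded by a Propagated Not-Flag.

    `relevances` maps a class to the relevances of its instructions found on
    the document (fractions, so -50% is -0.5).
    """
    flagged = {cls for cls, found in relevances.items() if found and sum(found) < limit}
    excluded: Set[str] = set()
    for cls in parent_of:
        node: Optional[str] = cls
        # A class is excluded if it or any of its ancestors carries the flag.
        while node is not None:
            if node in flagged:
                excluded.add(cls)
                break
            node = parent_of[node]
    return excluded
```

With a tree in which Energiepolitik is a child of Politik, negative instructions in Politik summing to -70% exclude both classes, while unrelated classes stay eligible.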
Note If a classification rule has been applied to a document, a special icon is displayed next to it in the classification results pane. A tooltip for the icon explains the applicable rule.
Subtree Classification
The subtree classification rule enables iterative classification within a subtree, using different threshold values for each level. To use this rule, “Enable subtree classification” must be selected in the Class Properties dialog box of the parent.
The above example shows that Politik has the highest confidence and, as such, would normally become the classification result after the first step. However, Politik also has the subtree classification option enabled, with a threshold of 30% for the minimum confidence and 5% for the minimum distance. Because of these lower thresholds, Energiepolitik, with 40% confidence, becomes the final classification result.
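The second classification step can be sketched in Python as follows. This is not the product's implementation. It assumes that only the children of the selected parent are compared with each other, using the parent's own confidence and distance thresholds, and that the step repeats for each level. The 55% for Politik, the class Sozialpolitik, and its 20% are hypothetical values.

```python
from typing import Dict, Optional, Tuple


def subtree_classify(confidences: Dict[str, float],
                     parent_of: Dict[str, Optional[str]],
                     first_result: str,
                     subtree_thresholds: Dict[str, Tuple[float, float]]) -> str:
    """Refine a first-step result inside subtrees that enable subtree classification.

    `subtree_thresholds` maps a parent class to its own (min_confidence,
    min_distance) pair; classes not listed do not enable the option.
    """
    result = first_result
    while result in subtree_thresholds:
        min_conf, min_dist = subtree_thresholds[result]
        children = sorted(((confidences.get(c, 0.0), c)
                           for c, p in parent_of.items() if p == result), reverse=True)
        if not children:
            break
        best_conf, best_child = children[0]
        second_conf = children[1][0] if len(children) > 1 else 0.0
        if best_conf < min_conf or best_conf - second_conf < min_dist:
            break  # no child is reliable enough; keep the parent
        result = best_child  # descend one level and repeat
    return result


TREE = {"Politik": None, "Energiepolitik": "Politik", "Sozialpolitik": "Politik"}
CONF = {"Politik": 0.55, "Energiepolitik": 0.40, "Sozialpolitik": 0.20}
print(subtree_classify(CONF, TREE, "Politik", {"Politik": (0.30, 0.05)}))  # Energiepolitik
```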
Figure 3-10. Class Properties Dialog Box – Subtree Classification
3 Select “Enable subtree classification” and modify the confidence and distance thresholds as appropriate.
4 Click OK to save your settings and close the dialog box. The icon next to the class in the hierarchy changes to indicate that subtree classification is enabled.
Redirection
The redirection rule forces a classification result to be replaced with another class.
To configure redirection
1 Right-click the class in the hierarchy for which you want to configure redirection.
2 From the context menu, select Class Properties. The Class Properties dialog box displays.
3 Select the target class from the list in the Redirection area.
4 Click OK to save your settings and close the dialog box. The icon next to the class in the hierarchy changes to indicate that a redirection has been applied.
Figure 3-11.
does not succeed or if the target system cannot deal with unclassified documents. Furthermore, unclassified documents will automatically be sent to the Ascent Capture Quality Control module for special handling. You can define a default class to avoid such situations. The default class is indicated by a special folder icon.
To define a default classification result
1 Right-click Project in the hierarchy.
2 From the context menu, select Project Settings.
5 Click OK to save your settings and close the dialog box. The icon next to the class in the hierarchy changes to indicate that it will be used as the default class.
Layout Classifier
Concept and Application
Layout classification uses the geometric structure of a document to determine its class. Ascent Xtrata Pro can automatically learn the geometric structure of a class by analyzing a number of example documents that are representative of that class.
To train the classifier, select Process | Train Project from the main menu bar, or click Train Project on the toolbar. A progress bar showing the current status is displayed while training is performed.
To add documents to a training set
1 Select a class in the hierarchy.
2 Use Windows Explorer or select a reference set (a test folder or the Selection List) to open a folder that contains the image files that you want to add to the training set.
5 If the message “Do you want to add image classification support to this project” displays, click Yes. (The message only displays the first time you specify layout classification for the project.) The documents are added to the training set for the current class.
Training sets can be managed at any time: new sample images can be added, and existing sample images can be viewed or deleted.
To view documents in a training set
1 Select the class in the hierarchy.
2 Select the document that you want to delete and click Delete Selected Document on the toolbar. Alternatively, right-click the document and select Delete Selected Document from the context menu. To delete all documents, select the Delete All Documents button or context menu option.
3 When the message “Delete the selected document from training set” displays, click Yes to confirm the operation.
Figure 3-15. Layout Classifier Properties Dialog Box – Advanced Settings
Optimize Classification for Invoices
If this option is selected, the classifier analyzes only the upper and lower parts of the document; the remainder of the document is not used for classification. This is especially useful for invoices, which often have a preprinted header and footer area. It might also apply to other types of business documents with a similar structure.
Training
Max samples per class
The Layout Classifier supports an unlimited number of samples per class. If the sample images are very different, the Layout Classifier internally learns different patterns for each sample. For performance reasons, you might want to limit the number of sample documents that are used for feature extraction. A value of 0 means no limitation.
Figure 3-16. Image Clustering Properties
Image source
Select the directory containing the image files that you want to organize into clusters. The specified directory tree is searched recursively for files with a .tif extension.
Algorithm options
Threshold for clustering
This threshold controls whether a document is assigned to an existing cluster or to a new cluster. A higher value produces more clusters, but the clusters are smaller.
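The text does not describe the clustering algorithm or how images are compared. As a rough illustration of the effect of the threshold, this Python sketch implements a simple greedy, threshold-based clustering. The similarity function is a hypothetical stand-in for whatever image comparison the product uses, and each cluster is represented by its first member. Raising the threshold makes it harder to join a cluster, so the result has more and smaller clusters.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def greedy_cluster(items: Sequence[T], similarity: Callable[[T, T], float],
                   threshold: float) -> List[List[T]]:
    """Assign each item to the most similar existing cluster, or start a new one.

    Each cluster is represented by its first member. A higher threshold makes
    joining harder, which yields more and smaller clusters.
    """
    clusters: List[List[T]] = []
    for item in items:
        best: List[T] = []
        best_sim = float("-inf")
        for cluster in clusters:
            sim = similarity(item, cluster[0])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if clusters and best_sim >= threshold:
            best.append(item)
        else:
            clusters.append([item])
    return clusters
```

For example, with numbers standing in for images and similarity(a, b) = 1 - |a - b|, the values [0.0, 0.05, 0.9, 0.95, 0.5] form three clusters at a threshold of 0.8 and five single-item clusters at 0.99.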
Selection list. With a value of 2 or higher, some images will not be displayed if they are not put into an existing cluster. You can use this, for example, to hide documents that appear only once in the directory.
To use image clustering for creating a training set
1 From the main menu bar, select Tools | Image Clustering. The Image clustering wizard dialog box displays.
2 Select a directory.
3 Adjust the various settings as desired.
4 Click Start.
Figure 3-18. Clustering Results Displayed in Selection List
6 If desired, you can now add the clustered images to the training set used for layout classification. First, select the desired class in the hierarchy.
7 Select one or more images from the clusters in the Selection list.
8 Click “Add to Training Set of selected Class” on the Selection list toolbar and select the “Use for Layout Classification” option.
Adaptive Feature Classifier
Concept
The Adaptive Feature Classifier (AFC) is a content-based classifier that uses the text in a document to identify its class. The AFC is trained by having it analyze several dozen sample text or XDoc documents per class. It automatically and adaptively determines the salient features that can be used to define a class.
Note If you are adding samples from the Test Folder or Selection list, you can select the desired file and click the “Add to training set of selected class” button instead of using drag-and-drop.
4 Select “Use for content classification” from the context menu.
5 If the message “Do you want to add text classification support to this project?” displays, click Yes. (The message only displays the first time you select content classification for the project.)
Note You must retrain the project before any changes to the training set affect the Adaptive Feature Classifier.
Properties
The behavior of the AFC can be configured in a properties dialog box.
To display the AFC properties
1 From the main menu bar, select Project | Project Settings. The Project Settings dialog box displays.
2 Select the Views tab. A list of all classifiers used in the project is displayed.
3 Select “Adaptive Feature Classifier” and click Properties.
Figure 3-19. Adaptive Feature Classifier Properties Dialog Box
Text Filtering
Use digits
This option controls whether the classifier uses digits as features or ignores them during text filtering.
Min. word length
All words shorter than this value are ignored during text filtering. Regardless of word length, features with a very low or very high frequency are also not taken into account.
Training
Max.
Max. feature length
Specifies the maximum number of characters used for a feature. This value should not exceed 64 characters.
Min. feature frequency
Specifies how often a substring must appear in the training set of a class to be used as a feature for content classification.
Start features at beginning of word
Specifies that a feature substring must always start at the beginning of a word. If not checked, the substring can start anywhere.
Max.
Thresholds, Precision, and Recall
The overall quality of the classification process can be expressed by precision and recall. When compared with a reference set, the classification of a document can lead to one of three results:
• Correct classification
• Incorrect classification (also known as a false positive or substitution)
• No classification (also known as a reject)
A threshold suppresses all classification results below a certain confidence level.
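To make the trade-off concrete, this Python sketch applies a confidence threshold and then computes precision and recall. It uses the common definitions, which are an assumption here: precision = correct / (correct + incorrect) and recall = correct / all documents, where suppressed results count as rejects. The convention of reporting a precision of 1.0 when nothing is accepted is also an assumption.

```python
from typing import List, Tuple


def precision_recall(results: List[Tuple[float, bool]], threshold: float) -> Tuple[float, float]:
    """Compute precision and recall after suppressing results below `threshold`.

    Each result is (confidence, matches_reference). Suppressed results count
    as rejects (no classification).
    """
    accepted = [ok for conf, ok in results if conf >= threshold]
    correct = sum(accepted)  # True counts as 1
    incorrect = len(accepted) - correct
    precision = correct / (correct + incorrect) if accepted else 1.0
    recall = correct / len(results) if results else 0.0
    return precision, recall
```

With the results [(0.9, True), (0.8, True), (0.6, False), (0.4, True)], a threshold of 0.3 gives (0.75, 0.75), while a threshold of 0.7 gives (1.0, 0.5). Raising the threshold trades recall for precision.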
Figure 3-20. Relationship Between Precision and Recall – Scheme
The yellow area depicts the set of all documents. The vertical reference line divides this set into two groups: class A and not A. The classifier performs classification and decides whether a document belongs to class A; this decision is depicted by the diagonal line. If the classifier and the reference set were perfect, the vertical line and the diagonal line would match exactly.
Auto Optimization
The Auto Optimization tool can be used to optimize the parameters of the Adaptive Feature Classifier (AFC). The tool requires a test set against which the parameters are tested. The test set must have a directory structure that is identical to the class hierarchy. The directory names must match the class names and are case-sensitive. The test documents should be available in the appropriate class folders as text files (*.txt).
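Before running the optimization, it can be helpful to check that a test set really mirrors the class hierarchy. The following Python sketch is a hypothetical helper and is not part of Project Builder. It assumes that the chosen root folder directly contains the folders of the top-level classes. It relies on the rule that class names are unique within a project, and it compares folder names case-sensitively.

```python
from pathlib import Path, PurePosixPath
from typing import Dict, Iterable, List, Optional, Tuple


def hierarchy_path(cls: str, parent_of: Dict[str, Optional[str]]) -> Tuple[str, ...]:
    """Return the folder path a class must have, e.g. ("Politik", "Energiepolitik")."""
    parts: List[str] = []
    node: Optional[str] = cls
    while node is not None:
        parts.append(node)
        node = parent_of[node]
    return tuple(reversed(parts))


def assign_to_classes(relative_files: Iterable[str],
                      parent_of: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Map test files (paths relative to the root) to classes by folder path."""
    expected = {hierarchy_path(c, parent_of): c for c in parent_of}
    by_class: Dict[str, List[str]] = {}
    for rel in relative_files:
        folders = PurePosixPath(rel).parent.parts
        cls = expected.get(folders)  # plain string comparison, so case matters
        if cls is None:
            raise ValueError(f"{rel}: folder does not match the class hierarchy")
        by_class.setdefault(cls, []).append(rel)
    return by_class


def scan_test_set(root: Path, parent_of: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Collect all *.txt files below `root` and assign them to classes."""
    files = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.txt"))
    return assign_to_classes(files, parent_of)
```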
3 Click Start to start the optimization process. During optimization, each parameter is modified and tested. Optimization ends when no further improvement can be achieved, or when the classification result reaches 100%.
Note You can click Stop at any time to terminate the optimization process. Any improvements in classification quality up to that point are saved.
4 Click Close.
Result Matrix
To provide a complete and detailed analysis of classification quality, Project Builder allows you to test the classification against a reference set. A reference set is a set of test documents (different from the training set documents) that has a directory structure identical to the class hierarchy. If the reference set does not have the same structure as the class hierarchy, you will not get useful statistical results.
Figure 3-23. Result Matrix for Classification
Note If more than one classifier is defined for the project, you can select the classifier in the Selected View list before starting the calculation.
The upper part of the window displays statistics for the calculated results.
• Count: Total number of documents classified.
• Correct: Number of documents classified correctly.
• Incorrect: Number of documents classified to the wrong class.
Statistics for each class are displayed when Statistics is selected on the toolbar. These results can also be saved in text file format. The Min. Confidence and Min. Distance sliders let you interactively modify both thresholds after the result matrix has been calculated. Any changes you make are immediately reflected in the matrix, so you can optimize the precision or recall value by adjusting the confidence and distance thresholds.
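Conceptually, moving the sliders re-evaluates the stored results against new thresholds without classifying the documents again. This Python sketch illustrates that idea with a hypothetical data layout: one (reference class, result class, confidence, distance) tuple per document, with results below either threshold counted as unclassified. The REJECTED label is hypothetical. Treating Count as all documents, including rejected ones, is also an assumption.

```python
from collections import Counter
from typing import Iterable, Tuple

REJECTED = "<unclassified>"  # hypothetical label for suppressed results


def result_matrix(results: Iterable[Tuple[str, str, float, float]],
                  min_confidence: float, min_distance: float) -> Counter:
    """Count (reference class, result class) pairs for the given thresholds."""
    matrix: Counter = Counter()
    for reference, predicted, confidence, distance in results:
        if confidence < min_confidence or distance < min_distance:
            predicted = REJECTED
        matrix[(reference, predicted)] += 1
    return matrix


def summary(matrix: Counter) -> Tuple[int, int, int]:
    """Return (count, correct, incorrect); rejected documents are neither."""
    count = sum(matrix.values())
    correct = sum(n for (ref, res), n in matrix.items() if ref == res)
    rejected = sum(n for (_, res), n in matrix.items() if res == REJECTED)
    return count, correct, count - correct - rejected
```

For example, raising Min. Distance from 0.0 to 0.1 turns a low-distance misclassification into a reject, which lowers the incorrect count while leaving the correct count unchanged.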
Figure 3-24. Open folder with reference set Dialog Box
2 Select the root folder of your reference set. The reference set must have a directory structure that is identical to the class hierarchy. The directory names must match the class names and are case-sensitive. The test documents should be in the appropriate class folders as text files (*.txt), image files (*.tif), or XDoc files (*.xdc).
Note If you select text documents, content classification is performed.
using Boolean operations. Negative instructions can be used to inhibit classification into a class.
Set Up
Classification instructions are specific to each class and are managed per class. To display the instructions for a class, select the class in the hierarchy and change the view mode to Classification Design.
Figure 3-25. Instruction Classifier – Classification Design
To insert a new instruction
1 Select the appropriate class.
3 Click Add instructions on the Classification Design panel toolbar. The Instruction Properties dialog box displays.
Figure 3-26. Instruction Properties Dialog Box
4 Enter a new instruction in the edit box in the list of phrases. If the edit field is not visible, click the “Adds a new phrase to the instruction” icon.
5 Adjust the relevance of the instruction to the desired value.
you to drag-and-drop the phrase from the Document Viewer into the Classification Design panel and create a new instruction with it.
To insert a new instruction with a drag-and-drop operation
1 Make sure the appropriate class is selected in the hierarchy.
2 Make sure the Classification Design panel is visible. (To make it visible, select View | Show Classification Design from the main menu bar.)
3 Open a document for which OCR has already been performed, or open a text document.
Figure 3-27. Instruction Classifier – AND and NOT Relationship
Instructions can be deleted from a class by clicking Delete Instruction on the toolbar in the Classification Design panel. To display the properties of the currently selected instruction, either click Properties on the toolbar or double-click the instruction. Instructions can be exported and imported for reuse in other projects.
To import instructions into a class
1 Make sure the appropriate class is selected in the hierarchy.
2 Make sure the Classification Design panel is visible. (To make it visible, select View | Show Classification Design from the main menu bar.)
3 Click Import on the toolbar in the Classification Design panel. The Import instructions dialog box opens.
4 Select a file that contains previously exported instructions and click Open.
Testing Content Classification
Content classification can be tested and analyzed using the functionality provided by the Document Viewer. A context menu in the text display of the viewer lets you classify the selected text or each line of the current page.
To classify the selected text in the Document Viewer
1 Open the Document Viewer for a document.
2 Click Show Text to switch to the text viewer.
3 Select the text that should be classified.
Figure 3-28. Classify Lines – Classification Results
Managing Views
Each classifier instance in the project is called a view. The first time you add training documents for the Layout or Adaptive Feature Classifier, or instructions for the Instruction Classifier, the corresponding view is created automatically. If you want to use the Layout or Adaptive Feature Classifier with different settings, you have to add a new view and change its properties.
Because the view for the Instruction Classifier has no additional properties, there is no need to create additional views for it.
Warning If you delete the Instruction Classifier view, the Instruction Classifier and all its instructions are removed from the project.
You can access and manage the views directly from the Project Settings dialog box.
Figure 3-29. Project Settings Dialog Box – Views Tab
Click Add to manually insert a new view into the project.
Figure 3-30. Add Classification View Dialog Box
View name: Enter the name of the new classifier view.
Classifier type: Use the list to select the desired classifier type. After you select the classifier type, the text below the list shows the type of documents that the classifier processes.
Existing training set
• Import existing training set: Select this option if you want to import an existing training set for the classifier.
Management of training set
• Copy files to view's training set path: This is the default mode. All files in the training set are copied into the project directory.
• Only keep reference to files (no copy): The files are not copied into the project folder. Because the training set is referenced with an absolute path, the entire project cannot be moved to another computer. You cannot add documents to or delete documents from a referenced training set.
Chapter 4
Extraction
Introduction
One of the main purposes of Ascent Xtrata Pro is to extract data from documents. The extracted data is stored in fields. The fields and the extraction definitions are set up in Project Builder and synchronized with Ascent Capture index fields. For detailed information about the Synchronization tool, see Setup a Batch Class in Ascent Capture. This chapter describes how to create and edit fields and how to define extraction methods.
Managing Fields
A set of fields is associated with each class in Ascent Xtrata Pro; the fields store the data extracted for the class. Subclasses always inherit the fields of their parent class. Additional fields can be added and field settings can be changed for subclasses, but fields inherited from a parent class cannot be deleted. Fields are accessible in the Extraction Design panel in Project Builder.
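The inheritance rules can be modeled with a small hypothetical Python class. Subclasses see their parent's fields, they may add fields and change the settings of inherited fields, and they may not delete inherited fields. The model also enforces unique field names per class. A field type is used as the only "setting" for simplicity. Nothing here reflects Project Builder's internal data model.

```python
from typing import Dict, Optional


class ClassNode:
    """Hypothetical model of a project class and the fields it owns or inherits."""

    def __init__(self, name: str, parent: Optional["ClassNode"] = None):
        self.name = name
        self.parent = parent
        self.own_fields: Dict[str, str] = {}  # field name -> field type, defined here
        self.overrides: Dict[str, str] = {}   # changed settings of inherited fields

    def fields(self) -> Dict[str, str]:
        merged = dict(self.parent.fields()) if self.parent else {}
        merged.update(self.own_fields)
        # Local changes to inherited fields take precedence over the parent's settings.
        merged.update({k: v for k, v in self.overrides.items() if k in merged})
        return merged

    def add_field(self, name: str, field_type: str = "Simple") -> None:
        if name in self.fields():
            raise ValueError(f"field {name!r} already exists in class {self.name!r}")
        self.own_fields[name] = field_type

    def change_field(self, name: str, field_type: str) -> None:
        if name in self.own_fields:
            self.own_fields[name] = field_type
        elif name in self.fields():
            self.overrides[name] = field_type  # the parent's definition stays untouched
        else:
            raise KeyError(name)

    def delete_field(self, name: str) -> None:
        if name in self.own_fields:
            del self.own_fields[name]
        elif name in self.fields():
            raise PermissionError(f"inherited field {name!r} cannot be deleted")
        else:
            raise KeyError(name)
```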
To add a table field
1 Select Show Extraction Design from the Mode toolbar.
2 Select a class from the project hierarchy.
3 Click Add Field on the Extraction Design toolbar.
4 Rename the field to a meaningful name as desired.
5 Change the field type to Table Field. To do so, select the row of the field, click the drop-down arrow next to the Field Type button on the Extraction Design toolbar, and select Table Field from the list.
Note Field names must be unique for a class. If you enter a name that already exists, a message displays.
Confidences
For each field, a confidence value called the Locator Result Threshold can be used to govern the acceptance of results from a locator. A locator typically returns several items that match the locator definitions; these are called alternatives. The locator evaluates each alternative and assigns it a confidence.
The field’s valid state is set to True when the minimum confidence value is reached and the distance to the confidence of the second-best alternative is at least the selected value. The default value for the confidence is 80%, and the default value for the distance is 10%. For example, if the best result has a confidence of 81% and the next best alternative has a confidence of 75%, the best result will not be used even though it meets the minimum confidence value, because the distance of 6% is below the required 10%.
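The valid-state check can be written as a short Python function that reproduces the worked example above. The function name is hypothetical. Treating the distance as measured against 0 when there is only one alternative is an assumption that the text does not cover.

```python
from typing import Sequence


def is_field_valid(alternatives: Sequence[float], min_confidence: float = 0.80,
                   min_distance: float = 0.10) -> bool:
    """Valid state of a field from the confidences of its locator alternatives.

    Defaults mirror the documented 80% / 10%. With a single alternative,
    the distance is assumed to be measured against 0.
    """
    ranked = sorted(alternatives, reverse=True)
    if not ranked:
        return False
    best = ranked[0]
    second = ranked[1] if len(ranked) > 1 else 0.0
    return best >= min_confidence and best - second >= min_distance


print(is_field_valid([0.81, 0.75]))  # False: 81% passes, but the 6% distance is too small
```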
Note When you override an inherited locator locally in a subclass, you must make sure that the local locator provides the same subfields as the locator in the base class. Otherwise, an inherited assignment to a field may no longer be resolvable, which can lead to an extraction error.
Example: An advanced zone locator defined in a base class creates two subfields, S_A and S_B. They are assigned to the fields F_A and F_B, respectively.
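A check for this situation can be sketched in Python. The function compares each inherited field assignment against the subfields that the class's locators actually provide. The locator name "AZL" and the dictionary layout are hypothetical. Only the subfield and field names come from the example.

```python
from typing import Dict, Set, Tuple


def unresolved_assignments(field_assignments: Dict[str, Tuple[str, str]],
                           locator_subfields: Dict[str, Set[str]]) -> Dict[str, Tuple[str, str]]:
    """Return field -> (locator, subfield) assignments that no longer resolve.

    `field_assignments` maps a field to the locator subfield it is assigned to;
    `locator_subfields` lists the subfields each locator in the class provides.
    """
    return {field: (loc, sub) for field, (loc, sub) in field_assignments.items()
            if sub not in locator_subfields.get(loc, set())}


ASSIGNMENTS = {"F_A": ("AZL", "S_A"), "F_B": ("AZL", "S_B")}
# A local override of AZL that only creates S_A breaks the inherited F_B assignment.
print(unresolved_assignments(ASSIGNMENTS, {"AZL": {"S_A"}}))  # {'F_B': ('AZL', 'S_B')}
```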
• Both of the above properties are available for the Script Formatter. If the field’s Boolean DoubleFormatted property is True, the DoubleValue and/or DateValue properties can be used in scripts for calculations. This lets you work with field values without having to reformat the field text. Each formatter type has its own set of properties.
Figure 4-5. New Field Formatter Dialog Box
5 Enter a name in the Name field and select the formatter type.
6 Click OK to open the formatter’s properties dialog box. For more information, see Project Builder User Interface – General Dialog Boxes – New Field Formatter dialog box.
To set up a Script Formatter
1 Define a new Script Formatter according to the instructions above. Use a name such as ‘ScriptFormatter’; do not use blanks or underscores in the name.
Figure 4-6. Script Formatter Properties Dialog Box
7 The Script Formatter can be used for three different field types: text, date/amount, or double value. Select the field type from the list in the Options panel. Sample script code is added to the text box in the Script Sample panel.
8 If you want to adjust the sample code, click Select followed by Copy to copy it to the clipboard.
9 Click Show Script to open the Script Code dialog box.
Figure 4-7. Script Formatter – Script Code Dialog Box
11 In the script, you can also reuse existing formatters. The sample below shows script code that takes the result of the Amount Formatter (AmountFormat) and adds a “€” sign at the end of the formatted output text. The Amount Formatter is set up so that only two decimal places are used after the decimal symbol.
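The idea of chaining formatters can be shown with a language-neutral Python sketch. This is not Ascent Xtrata Pro script code. amount_format is a hypothetical stand-in for the configured Amount Formatter. It assumes that the input uses "," or "." as the decimal separator without thousands separators, and that the output uses "." as the decimal symbol.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP


def amount_format(text: str) -> str:
    """Hypothetical stand-in for the Amount Formatter: two decimal places.

    Assumes "," or "." as decimal separator and no thousands separators.
    """
    cleaned = text.strip().replace(",", ".")
    try:
        value = Decimal(cleaned)
    except InvalidOperation as exc:
        raise ValueError(f"not an amount: {text!r}") from exc
    return str(value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))


def euro_format(text: str) -> str:
    """Script-formatter idea: reuse the amount formatter, then append the sign."""
    return amount_format(text) + "€"


print(euro_format("12,5"))  # 12.50€
```

Because euro_format reuses amount_format, any later change to the rounding rules applies to both formatters automatically.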
3 For simple fields, choose the desired formatter from the drop-down list in the Formatting area.
Figure 4-8. Assigning a Formatter to a Simple Field
For table fields, you must first select the column and then choose the formatter from the drop-down list of formatters.
Figure 4-9. Assigning a Formatter to a Table Column
4 Click Close to save the settings and close the dialog box.
Locators
Locators are engines that implement algorithms for identifying the items to be extracted from a document. A locator returns all alternatives in a list, sorted by their relevance. The list of locators also contains an additional type, the evaluators. In contrast to locators, evaluators always work on the results of other locators and do not retrieve data from the document themselves.
• Amount Group Locator
• Invoice Group Locator
• Order Group Locator
Each group locator has its own properties dialog box where you can adjust specific parameters and settings. Unlike the standard locators, group locators have to be trained. This means that you need sample documents that show the system where the fields are positioned on the document. Each trained sample document is then added to a training set, which is used to train the entire project.
extracts from these the correct values for typical invoice header data, such as invoice number, order date, total, and tax values.
• OCR Voting Evaluator: Compares the results of zones character by character and selects the best result for each character to save to the field.
• Relation Evaluator: Rates the results of one locator in comparison to the results of another locator, based on the relative locations of the results.
• Script Locator: Uses custom script events to locate data.
Note To use the results of one locator in a subsequent locator, the processing order must reflect the intended usage pattern. The locator that provides results must be processed first and therefore must appear before the dependent locator in the locator list. To change the processing order, switch to Extraction Design mode and move the locator up or down in the list: select the locator and click the Move Locator Up or Move Locator Down button on the Locator toolbar.
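The ordering rule can be checked mechanically. This hypothetical Python helper reports every dependent locator or evaluator that appears before a locator whose results it uses. The locator names in the example, such as a Relation Evaluator that depends on a Format Locator, are illustrative only.

```python
from typing import Dict, List, Set


def order_problems(locator_order: List[str], depends_on: Dict[str, Set[str]]) -> List[str]:
    """Report dependent locators that appear before the locators they use.

    `depends_on` maps a locator (or evaluator) to the locators whose results it uses.
    """
    position = {name: index for index, name in enumerate(locator_order)}
    problems: List[str] = []
    for locator, sources in depends_on.items():
        if locator not in position:
            problems.append(f"{locator!r} is not in the locator list")
            continue
        for source in sorted(sources):
            if source not in position:
                problems.append(f"{locator}: source locator {source!r} does not exist")
            elif position[source] > position[locator]:
                problems.append(f"{locator} runs before its source {source!r}; move it down")
    return problems


DEPS = {"RelationEval": {"FormatLoc"}}  # hypothetical locator names
print(order_problems(["RelationEval", "FormatLoc"], DEPS))
```

An empty list means that the list order already satisfies every dependency.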
• Right-click the row for a locator and select Delete Locator from the context menu.
• Select the row for a locator and click Delete Locator on the Extraction Design toolbar.
When prompted to confirm the deletion, click Yes.
To rename a locator:
• Double-click the name of the locator to select the whole name, and then type the new name in place.
• Click the name of the locator and edit it in place.
Note Locator names must be unique for a class.
To import a new locator
1 Right-click a locator and select Import New Locator from the context menu, or click Import New Locator on the Extraction Design toolbar. The Import new locator dialog box displays.
2 Select the desired locator file and click Open. (Locator files have the extension “.loc”.)
3 The locator is added as a new locator at the end of the list of available locators.
To import a locator method
1 Create a new locator and select it, or select an existing one.
Figure 4-11. Locator Methods
The standard locator and evaluator methods are available from the list.
Assign Locators to Field
Once a locator has been defined, it can be assigned to a field that receives the data.
Figure 4-12. Assigning a Locator to a Field
Click the Locator column in the field list. A drop-down list displays all available locators, from which you can select one.
Some locators, such as the Database Locator, provide more than one data item. For every field of a database record, a corresponding subfield is available for the locator and can be assigned to an index field.
Figure 4-13. Assigning a Locator to Extract Structured Data
The screen shot above shows the selection of a subfield from a locator that extracts structured data. Instead of just the locator’s name, the list contains [LocatorName].[SubField] entries.
have a slightly lower confidence (for example, 92%), because the format matches only “11/08/20” of “11/08/2004”. Both alternatives are returned, but the better one is assigned to the field.
Regions
By default, locators operate on the entire page for every page in a document. To speed up processing, you can define regions that restrict the locator to portions of a page or to certain pages.
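This Python sketch shows how a region restricts the input of a locator. It assumes that OCR words carry positions relative to the page (0.0 to 1.0) and that a region is either a rectangle on all pages or a rectangle on the first or last page only. The Word and Region types are hypothetical, and only the top-left corner of each word is tested. The sample region corresponds to the lower 40% of the last page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """An OCR word with its position relative to the page (0.0 to 1.0)."""
    text: str
    page: int    # 0-based page index
    top: float   # 0.0 = top edge, 1.0 = bottom edge
    left: float  # 0.0 = left edge, 1.0 = right edge


@dataclass
class Region:
    """Hypothetical region: relative rectangle plus a page restriction."""
    top: float
    left: float
    height: float
    width: float
    pages: Optional[str] = None  # None = all pages, or "first" / "last"


def words_in_region(words: List[Word], region: Region, page_count: int) -> List[Word]:
    """Keep only the words a region-restricted locator would look at."""
    def page_allowed(page: int) -> bool:
        if region.pages is None:
            return True
        if region.pages == "first":
            return page == 0
        if region.pages == "last":
            return page == page_count - 1
        raise ValueError(f"unknown page restriction: {region.pages!r}")

    return [w for w in words
            if page_allowed(w.page)
            and region.top <= w.top <= region.top + region.height
            and region.left <= w.left <= region.left + region.width]


# Lower 40% of the last page, e.g. to look for bank account details on an invoice.
LOWER_40_LAST_PAGE = Region(top=0.6, left=0.0, height=0.4, width=1.0, pages="last")
```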
Figure 4-14. Format Locator Properties Dialog Box – Regions Tab
5 Click Add to add a region to the list of regions. Then change the properties of the region (such as Top and Left) as desired. To see the region on a document, select a document from the Test Folder or Training Set. The document displays in the Document Viewer, together with the regions you have added. Instead of using the Add button, you can also draw the region directly on the document in the Document Viewer.
Figure 4-15. Manually Drawing a Region in the Document Viewer
6 Specify whether the locator applies to all pages or to a specific page. You can make your selection in the “Enable locator for” area or in the Page column for the locator.
7 Specify the desired Access setting.
8 Click Test to show the result for a selected test document.
9 Click Close to save the settings and exit the dialog box.
To locate an item only on the first page
1 Open the Properties dialog box for the locator.
2 Select the Regions tab.
3 Make sure that All pages is not selected.
4 Select Enable locator for First page.
To identify an item only in the lower 40% of the last page (for example, to locate a bank account on an invoice)
1 Open the Properties dialog box for the locator.
2 Select the Regions tab.
3 Clear the All pages check box if it is selected.
4 Select Enable locator for Last page.
Figure 4-16. Format Locator Test Dialog Box
The results on the Test Results tab are sorted by confidence. If a document is shown in the Document Viewer, the results are also highlighted in the viewer. By default, the first item in the list (the best result) is highlighted in green and the other results in blue. If you select a result from the Test Results list, it is highlighted in green in the viewer, along with its confidence value.
Field Group Locators
The following sections describe how to set up field group locators. There are three types of field group locators:
• Amount Group Locator
• Invoice Group Locator
• Order Group Locator
Note Normally you use field group locators only for invoice projects. When you create a new invoice project, or open an existing one, the necessary fields are created automatically and the user only needs to set up the extraction training set.
• NetAmount4 / TaxAmount4 / TaxRate4: These three fields form a group. The corresponding fields are filled with the fourth net amount, to which the fourth tax rate applies.
• Total: Contains the sum of all net amounts plus taxes.
• Postage: Contains the postage amount.
• Packaging: Contains the packaging amount.
• Discount: Contains any discounts that might be applied to the invoice.
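The fields of an amount group are arithmetically related: each tax amount should equal its net amount times its tax rate, and Total is described above as the sum of all net amounts plus taxes. The Python sketch below is a plausibility check for a filled group. It is not the Amount Group Locator's algorithm. The tuple layout, the function names, the 0.01 tolerance, and the sample values are assumptions. Postage, Packaging, and Discount are left out because their effect on Total is not described here.

```python
from decimal import Decimal

def group_is_consistent(net, tax, rate_percent, tolerance=Decimal("0.01")):
    """Check that TaxAmount is approximately NetAmount x TaxRate for one amount group."""
    expected_tax = net * rate_percent / Decimal(100)
    return abs(expected_tax - tax) <= tolerance

def total_is_consistent(groups, total, tolerance=Decimal("0.01")):
    """Check that Total is approximately the sum of all net amounts plus their taxes."""
    computed = sum((net + tax for net, tax, _ in groups), Decimal(0))
    return abs(computed - total) <= tolerance

# Hypothetical sample groups: (NetAmountN, TaxAmountN, TaxRateN in percent)
groups = [
    (Decimal("100.00"), Decimal("19.00"), Decimal("19")),
    (Decimal("50.00"), Decimal("3.50"), Decimal("7")),
]
print(all(group_is_consistent(*g) for g in groups))   # True
print(total_is_consistent(groups, Decimal("172.50")))  # True
```

`Decimal` is used instead of `float` so that currency amounts such as 3.50 are represented exactly.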
You can define different training folders to group documents within the training set, for example, French invoices, German invoices, or invoices with no tax rates. You can add either new documents to the training set or documents that have been returned by the Ascent Xtrata Pro Validation module. Unlike new documents, returned documents already contain OCR results and other information that is stored in the XDoc file.
6 Add data to the fields by clicking the corresponding results in the document. You should train all the fields of a group locator if the information is available on the document.
Note The documents are added to the default training folder of the selected class. To change the folder, or to add a new one, you must switch to the Training Set (Extraction) panel. Every project has a Default training folder that cannot be renamed or deleted.
7 Select File | Validate Document from the main menu, or click Validate Document on the toolbar, to test the document. Correct any issues displayed in the status bar.
8 If there are no errors, select File | Add to Training Folder or click Add to Training Folder on the toolbar. The Edit Document window closes automatically.
9 Select Process | Train Project from the main menu or click Train Project on the toolbar to retrain the project with the newly added documents.
for the purposes of testing. However, if an Ascent Xtrata Pro production license is present, a password-protected knowledge base can only be used within the Server module if the user has an Activation Code for it.
Important Remember that protection can only be configured during the creation of the knowledge base and cannot be added to an existing knowledge base.
6 Enter the serial number of the hardware key and the activation code.
7 Click OK to activate the code.
OCR and OMR Profiles
The settings of OCR engines are saved in profiles. Three different types of profiles are available:
• Page – profiles for full page recognition
• Zone OCR – profiles for zonal character recognition
• Zone OMR – profiles for zonal mark recognition
A default profile is added for each type when a new project is created.
OCR Substitution
OCR quite often confuses characters in amounts, for example reading the letter ‘O’ instead of the digit ‘0’, or ‘I’ instead of ‘1’. Because you know that an amount cannot contain the letter ‘O’, only digits, it is useful to compensate for such confusions. You can therefore define a list of OCR substitutions and apply it to a field, so that the confused characters found in that field are replaced automatically.
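The effect of a substitution list on an amount field can be sketched in a few lines of Python. Only the O→0 and I→1 pairs come from the text above. The table name, the function name, and the sample value are illustrative and are not part of the product.

```python
# Hypothetical substitution list for amount fields; O->0 and I->1 are the examples from the text.
AMOUNT_SUBSTITUTIONS = {"O": "0", "I": "1"}

def apply_substitutions(text, substitutions):
    """Replace each confused character with its substitute in a single pass."""
    return text.translate(str.maketrans(substitutions))

print(apply_substitutions("1O4.5I", AMOUNT_SUBSTITUTIONS))  # 104.51
```

The list is applied per field because the same substitution would corrupt a text field such as a company name, where ‘O’ and ‘I’ are legitimate letters.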
Figure 4-19. Sax Basic Script Editor Dialog Box
Field formatting procedures must be defined at the project level. The general extraction script is used to implement the Script Locator, field extraction, and field validation methods. Apart from the general field extraction script, each class has its own field extraction script to provide class-specific processing.
To implement a Script Locator, field extraction, or field validation methods that are executed for classes and fields, select the extraction script for the class. When you do this, the dialog box caption is set to – Script Code. Before you start to insert script code, make sure that you have selected the correct tab. The Object drop-down list shows all the objects for the current module. The Proc drop-down list shows all the procedures for the currently selected object.
Properties
The following sections describe how to set up the Address Evaluator.
To add and configure an Address Evaluator
1 Select Show Extraction Design from the Mode toolbar.
2 Add a new Address Evaluator.
Note Make sure that you have already defined a locator whose results will be used by the Address Evaluator. This locator must appear above the evaluator in the list of available locators, because it needs to be executed first.
Note Each database field must be mapped to a locator, and at least one zip field and one city field should be mapped to a valid locator field, because fuzzy address correction is based on the zip code and the city. If any field other than city or zip has a confidence lower than 70%, it is not considered when calculating the overall confidence of the result. The confidence of each individual field can be seen in the Field Extraction results.
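The 70% rule can be sketched as follows. The text does not say how the remaining field confidences are combined, so the averaging here is an assumption, and the field names and sample values are hypothetical. Only the rule "skip weak fields other than zip and city" comes from the note above.

```python
def overall_confidence(field_confidences, min_conf=0.70, always_counted=("Zip", "City")):
    """Average the field confidences, skipping weak fields other than zip and city.

    Assumption: the combination is a plain average; only the 70% exclusion rule is documented.
    """
    counted = [
        conf for name, conf in field_confidences.items()
        if name in always_counted or conf >= min_conf
    ]
    return sum(counted) / len(counted) if counted else 0.0

# Hypothetical field names and confidences.
fields = {"Zip": 0.95, "City": 0.60, "Street": 0.50, "Company": 0.85}
print(round(overall_confidence(fields), 2))  # Street is skipped: (0.95 + 0.60 + 0.85) / 3 = 0.8
```

City is still counted even though it is below 70%, because zip and city are the basis of the fuzzy address correction.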
a Select a sample document from the Test Folder or the training set and display it in the Zone Viewer.
Note When the Advanced Zone Locator properties dialog box is opened, the standard viewer becomes the Zone Viewer. The Zone Viewer provides additional functionality to add, manipulate, and test recognition zones.
b On the General tab of the Advanced Zone Locator Properties dialog box, click Insert Sample to insert the displayed image as the reference image.
b If desired, rename a subfield by clicking the Name column.
c If desired, change the Result setting for a subfield by selecting All or Best from the drop-down list.
6 Click Close to save the settings and close the dialog box.
7 Assign the locator subfields to fields.
To configure and test a zone for the Advanced Zone Locator
1 If necessary, open the Zone Settings dialog box. This dialog box is displayed automatically after a new zone has been added on a reference page.
2 In the General area, adjust the properties of the zone as necessary. This area shows the coordinates of the zone and allows you to set them to exact pixel values. You can change the name of the zone in the edit field. The page number specifies the location of the zone in a multipage document. You can also rotate the zone snippet in steps of 90° if the text on the document is rotated.
displayed in the text box. Note that it may take a few moments for the results to appear the first time you test the zone.
To configure background removal for an Advanced Zone Locator
1 Open the Advanced Zone Locator properties dialog box to the General tab.
2 Select a sample document from the Test Folder or the training set. Double-click it to display it in the Zone Viewer.
3 Insert at least four additional sample documents.
• Zoom in, zoom out, fit to viewer, fit to width
• Select document set
• Help
Barcode Locator
The following sections describe the concept of the Barcode Locator and show how to add and set up the locator.
Concept
Some documents contain bar codes, which are usually attached to a form during the scanning process. The Barcode Locator detects and decodes any bar code present on the document. The detected values can be assigned to a field.
Classification Locator
The following sections describe the concept of the Classification Locator and show how to add and set up the locator.
Concept
The Classification Locator uses the classification scheme defined in a secondary, external Ascent Xtrata Pro classification project to provide additional classification results for a document as field values. Only the classification scheme is used from the external project. This offers new possibilities for data extraction.
items in an order. The class name will be the name of a product group (for example, furniture, paper products, and office supplies). Train the project with all the products from your product database in advance, assigning the products to the correct product group class in the training set. Save the project and assign it to a Classification Locator in the current project. The Classification Locator will assign the name of the product group class from the other project to a field.
To create a language Classification Locator
1 For an existing project, select any class in the project class hierarchy.
2 Select Show Extraction Design from the Mode toolbar.
3 Create a new locator named Language Locator and select Classification Locator from the drop-down list of locator methods.
Figure 4-22. Adding a New Classification Locator
4 Double-click the new locator to open the Properties dialog box.
Figure 4-23. Extraction Results for a Classification Locator
Note The sample language classification project included with Ascent Xtrata Pro supports English, German, and French. If you need additional languages, you can easily create your own language classification project from typical sample documents using the Ascent Xtrata Pro Project Builder.
Database Evaluator
The following sections describe the concept of the Database Evaluator and show how to add and set up the evaluator.
Concept
The Database Evaluator compares fields that are extracted with a zone locator to values in a database and returns a list of the hits found.
Properties
The following sections describe how to set up the Database Evaluator.
To add and configure a Database Evaluator
1 Select Show Extraction Design from the Mode toolbar.
2 Add a new Database Evaluator.
Note Make sure that you have already defined a locator whose results will be used by the Database Evaluator.
be reached. Depending on the validation threshold settings, the extraction field status is set to valid or invalid.
Database Locator
The following sections describe the concept of the Database Locator and show how to add and set up the locator.
Concept
The Database Locator matches the document against the records in a database. Since the matching algorithm uses direct access to an index, all items in the document can be compared with all database records in a reasonable amount of time.
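Index-based lookup is what makes it practical to compare a document with every record. The following Python sketch shows the idea with an inverted index that maps each word to the records containing it, so each document word costs one lookup instead of a scan of the whole database. It is a deliberate simplification. The real locator matches fuzzily, while this sketch uses exact lower-cased words. The function names and sample records are hypothetical.

```python
from collections import defaultdict

def build_index(records):
    """Map each lower-cased word to the ids of the records that contain it."""
    index = defaultdict(set)
    for record_id, record in enumerate(records):
        for value in record.values():
            for word in value.lower().split():
                index[word].add(record_id)
    return index

def best_records(document_words, index, top_n=3):
    """Score records by how many distinct document words hit them through the index."""
    scores = defaultdict(int)
    for word in {w.lower() for w in document_words}:
        for record_id in index.get(word, ()):
            scores[record_id] += 1
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]

# Hypothetical database records and document text.
records = [
    {"Company": "Muster GmbH", "Street": "Hauptstrasse 1", "ZipCode": "10115", "City": "Berlin"},
    {"Company": "Beispiel AG", "Street": "Ringweg 5", "ZipCode": "80331", "City": "Muenchen"},
]
index = build_index(records)
doc = "Rechnung Muster GmbH Hauptstrasse 1 10115 Berlin".split()
print(best_records(doc, index))  # [(0, 6)]
```

The result lists record ids with their number of matching words, best first.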
Figure 4-24. Project Settings Dialog Box – Database Tab
Alternatively, you can open this dialog box from the properties dialog box of any locator that is based on the Database Locator. In the Database Locator properties dialog box, click Database Settings to open the Database tab of the Project Settings dialog box.
2 Click Add to include a new database in the project.
3 When prompted, enter a name for the new database and click OK. The Fuzzy Database Options dialog box displays.
Figure 4-25. Fuzzy Database Options Dialog Box
4 The Fuzzy Database Options dialog box allows you to select the referenced import file and make other settings.
5 Browse to the file you want to use as the database.
6 In the Import Options area, select “First line contains caption” if the first line of the input file contains the column headers. Select the fields you want to use.
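The following Python sketch shows what the “First line contains caption” option and the field selection mean for a delimited import file. The import file format is not specified here, so a semicolon-delimited text file is assumed. The function name, the generic `Field1`, `Field2` names used when there is no caption line, and the sample data are all hypothetical.

```python
import csv
import io

def load_import_file(text, first_line_contains_caption=True, selected_fields=None, delimiter=";"):
    """Parse a delimited import file into a list of dicts, keeping only the selected fields."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    if first_line_contains_caption:
        header, data = rows[0], rows[1:]
    else:
        # Without a caption line, fall back to generic (hypothetical) column names.
        header = [f"Field{i + 1}" for i in range(len(rows[0]))]
        data = rows
    records = [dict(zip(header, row)) for row in data]
    if selected_fields is not None:
        records = [{name: rec.get(name, "") for name in selected_fields} for rec in records]
    return records

sample = "CustomerID;Company;City\n4711;Muster GmbH;Berlin\n"
print(load_import_file(sample, selected_fields=["Company", "City"]))
# [{'Company': 'Muster GmbH', 'City': 'Berlin'}]
```

Dropping the internal customer ID here follows the advice to select only fields that actually appear on the documents.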
Note If the content of the import file changes, you need to re-import the file for the changes to take effect in the project. If necessary, activate the option ‘Automatically update from import file’; in that case, however, the referenced import file must be accessible.
To add and configure a Database Locator
1 Select Show Extraction Design from the Mode toolbar.
2 Click Add Locator on the toolbar to add a new locator to the list of locators.
7 Click the Database Settings button to display the Project Settings – Databases tab.
8 Click Add to insert a new database.
9 When prompted, enter a name for it.
10 Click OK. The Fuzzy Database Options dialog box displays.
11 Browse to the referenced import file.
12 Select “First line contains caption” if the input file contains column headers in the first line.
13 Select only the fields that you expect to find on the document, in this example Company, Street, ZipCode, and City.
Figure 4-27. Define Grouping for the Fields ‘ZipCode’ and ‘City’
18 Go back to the Extraction Design panel and assign the locator subitems to the defined fields.
Figure 4-28. Assigning Database Locator Field to CompanyName Field
19 Open a folder with the test documents and select a document.
20 Click Extract in the Document Viewer. The results are highlighted on the document as shown in the figure below.
Figure 4-29. Test Results for a Database Locator
21 If no results are displayed, either the threshold for the field or the internal locator threshold is too high. Open the Properties dialog box for the locator and change the threshold to a lower value. Click the Test button. If results are now displayed in the result list, you might want to lower the field threshold (accessible in the Field Properties dialog box) to an appropriate value.
• Select appropriate fields: In the project’s database settings, select only the fields that are present on the documents. For example, your internal customer ID will usually not appear on the customer’s correspondence.
• Load database to memory: Use the “Load database to memory” option if enough memory is available. By default, the database is loaded into memory.
Format Locator
The following sections describe the concept of the Format Locator and show how to add and set up the locator.
For a detailed description of the locator’s properties, see Project Builder User Interface – Format Locator Properties Dialog Box.
Regular Expressions
Regular expressions are used to recognize patterns within textual data. A regular expression is evaluated against the text of the document, and the parts of the text that match it are found. In Ascent Xtrata Pro, regular expressions are used in the Format Locator to identify items in a document and return the value of a matching item.
(e1|e2): Choice. Example: (abc|ABC) matches abc and ABC, but not aBC or AbC.
You can add test values to see whether the inserted formats match. The test results are shown in the “Matched parts” column in the list of formats, which shows which parts of the test value match the format. For more details, consult a reference on regular expressions. In many cases, however, extensive knowledge of regular expressions is not needed, because Ascent Xtrata Pro provides a set of format templates for regular expressions.
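The choice example can be reproduced with Python's `re` module. This is only an illustration: Python's regular expression dialect may differ from the Format Locator's in some details, but alternation and case-sensitive matching behave as described. `finditer` gives a rough equivalent of the “Matched parts” column.

```python
import re

choice = re.compile(r"(abc|ABC)")

for value in ["abc", "ABC", "aBC", "AbC"]:
    matched = choice.fullmatch(value) is not None
    print(value, "matches" if matched else "does not match")

# Which parts of a longer test value match the format?
print([m.group() for m in choice.finditer("xxabcyyABCzz")])  # ['abc', 'ABC']
```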
Formats can be disabled using the check box in the first column of the list of formats. This option is especially useful for testing different formats without having to delete and re-enter them.
Figure 4-30. Enabling / Disabling Formats
To save computation time, the results of one Format Locator can be reused in a second Format Locator.
Figure 4-31. Reusing a Format Locator
Note To reuse a locator, the original locator must be defined before the second one. The drop-down list only gives access to locators that precede the current one in the list of locators.
Format Templates
Project Builder has a set of format templates with a variety of useful expressions.
Figure 4-32. Extraction Format Templates
The templates can be selected from a menu next to the Format edit field. To open the menu, click the right arrow.
advance which date formats to expect, just add them all. The locator will decide which dates on the document match the format best.
Keywords
In most cases, a format matches several words in the document. For example, if you search for a date on an invoice, several occurrences are found: order date, due date, invoice date, delivery date, etc. Keywords are used to further constrain and evaluate the results.
You can add keywords by clicking words on the document in the viewer, or by typing them in the text field. You can also use predefined words by clicking the arrow next to the keyword field and selecting a dictionary. For each keyword, you can define various properties (weight, relation, and distance). If a dictionary is used, the “Search dictionary exact” option becomes available. Select it to force an exact match between found words and words in the dictionary.
In the above example, an eight-digit invoice number (Rechnungsnummer) was specified using a simple regular expression that also matches other numeric values. Since no keyword is defined, the result is not unique: the order number “65005285” also matches the expression. The result of the test is shown above.
Figure 4-35.
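A keyword is what separates the invoice number from the order number in this example. The Python sketch below applies an eight-digit expression and then keeps only candidates that closely follow the keyword “Rechnungsnummer” on the same line. The real Format Locator weights candidates by keyword weight, relation, and distance instead of filtering them out, so treat this as a simplified model. The invoice number 40012345, the “Bestellnummer” label, and the `max_distance` parameter are hypothetical. Only the order number 65005285 comes from the example.

```python
import re

EIGHT_DIGITS = re.compile(r"\d{8}")

def locate(lines, keyword=None, max_distance=3):
    """Return eight-digit candidates; with a keyword, keep only those that follow it closely on the same line."""
    results = []
    for line in lines:
        words = line.split()
        for i, word in enumerate(words):
            if not EIGHT_DIGITS.fullmatch(word):
                continue
            if keyword is None:
                results.append(word)
                continue
            # Look for the keyword among the preceding words on the same line.
            window = words[max(0, i - max_distance):i]
            if any(w.lower().rstrip(":") == keyword.lower() for w in window):
                results.append(word)
    return results

page = ["Rechnungsnummer: 40012345", "Bestellnummer: 65005285"]
print(locate(page))                             # ['40012345', '65005285']
print(locate(page, keyword="Rechnungsnummer"))  # ['40012345']
```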
words are included in a date expression (as in “November 12, 2004”), the format is more difficult to define. Of course, one could write a format expression for each month. But if the number of possible words is prohibitively high (such as the name of a city next to a zip code), this is not a realistic approach. Simply using the regular expression “[A-Za-z]+” (one or more alphabetic characters) matches all kinds of unwanted strings on your document. Dictionaries support UTF-8 Unicode.
Figure 4-36. Project Settings – Dictionary Tab Dialog Box
The list contains all dictionaries that have already been referenced and imported. Click Add to add a new dictionary. Click Properties to display the Dictionary Options dialog box. For a detailed description of the Dictionary Options dialog box, see Project Builder User Interface – Add Dictionary – Dictionary Options Dialog Box.
Dictionaries in Formats
To include lists of words, such as cities, given names, months etc.
that includes the name of the month, just provide a dictionary with the following entries:
January
February
March
…
December
This dictionary, and others, are installed in the Dict folder of your Ascent Xtrata Pro installation. Using an appropriate date format, complex dates can be located with these and other dictionaries. Some of these formats are also available as predefined templates.
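In effect, a dictionary in a format acts as a large alternation of words. The Python sketch below builds such an alternation from a month dictionary and combines it with a date pattern. In Project Builder the dictionary is inserted as a placeholder through the Template menu, so the plain alternation here is only an equivalent stand-in. The `dictionary_pattern` helper is hypothetical.

```python
import re

def dictionary_pattern(entries):
    """Turn dictionary entries into a regex alternation; longer entries first so they win."""
    escaped = sorted((re.escape(e) for e in entries), key=len, reverse=True)
    return "(?:" + "|".join(escaped) + ")"

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

date_format = re.compile(dictionary_pattern(MONTHS) + r" \d{1,2}, \d{4}")
print(date_format.findall("Invoice date: November 12, 2004"))  # ['November 12, 2004']
```

Unlike “[A-Za-z]+”, the alternation accepts only the listed words, so unrelated words before a number are not matched.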
Figure 4-37. Use a Dictionary in a Format
To use a dictionary in a format
1 Open the Properties dialog box of a Format Locator.
2 Click the Template button and select Dictionaries | Dictionary_Name from the context menu.
Figure 4-38. Insert a Dictionary into a Format
3 Edit and complete the regular expression by entering additional characters before or after the dictionary placeholder in the edit box.
4 Click Add or Modify to accept the changes and move the regular expression to the list of formats.
Note If the dictionary file used in a format becomes unavailable to the project, the format using that dictionary is ignored and the locator list displays a warning.
If used for keywords, the dictionary will behave as if you had manually entered a long list of keywords. All the optional settings will be applied to the words in the dictionary. Keyword dictionaries can be very useful. Consider the case where you want to extract an invoice date, but the keyword designating it can vary. Instead of listing all invoice date keywords individually, you can provide a dictionary containing them (“Invoice date”, “Inv.
Figure 4-39. Using a Dictionary of German City Names to Locate a Zip Code
When a dictionary is used for keywords, the option “Search dictionary exact” becomes available. Check this option to force an exact match with words in the dictionary. It is recommended that you use “Search dictionary exact” if the dictionary is significantly larger than a simple keyword list. By default, the items in the dictionary are searched using fault-tolerant string matching technology.
• Invoice Number
• Invoice Date
• Order Number
• Order Date
as well as amount data:
• Invoice Total
• Tax Free Amount
• Net Value (up to two, based on tax)
• Tax Rate (up to two, based on tax)
• Tax Amount (up to two, based on tax)
• Currency
The Invoice Header Locator is a locator that evaluates results from other locators. It picks the correct values from the results of these input locators.
It is important that you do not restrict the input locators themselves with keywords or regions; just define the formats. This approach yields as many results as possible. From the results of these four locators, the Invoice Header Locator then picks the right combination of values as the final results.
For example, if the Invoice Field Group Locator returns a result for the invoice number that meets a defined confidence, this result is used; otherwise, the invoice number extracted by the Invoice Header Locator is taken into account.
Properties
The following sections describe how to set up the Invoice Header Locator.
To add and configure an Invoice Header Locator
1 Select Show Extraction Design from the Mode toolbar.
2 Click Add Locator on the Locator toolbar.
of old or different invoices. If this may be the case in your project, you should enable this option. The Invoice Header Locator then tries to find amounts as close as possible to the beginning of the document. If this option is not selected, amounts may be found anywhere in the document. It is recommended that you do not use this option unless it is absolutely necessary, because extraction quality might be somewhat degraded.
Format Locators
Figure 4-45.
Taxes
Figure 4-46. Taxes Tab
The Invoice Header Locator supports amount groups for up to four tax rates. There are two ways you can use this:
• You can define up to two different tax rates, both of which can occur on the same invoice. This is sufficient for countries like Germany, where there are only two different value added tax rates.
• Or you can search for up to four different tax rates, but then only one of them can be used in an invoice at a time (which is the case e.g.
Currencies
Figure 4-47. Settings Tab – Currencies panel
Here you can define all valid abbreviations for the currencies you expect in the invoices of your project. For example, if you have German and American suppliers and invoices from both are processed in the same project, you should define all valid abbreviations for the two currencies (US dollars and euros). You can also define a replacement value, which is used as the output value (result).
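The abbreviation list and the replacement value work like a lookup table. The following Python sketch is a minimal model of that behavior. The abbreviations and the ISO-style output values are hypothetical examples, not the product's defaults.

```python
# Hypothetical abbreviation table: each valid abbreviation maps to its replacement (output) value.
CURRENCIES = {
    "$": "USD", "US$": "USD", "USD": "USD",
    "€": "EUR", "EUR": "EUR", "Euro": "EUR",
}

def normalize_currency(found, table=CURRENCIES):
    """Return the replacement value for a found abbreviation, or None if it is not a valid one."""
    return table.get(found.strip())

print(normalize_currency("US$"), normalize_currency("€"), normalize_currency("GBP"))  # USD EUR None
```

Mapping several spellings to one replacement value means that downstream systems receive a single, consistent currency code.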
Highlighting
Figure 4-48. Settings Tab – Highlighting panel
For testing purposes, you can change the highlighting colors for the 13 subfields extracted by the Invoice Header Locator.
Keywords
Extraction of the header data works only with the keywords provided.
Figure 4-49. Keywords for invoice numbers
Keywords can be single words or entire phrases. Both are searched with fuzzy logic and case-insensitively, so you don’t need to add keywords for all possible variations or errors. A keyword may be assigned a weight, which signifies how important or how “strong” the keyword is. For example, you may define “invoice number” with a weight of 100% and “number” with 50%.
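The following Python sketch combines case-insensitive fuzzy matching with keyword weights, using the “invoice number” (100%) and “number” (50%) example. `difflib.SequenceMatcher` stands in for the product's own fuzzy logic. The 0.8 similarity cutoff and the choice to multiply similarity by weight are assumptions made for illustration.

```python
from difflib import SequenceMatcher

KEYWORDS = {"invoice number": 1.00, "number": 0.50}  # weights from the example above

def keyword_score(phrase, keywords=KEYWORDS, min_similarity=0.8):
    """Best weighted similarity of a document phrase to any keyword (case-insensitive, fuzzy)."""
    best = 0.0
    for keyword, weight in keywords.items():
        similarity = SequenceMatcher(None, phrase.lower(), keyword).ratio()
        if similarity >= min_similarity:
            best = max(best, similarity * weight)
    return best

print(round(keyword_score("Invoice Numbr"), 2))  # 0.96: an OCR error still matches the strong keyword
print(keyword_score("Number"))                   # 0.5: only the weaker keyword matches
print(keyword_score("Total"))                    # 0.0
```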
OCR Voting Evaluator
The following sections describe the concept of the OCR Voting Evaluator and show how to add and set up the evaluator. An evaluator is a locator engine that works on the results of other locators. In the case of the OCR Voting Evaluator, the engine works on the results of one or several Advanced Zone Locators.
Concept
Zonal OCR in fixed zones on a document is accomplished with the Advanced Zone Locator, which reads the content of a zone character by character.
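Because zonal OCR reads a zone character by character, the results from two engines can be compared position by position. The voting strategy of the OCR Voting Evaluator is not described here. The Python sketch below shows one simple, hypothetical rule: keep characters where both engines agree, and otherwise take the character with the higher confidence. It assumes both results are aligned and of equal length. The `vote` function and the sample values are illustrative.

```python
def vote(result_a, result_b):
    """Character-wise voting between two OCR results of the same zone.

    Each result is a list of (character, confidence) pairs of equal length.
    Where the engines agree, the character is kept; where they disagree,
    the character with the higher confidence wins.
    """
    if len(result_a) != len(result_b):
        raise ValueError("this sketch only votes on results of equal length")
    voted = []
    for (char_a, conf_a), (char_b, conf_b) in zip(result_a, result_b):
        voted.append(char_a if char_a == char_b or conf_a >= conf_b else char_b)
    return "".join(voted)

engine_1 = [("1", 0.9), ("O", 0.4), ("5", 0.8)]
engine_2 = [("1", 0.8), ("0", 0.7), ("S", 0.3)]
print(vote(engine_1, engine_2))  # 105
```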
6 Select the zones that should be used for voting. The dialog box displays a list with three columns. The first column displays the subfields of the locator that are used as the first input, and the second column contains the reference to the second input field. The third column contains the name that is given to the corresponding subfield of the evaluator.
Figure 4-50.
results may differ from engine to engine. Therefore, the thresholds can be set individually per locator field to allow fine-tuning.
11 Add as many subfields as you want by clicking the “Add” button.
Relation Evaluator
The following sections describe the concept of the Relation Evaluator and show how to add and set up the evaluator. An evaluator is a locator engine that works on the results of other locators.
Figure 4-51. Relation Evaluator Properties – Settings Tab
5 On the Settings tab, specify which two locators are to be used for the relationship analysis.
6 Specify how many alternatives should be returned.
• Combine original confidence with distance: The original confidence is multiplied by a value indicating the distance of the input locator’s alternative from the other locator’s best alternative.
8 If desired, set a maximum allowable distance within which a result must be found.
9 Close the dialog box to save your settings.
Script Locator
The following sections describe the concept of the Script Locator and show how to add and set up the locator.
Use the Regions tab to enable processing for all pages or to restrict it to the first, middle, or last page.
5 Click Show Script to open the Script Code dialog box and enter program code for the “MyScriptLocator_LocateAlternatives” event.
6 Click Test to test the script. The results display on the Test Results tab of the Script Locator properties dialog box, which opens automatically.
Figure 4-52.
Remember that the sequence of the selected locators is very important, because the results of the individual locators are processed in this order. This matters especially when you use the “First of” option, where the first result that meets the minimum confidence is taken. To change the order of the selected locators, use the up and down arrow buttons beside the list of Selected Locators.
Properties
The following sections describe how to set up the Standard Evaluator.
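The following Python sketch shows why the order of the selected locators matters for the “First of” option. The first result in list order that meets the minimum confidence wins, even if a later locator reports a higher confidence. The function name, the tuple format, and the sample values are hypothetical.

```python
def first_of(results_in_order, min_confidence):
    """Return the first (value, confidence) result that meets the minimum confidence.

    `results_in_order` follows the order of the Selected Locators list;
    None marks a locator that returned no result.
    """
    for result in results_in_order:
        if result is not None and result[1] >= min_confidence:
            return result
    return None

# Hypothetical results from two locators, listed in the order they are selected.
print(first_of([("4711", 0.55), ("4711-A", 0.90)], min_confidence=0.8))  # ('4711-A', 0.9)
print(first_of([("4711", 0.85), ("4711-A", 0.90)], min_confidence=0.8))  # ('4711', 0.85)
```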
Table Locator
The following sections describe the concept of the Table Locator and show how to add and set up the locator.
Concept
The Table Locator finds data that appears on a document in the form of a table. One Table Locator can find one table model. Table models are defined in the project settings. No matter what the actual table in the document looks like, it is always mapped to the current table model. Most tables are found automatically, especially invoice tables.
Other columns can be manually added to the pool of globally available columns. The 12 predefined columns cannot be deleted. They are highlighted in yellow to distinguish them from manually defined columns.
Figure 4-53. Project Settings – Tables – Global Pool panel
A global column has an English name by default. More names can be added for different languages.
To create a new global column with a name in English and German
1 Open the Project Settings dialog box and select the Tables tab.
Figure 4-54. Adding new names for a global column
Table models are defined in the Project Settings dialog box. A table model defines the column structure of a certain type of table. For example, an order table can be defined with columns such as Article Code, Quantity, Discount, and Unit Price. Each table model is a subset of the column pool. Columns can be used in more than one table model, e.g.
Figure 4-55. Adding Table Models in the Project Settings Dialog Box
2 Click Add New Table Model on the Table Models panel toolbar. A dialog box displays, asking for the name of the table model.
3 Enter the name of the new model.
4 Click OK. The Properties of Table Model dialog box displays.
Note If you need to open this dialog box at a later time, you can double-click a table model in the list or click the Properties button on the Table Models toolbar.
Figure 4-56. Defining Table Models
5 Add columns to your model using the Assign button.
6 Remove columns from your model using the Unassign button.
7 Select a column on each side and click the Swap button to swap the columns. This is important if you do not want to lose the settings you have made for the column in the Table Locator.
8 Use the arrow buttons to rearrange your columns into the most logical order.
For each column used by the model, you can define a default value.
• Which keywords are used in a certain language for a certain column.
Another purpose is to define the possible keywords for each column in the model. For example, in English the Quantity column might be identified by the keywords quantity, quant., qty, etc., whereas in German it is identified by Menge, Bestellmenge, Anz.
To add a language package to the project
1 Open the Project Settings dialog box and select the Tables tab.
Figure 4-58. Defining language packages
Training Header Lines for a Language Package
For details on how to train the language package to recognize the header rows, select the “How to do it” tab. In the beginning, when the language package is new and empty, no header lines have been trained, so the first step is to show the system the header row(s). Once trained, the header line appears with a green highlight on the document image.
Figure 4-59. A classifier finds the header lines
Once you have defined the header row of the first invoice, move on to the next invoice. Either the header line of that invoice is already recognized and shown in green, or you need to show the system that line as well. As you do this, the language package becomes more adept at recognizing table header rows. It is suggested that you use at least 100 different invoices and continue until all header rows are recognized automatically.
Figure 4-60.
those languages, to guarantee that the same type of table is found for different languages. For each language, a language package is generated that may be reused in other projects. In the next step, after the header lines are properly recognized in all sample invoices, you define all possible keywords for each column, for all languages you are generating the language package for.
Figure 4-61.
3 You can either select words from the header line in the snippet (a menu appears where you can select the column) or simply type the keywords in the list at the top.
4 Use semicolons to separate keywords.
5 Navigate to the next document, and repeat until all possible keywords for all columns are listed.
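The following Python sketch shows how semicolon-separated column keywords from several languages could be used to map header words to a model column. It uses the Quantity keywords from the English and German examples above. The data layout, the function names, and the exact case-insensitive lookup are assumptions. The actual language package also relies on trained header-line detection, which is not modeled here.

```python
# Keywords as typed in the language package editor: semicolon-separated per column and language.
RAW_KEYWORDS = {
    "English": {"Quantity": "quantity; quant.; qty"},
    "German": {"Quantity": "Menge; Bestellmenge; Anz."},
}

def parse_keywords(raw):
    """Split semicolon-separated keyword strings into lower-cased sets."""
    return {
        language: {column: {k.strip().lower() for k in text.split(";") if k.strip()}
                   for column, text in columns.items()}
        for language, columns in raw.items()
    }

def identify_columns(header_words, keywords):
    """Map each header word to a model column using the keywords of all languages."""
    mapping = {}
    for word in header_words:
        for columns in keywords.values():
            for column, words in columns.items():
                if word.lower() in words:
                    mapping[word] = column
    return mapping

keywords = parse_keywords(RAW_KEYWORDS)
print(identify_columns(["Pos.", "Bestellmenge", "Qty"], keywords))
# {'Bestellmenge': 'Quantity', 'Qty': 'Quantity'}
```

Because the keywords of all languages map to the same model column, German and English tables end up in the same table structure.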
Figure 4-62. Export a language package
4 Select a location to save the language package. The extension of the language package export file is predefined as .llp.
To import a language package
1 Open the Project Settings dialog box and select the Tables tab.
2 Click Import on the Language Packages toolbar. The Import Language package dialog box displays.
3 Select a language package file (.llp) to import.
Bundled Language Packages
Several language packages are already installed on the system. They can be found in the Program folder next to the Project Builder application, in Project Builder\LanguagePackages.
Setting up the Table Locator
Once a table model has been defined, you can set up the Table Locator. Each Table Locator must be associated with one table model. No matter what the actual table in the document looks like, the table model defines the table structure that the locator extracts.
To select a table model
1 Select Show Extraction Design from the Mode toolbar.
2 Click Add Locator on the Locator toolbar.
3 Select Table Locator as the locator method in the second column of the list.
4 Double-click the new locator or use the Locator Properties button on the toolbar to display the Table Locator Properties dialog box.
5 Decide what kind of table this locator is supposed to find and select an appropriate model from the drop-down list.
The following types of columns can be found and identified automatically by this method: Position, Description, Quantity, Discount Rate, Unit Price, Unit Measure, and Total Price. Additional columns can be identified based on the header keywords provided in the language package.
To set up the automatic mode
1 Select a table model.
2 Choose Automatic mode.
3 Select a format locator from the list.
4 Choose a language package.
5 Click Test to check your results.
Figure 4-64.
Manual
With this method, any data that is structured as a table can be extracted. Use this method when the automatic method does not work. A sample document is required during definition, but the sample is not used at runtime. Once the sample document is selected, you select one sample line item from the table and, within that line item, the cells that contain the desired data.
Figure 4-65. Setting up manual mode
To use this mode, select Manual and provide a sample document.
Click “Use current” in the Table Locator Properties dialog box to insert the selected document as a sample document. Once a method is selected, the corresponding tabs in the Table Locator Properties dialog box become available:
• With automatic mode selected, only the Settings and Test Results tabs are enabled.
• With manual mode selected, all tabs are enabled. The Master Item and Cells tabs are used to define the line item, optional rows, and the data cells.
Manual Mode
Manual mode is activated by selecting the Manual radio button in the Selection detection method area of the Settings tab in the Table Locator Properties dialog box, and by adding a sample document. To define an optional row in a table, select the line in the table viewer for the sample document by drawing a rectangle around one table row. When you release the mouse button, the rectangle snaps to the table row.
Figure 4-67. Defining Optional Rows
Defining Cells
Cells can be defined by simply selecting areas in the line item. The cells can be located anywhere within the line item, even one below another. Cells may also span more than one row. Each cell is assigned to a column in the table model. A cell may be marked as optional, which means that it may be missing in a line item. The cell type is determined automatically when the cell is drawn, but it can be changed later.
Figure 4-68. Defining Cells
By drawing the cells in the line item, you create a template (consisting of the blue rectangles). This template is then passed over the document line by line, and every time it aligns with a pattern of data, a line item is identified. Consequently, if your table model contains only two columns, you can draw only two blue rectangles, and such a sparse template may fit many lines on your document.
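A loose analogy may make this clearer. In the hypothetical Python sketch below, each drawn cell is represented by a regular expression, and the template is tried against every line of text. The real Table Locator works on the positions of the rectangles in the document layout, not on regular expressions. Treat this only as an illustration of why a template with few cells matches more lines.

```python
import re

# A hypothetical template: one (column name, pattern) pair per drawn cell, in reading order.
Template = list[tuple[str, str]]


def find_line_items(lines: list[str], template: Template) -> list[dict[str, str]]:
    """Pass the template over the text line by line and keep every line it fits."""
    cells = r"\s+".join(f"(?P<{name}>{pattern})" for name, pattern in template)
    line_pattern = re.compile(rf"^\s*{cells}\s*$")
    items = []
    for line in lines:
        match = line_pattern.match(line)
        if match:
            items.append(match.groupdict())
    return items


page = ["Pos Qty Description Total", "1 5 Screws 12.50", "2 10 Nuts 3.00", "Subtotal 15.50"]

full = [("pos", r"\d+"), ("qty", r"\d+"), ("desc", r".+?"), ("total", r"\d+\.\d{2}")]
sparse = [("desc", r".+?"), ("total", r"\d+\.\d{2}")]

print(len(find_line_items(page, full)))    # 2: only the real line items
print(len(find_line_items(page, sparse)))  # 3: the subtotal line fits as well
```

The extra "Subtotal" match is the kind of false positive that additional cells or anchors are meant to rule out.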
Figure 4-69. Using anchors to enhance the template
Order Numbers
In combined invoices (single invoices that cover several orders), line items may be grouped by order (or delivery note) numbers that are interleaved in the table. If you have such invoices, search for these order numbers with a format locator and select this locator in the General settings section of the Table Locator Properties dialog box.
Figure 4-70.
Only results from that locator with a confidence higher than the threshold are used. In addition, specify whether the grouping values are order numbers or delivery note numbers. The numbers are then copied into the appropriate column of the table.
Zone Locator
The following sections describe the concept of the Zone Locator and show how to add and set up the locator.
Testing the locator then shows the following results:
Figure 4-72. Extraction Result for ‘InvoiceNo’ with Background Removal
Scanned documents always differ slightly (for example, they may be shifted by a few pixels). The zones for the Zone Locator must therefore be defined on one specific document, called the reference document.
box. Modify the settings as desired. You can also select a different OCR profile or insert a new one. Click Close to save the settings and close the OCR Profile Settings dialog box.
Note By default, an OCR profile named Default is used. If a Zone Locator has several zones that need different OCR settings, you must create at least one new OCR profile that contains the different settings.
4 To perform a recognition test, click Test in the Zone Settings dialog box.
3 Open the Properties dialog box to insert additional sample documents.
a Select a sample document from the Test Folder or the Training Set. The Zone Viewer dialog box displays the selected document.
b Return to the Zone Locator Properties dialog box and click Insert Sample on the General tab to insert the document as a new sample document. For background removal, you must add at least four more sample documents.
4 Click Create Info to process the background data.
Chapter 5
Set Up Validation
Introduction
Ascent Xtrata Pro Validation provides enhanced validation functionality. Document validation ensures that all document fields contain data that is valid with respect to the actual user requirements. The validation process considers information from the automatic extraction algorithm, the required field formatting settings, any available automatic validation rules, and input from the interactive user interface.
Step 1 Set Up Classification and Extraction
Classification and extraction methods are configured and tested for a project in the Ascent Xtrata Pro Project Builder. Once the project is saved, it can be synchronized with an Ascent Capture batch class that includes Ascent Xtrata Pro Server in its workflow. (Synchronization can be performed before or after you add Ascent Xtrata Pro Validation to the batch class.)
Figure 5-1. Workflow Including Ascent Xtrata Pro Validation
The above figure shows the general sequence of events during the validation process. The input data comes from the automatic extraction step, together with a confidence rating (either confident or not confident). Next, any relevant field formatting rules are applied to the data. This formatting step may change the field text and returns an attribute that indicates whether formatting has succeeded or failed.
Field Properties
You can use the Validation Thresholds area in the Field Properties dialog box to define the validation thresholds for each field. The thresholds are used to determine whether a field is valid or invalid. Another validation property determines whether a field requires manual confirmation: fields for which the “Require manual field confirmation” option is checked must be validated manually in Ascent Xtrata Pro Validation, even if the field’s status is valid.
Note Do not use formatters to correct OCR results at this stage of processing (for example, to replace a capital “I” with the digit “1” in an amount field). Such errors should be corrected during extraction, because a field formatter cannot change the state of an invalid field to valid.
The following formatter types are provided:
• Amount Formatter Unifies and validates amount formats on the document.
• Date Formatter Unifies and validates date formats on the document.
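The guide does not document the rules that the Amount Formatter applies. The hypothetical Python sketch below only illustrates the general idea: it unifies amount formats and reports success or failure, like the formatting attribute described for the validation workflow. Its key assumption is that the last separator is the decimal mark when exactly two digits follow it.

```python
import re
from decimal import Decimal, InvalidOperation


def format_amount(text: str) -> tuple[str, bool]:
    """Return (normalized text, success flag) for an amount such as '1.234,56 EUR'."""
    cleaned = re.sub(r"[^\d.,-]", "", text)  # drop currency symbols, letters, and spaces
    if not cleaned:
        return text, False
    last_sep = max(cleaned.rfind("."), cleaned.rfind(","))
    if last_sep != -1 and len(cleaned) - last_sep - 1 == 2:
        # Assumption: a separator followed by exactly two digits is the decimal mark.
        whole = re.sub(r"[.,]", "", cleaned[:last_sep])
        normalized = f"{whole}.{cleaned[last_sep + 1:]}"
    else:
        # Otherwise every separator is treated as a thousands separator.
        normalized = re.sub(r"[.,]", "", cleaned)
    try:
        return f"{Decimal(normalized):.2f}", True
    except InvalidOperation:
        return text, False  # leave the text unchanged and report failure
```

Under this heuristic, "1.5" would become "15.00". This shows why a real formatter needs explicit, configurable rules rather than guesses.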
• Multi Field Script Validation
• Invoice Field Validation
You can create new validation methods by using one of these predefined methods as a starting point.
To define a new validation method
1 Select Project | Project Settings from the main menu bar to display the Project Settings dialog box.
2 Select the Validation tab.
3 Click Add to display the New Validation Method dialog box.
Figure 5-2. New Validation Method Dialog Box
4 Enter a name for the method and select the type.
Validation Rules
Validation rules are used to assign validation methods to one or more fields. One or more rules can be assigned to each field. There are four types of rules:
• Single Field Validation Rules
• Single Table Field Validation Rules
• Multi Field Validation Rules
• Multi Table Field Validation Rules
To define a single field or single table field validation rule, first select a single field (either a field or a table field) and then add a single field validation rule.
Figure 5-3. Single Field Validation Rule Dialog Box
5 Click Add and select a validation method from the drop-down list. The list for single field validation rules includes standard validation, date validation, and single field script validation methods. You can add several validation methods to one validation rule. To change the processing order, use the Up Arrow and Down Arrow buttons.
Figure 5-4.
To define a new multi-field (table) validation rule
1 Select the class. Classification and extraction must already be set up for the class.
2 Select Show Validation Design from the Mode toolbar.
3 Select a field (either a normal field or a table field).
4 Click “Add multi-field rule” on the toolbar to display the Multi Field Validation Rule dialog box.
5 Select a validation method from the drop-down list.
6 To create a multi-table field validation rule, activate the “Validation rule works on all rows of this table” option and select a table model from the drop-down list.
Figure 5-6. Select Table for Multi Field Validation Rule
7 Map the fields or table fields to the fields of the validation method. The available fields are inserted automatically in the drop-down list in the left column of the field mapping table.
Sequence of Validation Rules
First, all single field validation rules are processed (including rules that are inherited from the parent document class). Then, all table validation rules for the table are processed row by row. Finally, all multi-field validation rules are processed. A validation rule is ignored if the formatter of a field it depends on, or a previously processed rule, has failed. Validation rules can be inherited from parent classes or passed on to derived classes.
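The following Python sketch models only the ordering and skipping behavior described in this section. Several parts are assumptions made for illustration: the Rule class, its kind labels, and the reading that a failed rule marks its fields so that later rules on those fields are ignored. The guide does not describe the product's internal data structures, and rule inheritance is not modeled.

```python
from dataclasses import dataclass
from typing import Callable

# Processing order described above: single field, then table (row by row), then multi-field.
PHASES = {"single": 0, "table": 1, "multi": 2}


@dataclass
class Rule:
    name: str
    kind: str                      # hypothetical labels: "single", "table", or "multi"
    fields: list[str]              # fields the rule depends on
    check: Callable[[dict], bool]  # receives the document values, or one row for table rules


def run_rules(values: dict, formatter_ok: dict[str, bool], rules: list[Rule]) -> dict[str, str]:
    """Evaluate rules phase by phase; skip a rule whose fields already failed."""
    failed = {field for field, ok in formatter_ok.items() if not ok}
    outcome: dict[str, str] = {}
    # sorted() is stable, so rules keep their defined order within each phase.
    for rule in sorted(rules, key=lambda r: PHASES[r.kind]):
        if failed.intersection(rule.fields):
            outcome[rule.name] = "ignored"
            continue
        if rule.kind == "table":
            ok = all(rule.check(row) for row in values.get("rows", []))
        else:
            ok = rule.check(values)
        outcome[rule.name] = "passed" if ok else "failed"
        if not ok:
            failed.update(rule.fields)  # later rules on these fields are ignored
    return outcome


doc = {"Net": "90.00", "Tax": "10.00", "Total": "100.00", "rows": [{"Qty": "2"}, {"Qty": "x"}]}
formatter_ok = {"Net": True, "Tax": True, "Total": True, "Date": False}
rules = [
    Rule("SumCheck", "multi", ["Net", "Tax", "Total"],
         lambda v: float(v["Net"]) + float(v["Tax"]) == float(v["Total"])),
    Rule("QtyIsNumber", "table", ["Qty"], lambda row: row["Qty"].isdigit()),
    Rule("DateCheck", "single", ["Date"], lambda v: True),
    Rule("TotalPositive", "single", ["Total"], lambda v: float(v["Total"]) > 0),
]
print(run_rules(doc, formatter_ok, rules))
```

The single field rules run first even though SumCheck is listed first. DateCheck is ignored because the Date formatter failed.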
View | Choose Details | Validation Form from the main toolbar to enable this column.) To add a validation form to a class, select the class in the Project panel and select “Validation Form” from the context menu. A default validation form is added that contains all fields defined for the class. The validation form can then be customized.
To create a new validation form
1 Right-click a document class to open its context menu.
description. By default, all document fields are inserted with a label, a viewer, and the actual field. The table field shown above is only a placeholder; it does not contain headers or the correct number of columns and rows. Because the table changes from document to document, the correct table is displayed during validation or when testing validation.
3 Customize the form as desired by adding or removing elements.
Note No controls can be placed below a table.
b Select the validation event from the drop-down list.
c Select the fields or controls for which the script code is required. Appropriate code is generated based on your selections.
d Click Select All and then Copy.
Figure 5-11. Script Wizard for ButtonClicked Event
e Click Show Script to open the Script Code dialog box.
f Select Edit | Paste from the context menu to insert the sample code. You can then edit the code as needed.
6 Select Window | Close to save the form.
Force Field Valid
Use CTRL+Enter to force a field to be valid. If you do not want to allow this, you can switch the feature off by setting “Allow force valid” in the General Properties dialog box to “FALSE”.
Character Exact Editing
For fields that are extracted with an Advanced Zone Locator, “character exact editing” is possible.
Use this option to set a field’s status to ‘valid’ even though the status returned from extraction is ‘invalid’. You then do not need to validate this field or table column in validation.
Important For fields that are not shown on a validation form, or that are invisible or read-only, you should enable this option. Ascent Xtrata Pro Server then sets the field’s status to ‘valid’, and if all other fields of the document are ‘valid’, the document does not need to be displayed in Validation.
The validation test does not provide the complete functionality found in the Ascent Xtrata Pro Validation module:
• No batch management, since you are validating single documents:
◦ you cannot browse the documents in a batch or folder
◦ you cannot edit a batch
• Reduced interface elements for the main menu, context menus, and toolbar.
• You cannot test online Learning.
• You cannot save.
toolbar. Invalid fields are marked with a blue question mark and valid fields with a green check mark.
7 Select Process | Validate Document from the main menu, click Validate Document on the main toolbar, or press F8.
8 The validation form for the processed document is displayed, showing the extracted values. Edit the form as needed.
Figure 5-12. Script Wizard for ButtonClicked Event
d Click Show Script to open the Script Code dialog box.
e Select Edit | Paste from the context menu to insert the sample code. You can then edit the code as needed.
f Close the Script Code dialog box.
4 Close the Script Wizard dialog box.
5 Select Window | Close to save the form.
Validation Design User Interface
The following sections describe the user interface elements of the validation design tool.
User Interface Elements
Menu Bar
The Validation Design Panel supports a standard Windows-style menu bar.
Figure 5-13. Menu Bar
Toolbar
The toolbar provides shortcuts to many menu items and gives you quick access to all important features.
Buttons | Description
Add Field. When you click the down arrow, a list of all available fields in the class is shown. Choose a field from the list to insert it on the form. To change the font for the field, click Font Settings on the toolbar.
Add Viewer.
Font Settings. Click the button to change the font settings.
Standard Layout. Applies the standard layout.
Clear. Deletes all elements on the form.
Delete. Removes the selected items from the form. Alternatively, you can select the elements that you want to delete and select Delete from the context menu.
Add Group. Click this button to insert a group box on the form. Alternatively, you can select the elements that you want to group and select Group Selection from the context menu.
Form Elements
The following validation form elements are available and can be added to a validation form. When you select an item on the form, a properties panel is shown on the right where you can adjust various settings. For further details about the validation form and form element properties, see General Properties.
• Fields Fields can be displayed as text fields, combo boxes, or check boxes.
they contain a field to be filled. To set the behavior, open the properties and set the group behavior.
Note You cannot insert a group and then fill it with other fields by drag and drop. Instead, insert the group, select it, and then insert fields into it. Alternatively, select the items that should be grouped and select Group Selection from the context menu.
Select a single element by clicking it. Use Ctrl+left-click to select several elements.
Figure 5-14. Properties Panel Fixed
Click Auto Hide in the top-right corner of the panel to activate the floating mode, which displays an additional label to the right of the panel. When you move the mouse over the label, the properties panel is shown. After a short while it vanishes again, and only the label on the right side remains visible.
Figure 5-15. Properties Panel Floating
Click the Auto Hide button in the right corner to toggle between the docking modes.
• Field Properties
• General Properties for the form
• Document Viewer Properties
• Group Properties
• InPlace Editor Properties
• Label Properties
• MiniViewer Properties
• Table Properties
General Dialog Boxes
The following dialog boxes are available in the validation design tool.
Define Tab Sequence Dialog Box
This dialog box is used to define the tab order of the fields on the validation form.
Figure 5-16.
Click “Define field order” on the toolbar to show the Define Tab Sequence dialog box. A green rectangle is added to the form that shows the tab order number of each field. The rectangle’s color changes to blue for an element that is selected in the Define Tab Sequence dialog box. Use the Up and Down buttons to change the position of a field and the order of fields within a group box.
Validation Sample
The following sections describe how to set up a sample project.
Step 1: Set up Classification and Extraction Project
The following figures show the sample documents and the fields for which this sample project was created.
Figure 5-18. Sample Document
To process the document, a class was created that contains eight fields.
Figure 5-19.
Step 2: Define Validation
The following checks were defined for the fields:
Figure 5-20. Required Field Validation Checks
For the project, you must define validation methods and assign them to extraction fields in order to set up the validation rules that correspond to the checks shown in Figure 5-20.
Figure 5-21. Required Field Validation Checks
• ActualDateCheck: Checks whether the date has a valid date format and is not more than 30 days in the past.
• BirthDateCheck: Checks whether the field is empty and, if not, whether the age is over 18 years.
• CustomerIDCheck1: This Standard validation method checks whether the extraction result is an 8-digit number and applies a check-digit algorithm.
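The hypothetical Python sketch below makes these three checks concrete. It is not the Standard or script validation code used in the sample project. It relies on several assumptions:

• Dates are entered as day.month.year.
• "Over 18 years" means 18 or older.
• ActualDateCheck does not reject future dates, because the description does not mention them.
• The check-digit part of CustomerIDCheck1 is omitted, because the guide does not name the algorithm.

The current date is passed in as a parameter so that results are reproducible.

```python
from datetime import date, datetime
from typing import Optional

DATE_FORMAT = "%d.%m.%Y"  # assumed format; the guide does not specify one


def parse_date(text: str) -> Optional[date]:
    """Return the parsed date, or None if the text does not match DATE_FORMAT."""
    try:
        return datetime.strptime(text.strip(), DATE_FORMAT).date()
    except ValueError:
        return None


def actual_date_check(text: str, today: date) -> bool:
    """Valid date that is not more than 30 days in the past (future dates are not rejected)."""
    value = parse_date(text)
    return value is not None and (today - value).days <= 30


def birth_date_check(text: str, today: date) -> bool:
    """An empty field passes; otherwise the person must be at least 18 years old."""
    if not text.strip():
        return True
    born = parse_date(text)
    if born is None:
        return False
    # Subtract one year if this year's birthday has not been reached yet.
    age = today.year - born.year - ((today.month, today.day) < (born.month, born.day))
    return age >= 18


def customer_id_format_check(text: str) -> bool:
    """Only the 8-digit part of CustomerIDCheck1; the check-digit algorithm is not named."""
    return len(text) == 8 and text.isascii() and text.isdigit()
```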
Figure 5-22. Multi Field Script Validation
You can test the validation method from within this dialog box. Enter test values for the fields in the Required Fields panel and click Validate in the Testing panel. Extraction fields normally have formatters. During testing, however, the method is not linked to any extraction field, so you must select a formatter from the drop-down list of available formatters.
Figure 5-23. Validation Concept for the Project
Figure 5-24. Validation Rules Panel Showing the Defined Validation Rules
Here is a sample single field validation rule.
Figure 5-25. Single Field Validation Rule
Here is a sample multi-field validation rule.
Figure 5-26.
Validation Form
After the validation methods and rules are created and assigned to the extraction fields, a validation form must be created. The following is a sample validation form.
Figure 5-27.
Chapter 6
Project Builder User Interface
Introduction
This chapter provides details on the Ascent Xtrata Pro Project Builder user interface.
User Interface Elements
The Ascent Xtrata Pro Project Builder has a main menu and a toolbar for quick access to project configuration tasks. Below these, the interface is divided into two main sections: the left section has a panel showing the project structure, and the right section has two panels arranged one above the other.
Figure 6-1. Project Builder Main Screen
The area below the main toolbar is divided into three sections:
• On the left, the Project panel shows the project class tree, with all defined and derived classes. Additional details about the classes are shown in columns to the right of the class names. Depending on your screen size, you may have to scroll the panel to see these additional columns.
• The right section is divided into two main areas.
◦ Extraction Results – used to show the extraction result for a processed document
◦ Classification Design – used to set up the Instruction Classifier
◦ Extraction Design – used to manage fields and locators
◦ Validation Design – used to set up validation rules
• Save Project As – displays the Save Project As dialog box so the project can be saved under a different name or in a different location.
• Recent Projects – gives quick access to the most recently used projects.
• Validate Project – validates the project.
• License Utility – validates the license to check whether the features used within the project are licensed. The validation process may check for any of the following types of licenses:
◦ a license file (*.
◦ No. Learn Docs – displays the number of documents added to the training set of the class.
◦ No. Defined Fields – displays “TRUE” if an inherited locator method was changed for a field or a new locator method was assigned to an inherited field. If only inherited settings are used, “FALSE” is displayed.
◦ Field Count – displays the total number of fields of the class, including inherited fields.
◦ Class ID – displays the internal unique class ID for a class.
• Process Selected Document – classifies and extracts the currently selected document.
• Validate Document – shows the extraction results for the currently selected document using the validation form.
• Test Document Separation – performs document separation and shows the results in the Document Separation Results dialog box.
The Tools menu:
• Calculate Result Matrix for – calculates the result matrix for a training set or reference set and displays the results.
Toolbars
The toolbars provide shortcuts to many menu items and quick access to all important features. The main toolbar is available in all working modes (Classification Results, Classification Design, etc.). In addition to the main toolbar, a Mode toolbar gives quick access to the various working modes. The working mode panels and the file list panels also each have their own toolbars.
Table 6-1.
Process Selected Document – classifies and extracts the currently selected document.
Validate Document – shows the extraction results for the currently selected document using the validation form.
Test Document Separation – performs document separation and shows the results in the Document Separation Results dialog box.
Extraction Benchmark – displays the Extraction Benchmark dialog box and performs an analysis for the currently selected class. You can save the results of the benchmark in a file.
panel on the upper right side of the interface.
Table 6-3. File Lists Toolbar
Toolbar Buttons | Description
Test Folder – shows the Test Folder panel and displays the list of available documents for the currently selected file type. To change the file type, click Open Test Folder and select a different file type (text, image, or XDoc).
Project Panel
This panel consists of a table that shows the class hierarchy and details about the classes. For example, it may indicate whether a script has been defined and how many training documents are available for the class. The class hierarchy is set up by adding classes and subclasses under the Project node and specifying the desired properties for each class.
Class icon shown when a class has just been added to the class hierarchy.
Class icon shown when a class is not a valid classification result.
Default class icon.
Class icon shown when this class redirects all documents to another class.
Class icon shown when subtree classification is enabled for the class.
Context Menu
The following context menu items are provided for the elements of the class hierarchy:
• Add Class – adds a new class to the class tree.
Project Panel for Invoice Projects
This panel consists of a navigation pane that allows quick access to the different working modes.
Project
The panel is divided into three sections:
Tasks Shows the tasks that can be performed for the current working mode. You can create a new invoice project, open an invoice or standard project, open the script window, manage knowledge bases, and open the Project Settings dialog box.
Recent Projects Shows a list of recent projects.
Figure 6-4. Project Panel – Project Working Mode
Base Class
The panel is divided into three sections:
Tasks Shows the tasks that can be performed for the current working mode. You can create a new template, open the script window, open the Class Properties dialog box, define a validation form, and delete a validation form.
Details Shows the number of templates, knowledge bases, and sample documents available for the current invoice project.
Navigation Pane Buttons at the bottom for Project, Base Class, and Templates are used to change working modes.
Figure 6-5. Project Panel – Base Class Working Mode
Templates
Shows the templates that are defined for this invoice project. You can right-click a template to show its context menu. Buttons at the bottom for Project, Base Class, and Templates are used to change working modes.
Figure 6-6. Project Panel – Templates Working Mode
Context Menu
The following items are provided:
• Add Template – adds a new template to the project. A default name is provided that you can change. Note that you have to train documents for classification and extraction.
• Rename Template – changes the name of the template.
• Delete Template – removes the template from the list and the trained documents from the project.
• Show Script – opens the script window.
Note The full context menu does not appear until at least one template has been added.
Classification Design Panel
Use this panel to set up the Instruction Classifier. It consists of a toolbar and a table that lists all the instructions for the currently selected class.
Figure 6-7. Classification Design Panel
Instruction List
This table lists the defined instructions and relevance settings for the currently selected class. Instructions may consist of one or more phrases, such as Plus + Bus.
Classification Design Toolbar
The following toolbar buttons are available for the Classification Design panel:
Table 6-5. Classification Panel Toolbar
Toolbar Buttons | Description
Add Instruction – displays the Instruction Properties dialog box, allowing you to enter phrases for a new instruction. For details, see Instruction Properties.
Delete Instruction – removes the currently selected instruction from the list of instructions.
Detailed classification results are shown in the Classification Results panel. If the document is open in the Document Viewer, the recognized class is also shown just below the Document Viewer toolbar.
Figure 6-1. Classification Results Panel
Classification Result Icons
The following icons provide additional information to the user, for example by showing the hierarchical rule that has been applied.
Table 6-6.
Competing Children.
Classification icon used when the classification result is the result of the hierarchical “Propagated NOT flag” rule. For further details, see Classification – Hierarchical Evaluation Rules – Propagated NOT Flag.
Classification icon used when the classification result is the result of the hierarchical “Child Wins Over Parent” rule. For further details, see Classification – Hierarchical Evaluation Rules – Single Child Wins Over Parent.
Table 6-7. Extraction Design Field Toolbar
Toolbar Buttons | Description
Add Field – inserts a new extraction field in the table of fields. A default name, which you can easily change, is provided.
Delete Field – removes the currently selected extraction field.
Field Properties – displays the Field Properties dialog box, where you can set a fixed value, select formatting methods, define special display options, set locator result and validation thresholds, and set rereading options.
Export Locator – displays the Export locator dialog box so you can save the locator to an external file (*.loc).
Import New Locator – displays the Import new locator dialog box so you can import a previously exported locator.
Import Locator Method – displays the Import locator method dialog box so you can replace the currently selected locator method with a new locator method.
Move Locator Up – moves the currently selected locator one position up in the table.
To assign another locator method, click the Locator Method column and select the method from the list. The Comments column displays system messages. For example, if you select the Database Locator method, “No database selected” appears in the Comments column until you set up the database.
Extraction Result Panel
Use this panel to show the extraction results for the current test document. The class for which the document was extracted is shown in the toolbar.
Figure 6-8.
Validation Rules Panel
The Validation Rules panel includes tools for assigning validation methods to one or more fields. One field may have several validation rules assigned. To define single field validation rules, select a field from the list of fields and click Add Single Field Rule on the toolbar. Depending on the field type, a single field validation rule or a single table field validation rule is created.
fields. To change the default name, edit the Validation Rule column on the right.
Delete Validation Rule – removes the validation rule from the list of applied validation rules.
Properties – displays the validation rule properties dialog box, where you can set the properties for the validation rules.
Move Up – moves the currently selected validation rule one position up in the table. The order defines the sequence in which the rules are applied.
Figure 6-9. Result Matrix Panel
Toolbar
The Result Matrix toolbar provides access to several major features of the Result Matrix viewer.
Table 6-10. Result Matrix Toolbar
Toolbar Buttons | Description
View Selection – used to select a view.
Calculate Reference Set – calculates the Result Matrix for a Reference Set with or without hierarchical rules.
Calculate Training Set – calculates the Result Matrix for the current Training Set.
Statistics – displays the Class Based Precision and Recall dialog box, which shows the statistics for the currently calculated documents.
Graph
Scroll bars are not provided, but there are several ways to navigate the result matrix:
• Left-click the graph and drag to rotate the graph.
• Right-click and drag to move the graph within the viewer pane.
• Use the mouse wheel to zoom in and out.
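The guide does not give the formulas behind the Class Based Precision and Recall dialog box. The Python sketch below uses the standard definitions, which is an assumption about what the dialog reports:

• Precision for a class is the number of documents correctly assigned to it divided by all documents assigned to it.
• Recall for a class is that same count divided by all documents that actually belong to it.

The function name and data layout are hypothetical.

```python
from collections import Counter


def class_precision_recall(pairs: list[tuple[str, str]]) -> dict[str, tuple[float, float]]:
    """pairs holds one (expected class, classification result) tuple per document."""
    correct, assigned, actual = Counter(), Counter(), Counter()
    for expected, result in pairs:
        actual[expected] += 1
        assigned[result] += 1
        if expected == result:
            correct[expected] += 1
    stats = {}
    for cls in sorted(set(actual) | set(assigned)):
        precision = correct[cls] / assigned[cls] if assigned[cls] else 0.0
        recall = correct[cls] / actual[cls] if actual[cls] else 0.0
        stats[cls] = (precision, recall)
    return stats


docs = [("Invoice", "Invoice"), ("Invoice", "Invoice"), ("Invoice", "Letter"),
        ("Letter", "Letter"), ("Letter", "Invoice")]
for cls, (p, r) in class_precision_recall(docs).items():
    print(f"{cls}: precision={p:.2f} recall={r:.2f}")
# Invoice: precision=0.67 recall=0.67
# Letter: precision=0.50 recall=0.50
```

Documents that a real result matrix might report as unclassified are not modeled here.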
Figure 6-4. Test Folder
Test Folder Toolbar
The Test Folder toolbar provides access to several major features of the panel.
Table 6-11. Test Folder Toolbar
Toolbar Buttons | Description
Open Test Folder – displays the Open Test Folder dialog box, where you can select the location of the test documents. You can also select the type of document (text, TIF, or XDoc) to load.
Note Remember to display the XDocs when you want to train documents for extraction.
◦ Use for Extraction – displays the Edit Document dialog box, where you train the group locator fields. If the fields are disabled, you need to select a class in which a field group locator is defined.
Train for Extraction – displays the Edit Document dialog box, where you train fields using the documents in the Test Folder list. The documents are added to the default training set of the currently selected class.
Show Document – opens the document in the document viewer.
Figure 6-5. Training Set Classification
Training Set (Classification) Toolbar
The Training Set (Classification) toolbar provides access to several major features of the Training Set (Classification) panel.
Table 6-12. Training Set Classification Toolbar
Toolbar Buttons – Description
Views – use this list to select the classifier view you want to use.
Training Set (Extraction) Panel
When Training Set (Extraction) is chosen from the File Lists toolbar, all the files in the training set of the currently selected class are shown in the file list.
Figure 6-6. Training Set Extraction Panel for Class Invoice
Training Set (Extraction) Toolbar
The Training Set (Extraction) toolbar provides access to several major features of the Training Set (Extraction) panel.
Table 6-13.
Exclude Folder From Training – specifies that the currently selected folder is not used when the project is trained. If a folder has been excluded, click this button again to remove the exclusion so that the folder is used for training.
Rename Folder – displays the New Folder dialog box, where you can change the name of the currently selected folder.
Figure 6-7. Selection Panel for Image Clustering
If you select a bar in the result matrix graph and choose Display in Selection List, the documents are listed as shown in Figure 6-8. This is especially useful when you want to optimize the training set.
Figure 6-8. Selection Panel for Result Matrix
Selection Toolbar
The Selection toolbar provides access to several major features of the Selection panel.
Table 6-14.
Toolbar Buttons – Description
Add to Training Set of Selected Class
• Use for Layout Classification – adds the document to the layout classification training set of the currently selected class.
• Use for Content Classification – adds the document to the content classification training set of the currently selected class.
• Use for Both Classification Types – adds the document to both the layout and the content classification training sets of the currently selected class.
Figure 6-9. New Samples Panel
New Samples Toolbar
The New Samples toolbar provides access to several major features of the New Samples panel.
Table 6-15. New Samples Toolbar
Toolbar Buttons – Description
Open New Samples Directory – displays the Browse For Folder dialog box, where you can select the location of the sample documents.
Close New Samples – closes the sample set.
Refresh – updates the current list.
Filter New Samples – displays the Training Documents filter by dialog box, where you can select different Filter Options for the sample set.
Add to Training Set of Selected Class
• Use for Layout Classification – adds the document to the layout classification training set of the currently selected class.
• Use for Content Classification – adds the document to the content classification training set of the currently selected class.
Delete Selected Document – removes the currently selected document from the sample set.
Delete all Documents – removes all documents from the sample set.
New Samples – Sample Documents List
As soon as a samples folder is opened, a list of files is displayed. You can use these files to optimize the classification and/or extraction training sets, or to improve table extraction by adding new table definitions from the Validation module.
Figure 6-10. Document Viewer Context Menu
The following context menu items are available for documents displayed with the Show Text option:
• Classify Selection – classifies the selection and shows the confidence values for the classification result in the Classification Result dialog box.
• Classify Lines – classifies all lines of the shown document and shows the classification results within the viewer.
Document Viewer Toolbar
The Document Viewer toolbar provides access to several major features of the viewer.
Toolbar Buttons – Description
Show Text/Show Image – switches between text display and image display. Image display is only available for image documents. Text display is available for text documents, and for image documents on which OCR has been performed.
Display Previous/Display Next – navigates to the previous or next page or document.
Classify Current Document – classifies the current document.
Extract Current Document – performs extraction on the current document.
Process Current Document – classifies and performs extraction on the current document.
Help – displays the online Help.
Figure 6-11. Add Classification View Dialog Box
View name – Enter the name of the view.
Classifier type – Select the classifier type for which the view is created.
Existing training set – Enable this check box and select the location of the desired training set.
Management of training set – By default, the files of the currently selected training set are copied to your project directory. If you want to reference the files instead of copying them, select “Only keep reference to files (no copy).”
Buttons
OK – Click this button to save your settings.
Cancel – Click this button to close the dialog box and discard any changes you made.
Help – Click this button to view the online Help topic for this dialog box.
Advanced Zone Locator Zone Settings Dialog Box
The Advanced Zone Locator supports individual settings for each text, OMR, or OMR group zone that has been added to the reference page.
Figure 6-12. Advanced Zone Locator - Zone Settings Dialog Box
Viewer
The content of the currently selected zone is displayed in this viewer area. To create a blank-out region for this zone, draw a rectangle with the mouse in the viewer area. Its coordinates are automatically added to the Blank Out Regions list.
General
• Left, Top, Width or Height – You can change the position or size of the zone using the up and down buttons or by entering values directly.
• Rotation – By default, 0° is selected. To rotate the image in the viewer, select the appropriate angle. The result is shown in the viewer area.
• Name – You can provide a new name for the zone.
• Page – You can specify the page to which the zone is applied by selecting a value from the list.
Background Removal
Background removal can only be performed after you have added four sample documents in addition to the reference document.
Test (Background Removal) – Click this button to test the ability of the zone locator to remove background information based on the current sample information.
Test (Dynamic Zone Adjustment) – Click this button to test the current dynamic zone adjustment settings. The results appear in the main viewer at the top of the dialog box.
Properties (OCR) – Click this button to display the OCR Profile Settings dialog box and adjust the settings of the currently selected profile.
Figure 6-13. Application Language Dialog Box
Please select the desired application language – Select the desired language from the list. When you start Project Builder for the first time, the language is determined by the operating system.
Buttons
OK – Click this button to save your settings.
Cancel – Click this button to close the dialog box and discard any changes you made.
Help – Click this button to view the online Help topic for this dialog box.
Figure 6-14. Class Based Precision and Recall Dialog Box
Menus
The File menu:
• Save statistics – saves the statistics to a tab-delimited text file.
• Save classification details – saves the classification details to a tab-delimited text file.
• Close – closes the dialog box.
The Help menu:
• Contents – opens the online Help.
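The guide does not state how the figures in this dialog box are calculated. As an illustration only, the following Python sketch assumes the standard per-class definitions (precision is the number of correct assignments to a class divided by all assignments to it; recall is the number of correct assignments divided by all documents that belong to it) and renders the result as tab-delimited text. The function names, the input given as (expected class, assigned class) pairs, and the column layout are hypothetical and do not describe the actual Save statistics file format.

```python
from collections import Counter


def class_precision_recall(pairs):
    """Per-class precision and recall from (expected_class, assigned_class) pairs.

    assigned_class is None when a document stayed unclassified.
    """
    correct, assigned, expected = Counter(), Counter(), Counter()
    for expected_class, assigned_class in pairs:
        expected[expected_class] += 1
        if assigned_class is not None:
            assigned[assigned_class] += 1
            if assigned_class == expected_class:
                correct[expected_class] += 1
    stats = {}
    for cls in sorted(set(expected) | set(assigned)):
        # Precision: share of documents assigned to cls that really belong to it.
        precision = correct[cls] / assigned[cls] if assigned[cls] else 0.0
        # Recall: share of documents belonging to cls that were assigned to it.
        recall = correct[cls] / expected[cls] if expected[cls] else 0.0
        stats[cls] = (precision, recall)
    return stats


def to_tab_delimited(stats):
    """Render the statistics as tab-delimited text (hypothetical column layout)."""
    lines = ["Class\tPrecision\tRecall"]
    for cls, (precision, recall) in stats.items():
        lines.append(f"{cls}\t{precision:.2f}\t{recall:.2f}")
    return "\n".join(lines)


stats = class_precision_recall([
    ("Invoice", "Invoice"), ("Invoice", "Order"),
    ("Order", "Order"), ("Order", None),
])
print(to_tab_delimited(stats))
```

An unclassified document lowers the recall of its expected class but does not affect the precision of any class.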
Figure 6-15. Classification Results Dialog Box
Buttons
Close – Click this button to close the dialog box.
Help – Click this button to view the online Help topic for this dialog box.
Class Properties Dialog Box
Use this dialog box to set the options for the currently selected class. For further information, see Classification - Definition of Classes and Class Tree – Class Properties.
Figure 6-16. Class Properties Dialog Box
General
Valid classification result – This option is selected by default and means that this class can be used as a classification result. If it is not selected, a document is never assigned to this class, even if it is a perfect match.
Visible in validation – This option is selected by default.
Otherwise, the class name is excluded from the list and the operator cannot assign it as the classification result.
Note: If a document is classified to a “non-visible” class, this class appears in the list of classes for that document despite its non-visible status.
Extract this class with the external server – This option is not selected by default.
documents. For further details about document separation class settings, see Classification - Class Properties – Document Separation. If document separation is not used, each loose page in the batch becomes a single-page document. You generally activate document separation at the project level. For further details, see Project Builder User Interface - Project Settings Dialog Box – General Tab.
Select this option to always add a middle page to the current document. A last page is handled the same way, except that the document is closed after the page has been added, and a new document is started for the next processed page of the multi-page document.
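To make the page-role behavior concrete, here is a minimal Python sketch that groups pages into documents. The role labels "first", "middle" and "last" and the function name are hypothetical, and closing any open document when a first page arrives is an assumption; only the middle-page and last-page rules come from the description above.

```python
def separate_documents(page_roles):
    """Group page indices into documents based on a role per page.

    Roles (hypothetical labels): "first", "middle", "last".
    """
    documents, current = [], []
    for index, role in enumerate(page_roles):
        # Assumption: a first page closes any open document and starts a new one.
        if role == "first" and current:
            documents.append(current)
            current = []
        # Every page, including middle and last pages, is added to the current document.
        current.append(index)
        # A last page closes the document; the next page starts a new one.
        if role == "last":
            documents.append(current)
            current = []
    if current:  # flush a document that was never explicitly closed
        documents.append(current)
    return documents


print(separate_documents(["first", "middle", "last", "first", "middle"]))
```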
Figure 6-17. Create New Class and Table Locator Dialog Box
New Class Name – Enter the name for the new class.
Select Parent Class – Select a class to which the new class will be added as a subclass, or select “Insert as base class” to insert the new class as an additional base class.
Locators – By default, Inherited is selected, and you only have to select the “old” table locator that will be updated with the new settings made during validation.
locator, select New and enter a name. This new locator adopts the settings from validation.
Select Table Model – Select a table model from the list of available table models. The selected table model is set in the Table Locator properties when you click OK.
Current Document – The viewer displays the document that was returned from validation to create a new table locator.
Buttons
The following buttons are provided.
OK – Click this button to save your settings.
Figure 6-18. Dictionary Options Dialog Box
Referenced import file – Enter the file name and path of the text or CSV file to be used as a dictionary, or click the Browse button and browse to the desired file.
Import Options
Ignore case – Check this option to ignore case when extracted words are compared with dictionary entries.
Note: The delimiters must correspond to the delimiters defined for OCR. Otherwise, OCR may not separate the words correctly, and the separately searched words are then either not found or found with lower confidence.
Characters to delete – This filter deletes unwanted characters from the input record. For example, the imported file may contain quotation marks that identify a string; you can eliminate them with this option.
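A minimal Python sketch of how the Characters to delete and Ignore case options could affect dictionary lookups, assuming one dictionary entry per line of the import file. Delimiter handling is left out, and the function names are hypothetical; this is not the product's import routine.

```python
def load_dictionary(lines, chars_to_delete='"', ignore_case=True):
    """Build a set of dictionary entries from the lines of an import file."""
    delete_table = str.maketrans("", "", chars_to_delete)
    entries = set()
    for line in lines:
        # "Characters to delete": strip unwanted characters such as quotes.
        entry = line.translate(delete_table).strip()
        if entry:
            entries.add(entry.lower() if ignore_case else entry)
    return entries


def in_dictionary(word, entries, ignore_case=True):
    """Compare an extracted word with the dictionary entries."""
    return (word.lower() if ignore_case else word) in entries


entries = load_dictionary(['"Invoice"', '"Credit Note"', ""])
print(in_dictionary("INVOICE", entries))  # case is ignored
```

The same `ignore_case` setting must be used when loading and when looking up; otherwise a lower-cased dictionary never matches a capitalized word.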
OK – Click this button to save your settings.
Cancel – Click this button to close the dialog box and discard any changes you made.
Help – Click this button to view the online Help topic for this dialog box.
Field Formatter Properties Dialog Boxes
Field formatters are used to unify the formats of extracted fields by converting them to predefined formats. Each formatter type has its own Properties dialog box.
Amount Formatting Dialog Box
Use the Amount Formatting dialog box to configure amount formats (for example, to specify the decimal symbol or to separate the currency from the amount).
Figure 6-19. Amount Formatting Dialog Box
Amount Analysis
Require decimal point – Select this option if the input value must have a decimal point. This option is selected by default.
No. of decimal places – Select a number from the list that defines how many decimal places are used. Possible values range from 0 to 4.
Currency – Use the Add or Delete buttons to insert or delete currency phrases. A currency phrase may be found on the document next to an amount. If there is no space between the amount and the currency phrase, the OCR engine reads them as a single word.
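The following Python sketch illustrates the amount analysis options described above: removing a currency phrase even when OCR has joined it to the amount, enforcing Require decimal point, and rounding to the selected number of decimal places. The parameter names, the currency list, the thousands-separator handling and the rounding mode are all assumptions made for the example, not the formatter's documented behavior.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP


def format_amount(raw, currencies=("EUR", "USD"), require_decimal_point=True,
                  decimal_places=2, decimal_symbol="."):
    """Sketch of amount normalization; all parameter names are hypothetical."""
    if not 0 <= decimal_places <= 4:
        raise ValueError("decimal_places must be between 0 and 4")
    text = raw.strip()
    # Remove a currency phrase before or after the amount. Because OCR joins
    # an amount and a currency written without a space ("99.50EUR"), match on
    # the string boundaries rather than on separate words.
    for currency in currencies:
        if text.upper().startswith(currency.upper()):
            text = text[len(currency):].strip()
        elif text.upper().endswith(currency.upper()):
            text = text[: -len(currency)].strip()
    if require_decimal_point and decimal_symbol not in text:
        return None  # reject input without a decimal point
    group_symbol = "," if decimal_symbol == "." else "."
    normalized = text.replace(group_symbol, "").replace(decimal_symbol, ".")
    try:
        value = Decimal(normalized)
    except InvalidOperation:
        return None  # not a number after cleanup
    quantum = Decimal(1).scaleb(-decimal_places)  # e.g. 0.01 for 2 places
    return str(value.quantize(quantum, rounding=ROUND_HALF_UP))


print(format_amount("1,234.5EUR"))
```

Using `Decimal` rather than `float` keeps the rounding of monetary values exact.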
Figure 6-20. Date Formatting Dialog Box
Date Detection – Select settings that will expedite the detection of dates found on the document. If a Month Dictionary is available for the project, you can select the desired dictionary from the list. Month dictionaries are used to convert dates that include the names of months to a numerical date format.
Select the date formats that are found on the document, for example an American date format (MM/DD/YYYY) or a European date format (DD/MM/YYYY). Use the arrow buttons to add formats to or remove them from the list of selected formats. You can also move formats up or down in the list, which controls the sequence in which they are evaluated. For best efficiency, place the most likely formats at the top of the list.
Testing – To test the settings, enter some sample input text and click Format.
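Because the selected formats are evaluated in list order, an ambiguous date such as 03/04/2006 is interpreted by whichever matching format comes first. The Python sketch below shows this first-match behavior using the standard `datetime.strptime` function; the ISO output format and the function name are assumptions, and month dictionaries are not modeled.

```python
from datetime import datetime


def format_date(raw, formats, output_format="%Y-%m-%d"):
    """Try each selected format in list order; the first one that parses wins."""
    text = raw.strip()
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).strftime(output_format)
        except ValueError:
            continue  # not this format; try the next one in the list
    return None  # no selected format matched


american_first = ["%m/%d/%Y", "%d/%m/%Y"]
print(format_date("03/04/2006", american_first))
```

With the European format first, the same input yields 3 April instead of 4 March, while an unambiguous date such as 25/12/2006 falls through to the format that can parse it.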
Figure 6-21. Percentage Formatting Dialog Box
Decimal symbol – Set the decimal symbol to a period (.), comma (,) or . If set to , the decimal symbol is determined by the operating system of the local computer.
No. of symbols after decimal – Select a number from the list that defines how many decimal places are used. Possible values range from 0 to 4.
Testing – To test the settings, enter some sample input text and click Format.
Help – Click this button to view the online Help topic for this dialog box.
Script Formatting Properties Dialog Box
The Script Formatter is used to perform special, script-based formatting of an input field. For further information about a sample script for a script formatter, see Administration – Field Formatting.
Figure 6-22. Script Formatting Dialog Box
Options – A field can have one of three data types: Text, Double/Amount, or Date.
Select one of the field data types from the drop-down list to generate sample script code that corresponds to the selected field data type. You may copy the sample code to the Script Editor to add it to your script.
Testing – To test the implemented script, enter some sample text and click Format. The sample text is formatted as defined in the script and displayed in the output field.
Buttons
OK – Click this button to save your settings.
Figure 6-23. Field Properties Dialog Box
Fixed Value – To define a fixed result for a field, enable this option and specify the desired value. If a fixed value is defined and a locator has already been mapped to the field, the mapping is changed to ”. This means that the field cannot be mapped to any locator; therefore, the Locator Result Threshold and the confidence and distance values for validation cannot be set and are disabled.
Figure 6-24. Field Mapping for Field with Fixed Value
If you set a field’s locator to , then the Fixed Value option is automatically selected. To insert the value itself, open the Field Properties dialog box and enter the value there. If a fixed value is defined in the properties of a field but you subsequently select a locator, the Fixed Value option is automatically turned off.
Two thresholds can be set for the validation confidence:
Minimum confidence to make field valid – A locator’s best alternative is only assigned to a field if its confidence is higher than this value. The default value is 80%. Note that this value cannot be set lower than the Locator Result Threshold.
Minimum distance to second best alternative – A locator’s best alternative is considered valid only if the distance to the second best alternative is greater than or equal to this value.
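The two thresholds can be read as the following decision, sketched in Python under stated assumptions: confidences are expressed as fractions between 0 and 1, a missing second alternative counts as confidence 0, and the 0.10 minimum distance is an arbitrary example value (the guide gives a default only for the minimum confidence). The function name is hypothetical.

```python
def evaluate_field(alternatives, min_confidence=0.80, min_distance=0.10):
    """Return (assigned_value, is_valid) for a field.

    alternatives: (value, confidence) pairs from a locator, confidence in 0..1.
    min_distance=0.10 is an arbitrary example value, not a documented default.
    """
    ranked = sorted(alternatives, key=lambda alt: alt[1], reverse=True)
    # The best alternative is only assigned if its confidence is higher than
    # the minimum confidence.
    if not ranked or ranked[0][1] <= min_confidence:
        return None, False
    best_value, best_confidence = ranked[0]
    # Assumption: with no second alternative, the distance is the best confidence.
    second_confidence = ranked[1][1] if len(ranked) > 1 else 0.0
    # Valid only if the distance to the second best alternative is
    # greater than or equal to the minimum distance.
    return best_value, best_confidence - second_confidence >= min_distance


print(evaluate_field([("1234", 0.95), ("1284", 0.60)]))
```

A confident but ambiguous result (for example 95% versus 90%) is still assigned, but it is not marked valid, so an operator can review it in Validation.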
OK – Click this button to save your settings.
Cancel – Click this button to close the dialog box and discard any changes you made.
Help – Click this button to view the online Help topic for this dialog box.
Table Field Properties Dialog Box
Use the Table Field Properties dialog box to select the table model for a table field and to make other settings for table fields.
Figure 6-25.
Choose a table model for this table field from the list.
Column Settings (Formatting and Validation)
After the table model is selected, the corresponding table columns are shown.
Model Column – Displays the name of the column as specified in the table model.
Column Formatting – For each column, you can choose a field formatter from the list of available field formatters.
Confirm – Select to force the user to confirm the value of this column in each row by pressing “Enter” in the Validation module.
Buttons
Close – Click this button to save your settings and close the dialog box.
Help – Click this button to view the online Help topic for this dialog box.
Filter Options Dialog Box
Use this dialog box to control the display of files in the New Samples file list.
Figure 6-26. Filter Options Dialog Box
Select any filter on the left to use it. This enables the corresponding options on the right.
Training Information
You can use the items in this section to set filters based on various training-related criteria.
Trained – Shows files that have not yet been used for training. Once a file has been used for training, it no longer appears in the filtered list.
Train classification – Shows files that are intended to be used to improve classification performance.
Train extraction – Shows files that are intended to be used to improve extraction performance.
Train comment – Shows files that have an attached comment.
OK – Click this button to save your settings.
Cancel – Click this button to close the dialog box and discard any changes you made.
Apply – Click this button to save your settings without closing the dialog box.
Help – Click this button to view the online Help topic for this dialog box.
Fuzzy Database Options Dialog Box
Use this dialog box to select an import file for the database and to make other settings. A folder button is available for browsing to a file for the database.
Figure 6-27. Fuzzy Database Options Dialog Box
Referenced import file (text or csv file) – This field shows the path of the text or CSV file that contains the database records. You can enter the path manually or select it by clicking the folder icon to open the Open dialog box. The import process starts automatically when the dialog box is closed, and a message box then displays the number of imported database lines.
Figure 6-28. Imported Database Count Message
Import Options
The import options frame contains various filters and format settings.
Ignore case – Check this option to convert all search and lookup strings to lower case (the case is ignored).
First line contains caption – Select this option if the first record of the input file contains the column headers. These names are used as field names in the database locator.
Characters to ignore – This filter strips unwanted characters from the input record. If you want to use a field delimiter that may also occur as a character in the input text, for example a comma (,), you have to use quotation marks (“”) to identify the input strings. However, you probably do not want to retain those quotation marks in the final results. If you define the quotation marks as characters to ignore, they are removed.
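The following Python sketch imitates the import options above for a comma-delimited file. It relies on the standard `csv` module, whose quote handling plays the role of defining the quotation marks as characters to ignore; any further characters listed in `chars_to_ignore` are stripped afterwards. The function name, the default "*" character and the ColumnN fallback names are hypothetical.

```python
import csv


def import_database(lines, delimiter=",", first_line_caption=True,
                    ignore_case=True, chars_to_ignore="*"):
    """Return (captions, records) for a delimited database import file."""
    # Quoted strings may contain the delimiter; csv removes the quotes.
    records = [r for r in csv.reader(lines, delimiter=delimiter, quotechar='"') if r]
    drop = str.maketrans("", "", chars_to_ignore)
    records = [[field.translate(drop).strip() for field in r] for r in records]
    if first_line_caption and records:
        # The first record supplies the field names.
        captions, records = records[0], records[1:]
    else:
        width = max((len(r) for r in records), default=0)
        captions = [f"Column{i + 1}" for i in range(width)]  # hypothetical names
    if ignore_case:
        # Lower-case the data only, so the field names keep their spelling.
        records = [[field.lower() for field in r] for r in records]
    return captions, [dict(zip(captions, r)) for r in records]


captions, records = import_database(['Name,City', '"Smith, Inc.",Boston', 'Acme*,Denver'])
print(len(records), "database lines imported")
```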
Field present on document / Not present on document – Select “Field present on document” from the context menu if the database field is to be used for matching, and select “Not present on document” if it is not. Databases often contain additional fields, such as internal customer IDs or contact names, that are not present on the document but that you would like to use as index field values.
OK – Click this button to save your settings.
Cancel – Click this button to close the dialog box and discard any changes you made.
Help – Click this button to view the online Help topic for this dialog box.
Global Columns Settings Dialog Box
Use this dialog box to enter local names for global table columns in a variety of languages.
Figure 6-36. Table Global Columns Settings Dialog Box
List of local names – A list of local names for different languages is displayed.
Buttons
The following buttons are provided.
Add Name – Click this button to add a new name for the current column.
Delete Name – Click this button to delete the currently selected name.
Help – Click this button to view the online Help topic for this dialog box.
Close – Click this button to close this dialog box and save your changes.
Instruction Properties Dialog Box
This dialog box is used to set up the individual instructions for the instruction classifier.
Figure 6-29.
Relevance – Set the relevance value for the instruction. If an instruction consists of several phrases, all parts of the instruction must be found on the document. The relevance values of all instructions found in the document are added up to yield the classification result.
Phrases – This list displays all the phrases the instruction contains. To modify a phrase, double-click it in the list of phrases.
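The relevance rule can be sketched in Python as follows. The data structure (a list of phrase lists with a relevance value, per class), the case-insensitive substring matching and the choice of the highest-scoring class are assumptions; the guide states only that all phrases of an instruction must be found and that the relevance values of the found instructions are added up.

```python
def instruction_score(document_text, instructions):
    """Sum the relevance of every instruction whose phrases are all found.

    instructions: list of (phrases, relevance) pairs (hypothetical structure).
    Matching is a case-insensitive substring test (an assumption).
    """
    text = document_text.lower()
    total = 0
    for phrases, relevance in instructions:
        # All parts of an instruction must be found on the document.
        if all(phrase.lower() in text for phrase in phrases):
            total += relevance
    return total


def classify(document_text, class_instructions):
    """Pick the class with the highest summed relevance (assumed decision rule)."""
    scores = {cls: instruction_score(document_text, instructions)
              for cls, instructions in class_instructions.items()}
    return max(scores, key=scores.get), scores


class_instructions = {
    "Invoice": [(["invoice", "total"], 50), (["vat"], 20)],
    "Order": [(["purchase order"], 60)],
}
print(classify("INVOICE No. 42 ... Total due ... VAT 19%", class_instructions))
```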
Help – Click this button to view the online Help topic for this dialog box.
New Field Formatter Dialog Box
Use this dialog box to add the name of a new field formatter and to define its formatter type.
Figure 6-30. New Field Formatter Dialog Box
Name – Enter a name for the field formatter. It is recommended that you use a name that includes information about the field formatter type.
Buttons
OK – Click this button to save your settings and display the Properties dialog box. For more information, see Field Formatter Properties on page 306.
Cancel – Click this button to close the dialog box and discard any changes you made.
Help – Click this button to view the online Help topic for this dialog box.
New Validation Method Dialog Box
Use this dialog box to add the name of a new validation method and to define its type.
Figure 6-31.
Enter a name for the validation method. It is recommended that you use a name that includes information about the validation method type.
Type – Select a type for the validation method. For further information, see Validation Methods.
Buttons
OK – Click this button to save your settings and display the Properties dialog box. For more information, see New Validation Method.
Cancel – Click this button to close the dialog box and discard any changes you made.
Add – Enter a new substitution pair in the two edit boxes: the character to be replaced in the left edit box, and the character to be used as the replacement in the right edit box. Click Add to include this pair in the list.
Delete – Click this button to delete the corresponding substitution pair.
OK – Click this button to save your settings and display the Project Settings – Profiles tab.
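A minimal Python sketch of applying a list of substitution pairs to recognized text, using the standard `str.translate` method. Where and when the product applies these substitutions is not described here, so the function name and the decision to apply all pairs in a single pass are assumptions.

```python
def apply_substitutions(text, pairs):
    """Replace characters according to (to_replace, replacement) pairs."""
    table = {}
    for to_replace, replacement in pairs:
        if len(to_replace) != 1:
            raise ValueError("each pair must replace a single character")
        table[ord(to_replace)] = replacement
    # str.translate applies all pairs in one pass, so a replacement is never
    # substituted again by another pair.
    return text.translate(table)


print(apply_substitutions("|nvoice", [("|", "I")]))
```

Because the pass is simultaneous, the pairs ("a", "b") and ("b", "a") swap the two characters instead of turning every "a" into "a" again.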
OK – Click this button to save your settings.
Cancel – Click this button to close the dialog box and discard any changes you made.
Help – Click this button to view the online Help topic for this dialog box.
Project Properties Dialog Box
Use this dialog box to provide a description for the project and to manage project protection settings.
Figure 6-33. Project Properties
Project description – If desired, you can add a description for the current project.
Use read protection when you want to limit viewing of a project file to password holders. If viewing is protected and you also want to prevent changes to the project, set a write protection password as well.
Buttons
OK – Click this button to save your settings and display the Properties dialog box.
Help – Click this button to view the online Help topic for this dialog box.
Project Settings Dialog Box – General Tab
The General tab is used to make the following settings.
Figure 6-34. Project Settings – General Tab
Automatic Rotation – Select these options to enable automatic image rotation during OCR processing and/or layout classification.
Validation – Select this option to show the class list in the Validation module.
Color images cannot be processed. Select either “converted to b/w automatically” or “ignored” to define how color images should be handled.
Document Separation – Select this option to activate document separation for the project. By default, document separation is not selected.
Pages are two sided (back will be ignored) – Select this option if you have two-sided pages. The back side is ignored.
Figure 6-35. Project Settings – Classification Tab
Classification Settings
Default classification result – This option specifies a default result, which is used if no other classification result can be determined. Select a class from the list.
Classification evaluation
Automatic evaluation – Classification occurs automatically. The specified values for confidence and distance are used to evaluate the result.
Content Classifier
Classify only first page – The classification loop processes only the first page of a document for the specified classifier type. If that page cannot be classified, the document remains unclassified.
Classify each page – The classification loop processes each page of a multi-page document for the specified classifier type until a page can be classified, or until the complete document has been processed and remains unclassified.
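The two loop options can be sketched in Python as below. `classify_page` is a hypothetical callback that returns a class name, or None when a page cannot be classified (for example, because a confidence threshold is not met). Falling back to a default class reflects the Default classification result option described earlier; the function and parameter names are assumptions.

```python
def classify_document(pages, classify_page, each_page=False, default_class=None):
    """Sketch of the classification loop for one classifier type.

    classify_page is a hypothetical callback returning a class name or None.
    """
    # "Classify only first page" looks at page 1; "Classify each page" at all pages.
    candidates = pages if each_page else pages[:1]
    for page in candidates:
        result = classify_page(page)
        if result is not None:
            return result  # stop at the first page that can be classified
    return default_class  # None means the document stays unclassified


def find_invoice(page):
    return "Invoice" if "INVOICE" in page else None


pages = ["cover sheet", "INVOICE no. 42"]
print(classify_document(pages, find_invoice, each_page=True))
```

With only the first page examined, the same document stays unclassified because its cover sheet carries no recognizable content.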
Min. Confidence – The minimum confidence is the smallest value required for automatic evaluation to assign a classification result.
Min. Distance – This value specifies the minimum required distance between the best and the second best classification result.
Project Settings Dialog Box – Views Tab
The Views tab is used to manage instances of a classification engine (called views) in a project.
Figure 6-36.
The available views are listed by name. Double-clicking an entry displays a properties dialog box, if one is available. Some classification types do not have an associated properties dialog box.
Buttons
Add – Click this button to display the Add Classification View dialog box.
Delete – Click this button to delete the currently selected view.
Rename – Click this button to change the name of the currently selected view.
Figure 6-37. Project Settings – Profiles Tab
List of Profiles – The list of profiles shows all the profiles that have been defined for the current project. The first column shows the name of the profile, the second column its type, and the third column the engine used with the profile. The type may be “Page” for full-page OCR, “Zone OCR” for zonal OCR, or “Zone OMR” for OMR. When a new project is created, one profile of each type is generated automatically.
Page Profile – Click this button to insert a new page profile. The Profile Settings dialog box is displayed; its exact contents depend on the type of profile. A new page profile is added with a default name that you can change.
Zone Profile – Click this button to insert a new zone profile. The Profile Settings dialog box is displayed; its exact contents depend on the type of profile.
OCR Substitution – Click this button to open the OCR Substitution dialog box. These settings are independent of the currently selected profile. For further details, see OCR Substitutions dialog box.
Project Settings Dialog Box – Databases Tab
Use the Databases tab to manage the databases in a project.
Figure 6-38. Project Settings – Databases tab
List of Databases – The databases are listed by name along with the file from which each database was imported.
Add – Click this button to insert a new database. When prompted, enter a name for the database and click OK to open the Fuzzy Database Options dialog box.
Remove – Click this button to delete the currently selected database from the project.
Rename – Click this button to change the name of the currently selected database.
Properties – Click this button to open the Fuzzy Database Options dialog box. Refer to Fuzzy Database Options for more information.
Figure 6-39. Project Settings – Dictionaries Tab
List of Dictionaries – The dictionaries are listed by name along with the file from which each dictionary was imported. During Ascent Xtrata Pro installation, a set of sample dictionaries is copied to the Project Builder application directory (..\Dict).
Buttons
Add – Click this button to insert a new dictionary. When prompted, enter a name for the dictionary and click OK to open the Dictionary Options dialog box.
Rename – Click this button to change the name of the currently selected dictionary.
Properties – Click this button to open the Dictionary Options dialog box.
Import – Click this button to update the currently selected dictionary with data from the source import file.
Project Settings Dialog Box – Tables Tab
Use the Tables tab to manage the table models in a project.
Figure 6-40.
Column Pool
The global columns are listed by name along with the languages for which they are defined. To add or delete a column, or to edit the properties of a selected column, use the following buttons.
Add New Column – Click this button to insert a new global column. The new column is named “GlobalColumn” or “GlobalColumnn”, where n is a unique sequential number.
Add New Language Package – Click this button to insert a new language package. When prompted, enter a name and click OK to open the Training for Language Package dialog box for the selected language package.
Delete Selected Language Package – Click this button to delete the currently selected language package from the project.
List of Formatters – The formatters are listed by name. Double-click a formatter to open the properties dialog box for that type of formatter. The following types of field formatters are available:
• Amount Formatter
• Date Formatter
• Percentage Formatter
• Script Formatter
Default Formatters – Select a formatter from the list of available formatters to define the default formatter for date or amount formats.
Buttons
The following buttons are provided.
Add – Click this button to insert a new formatter.
Figure 6-42. Project Settings – Validation Tab
List of Validation Methods – The validation methods are listed by name. Double-click a method to open the properties dialog box for that type of method.
Buttons
Add – Click this button to insert a new validation method. Enter a name for the method, select a type, and then click OK to open the properties dialog box for that type of validation method.
Delete – Click this button to delete the currently selected validation method from the project.
Rename – Click this button to change the name of the currently selected validation method.
Properties – Click this button to open the properties dialog box for the currently selected validation method. Refer to Validation Methods Properties on page 365 for more information.
Project Settings Dialog Box – Knowledge Base Tab
Use the Knowledge Base tab to manage knowledge bases in the project.
Figure 6-43.
List of Knowledge Bases The knowledge bases are listed by name and type. Use the check box on the left to activate a knowledge base in the current project. Knowledge Base Buttons Import Click this button to import an existing knowledge base into the current project. Create Click this button to create a new knowledge base. Delete Click this button to delete the currently selected knowledge base from the list of knowledge bases.
New Password Enter the new password in the password text field and re-enter it in the text field below. The new password is accepted only if both entries are identical. Buttons The following buttons are provided for this dialog box. OK Click this button to save the new password and to return to the Project Settings dialog box. Cancel Click this button to close the dialog box and discard any changes you made. Help Click this button to view the online Help topic for this dialog box.
Figure 6-45. Page OCR Properties Dialog Box Language Select the languages that the OCR engine should recognize. General Settings Correct skew Select this option if you want the OCR engine to correct skew in the image automatically. The amount of skew (in degrees) that can be corrected depends on the selected OCR engine. Remove noise Select this option to remove noise from the image.
Select the types of printed characters expected in the document. If you select Handprint, the Writing style list is enabled so you can select a language from the list. This language style selection applies only to handwritten characters. All other character types use the language selection in the Language list. Buttons OK Click this button to save your settings and display the Properties dialog box. Cancel Click this button to close the dialog box and discard any changes you made.
Figure 6-46. Page OCR Properties Dialog Box Zone Recognizers Select the recognition engine from the list of available engines. The FineReader 8.0 recognition engine is selected by default and is installed during setup. Other engines are available but must be licensed separately. Languages Select the type of content (numeric or alphanumeric) and the language(s) of the text in the document.
Select the types of text that may be found in the document. If you select Handprint, the Writing style list is enabled so you can select a style from the list. This style selection applies only to handwritten characters. All other character types use the language selection in the Languages list. Recognizer Options Select the appropriate options: Prohibit superscript, Prohibit subscript, Prohibit italic, Remove noise, or Fast mode.
Layout Options Select the appropriate layout options: • Read page as single column • Single line • One word per line • One word per block Buttons OK Click this button to save your settings and display the Properties dialog box. Cancel Click this button to close the dialog box and discard any changes you made. Help Click this button to view the online Help topic for this dialog box.
Figure 6-47. Zone OMR Properties Dialog Box Zone Recognizers Select the OMR recognition engine from the list of available engines. Result Texts Use this panel to define the values for a marked, unmarked, and unsure OMR recognition result. Mode Choose the mode and make additional settings in the corresponding panel. Automatic Mode In automatic mode, the degree of blackness for a marked, unmarked, or unsure OMR zone is determined automatically.
Manual Mode In manual mode, you can define the degree of blackness used to determine the state of an OMR zone. A range can be assigned by setting minimum and maximum blackness values, either with the slider or by entering a value directly. Buttons OK Click this button to save your settings and display the Properties dialog box. Cancel Click this button to close the dialog box and discard any changes you made.
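The effect of a manual blackness range can be sketched as follows. This is an illustration only, assuming blackness is measured as a percentage and that values between the minimum and maximum are reported as unsure. The function and its default result texts are hypothetical stand-ins for the Result Texts settings.

```python
def omr_state(blackness: float, min_black: float, max_black: float,
              marked: str = "1", unmarked: str = "0", unsure: str = "?") -> str:
    """Classify an OMR zone from its blackness percentage (0-100).

    Assumed semantics (hypothetical): at or above max_black the zone is marked,
    at or below min_black it is unmarked, and anything in between is unsure.
    The three result strings stand in for the Result Texts panel values.
    """
    if not 0.0 <= min_black <= max_black <= 100.0:
        raise ValueError("thresholds must satisfy 0 <= min_black <= max_black <= 100")
    if blackness >= max_black:
        return marked
    if blackness <= min_black:
        return unmarked
    return unsure
```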
Figure 6-48. Script Code Dialog Box The editor contains a menu bar and a toolbar. The Object list directly beneath the toolbar contains the objects for the current module, and the “Proc” list shows the procedures for the currently selected object. Additionally, the editor has several tabs arranged vertically down the left side of the dialog box. Each tab consists of an edit area used to insert script code and a band on the left for placing breakpoints.
Before you start entering code, select the correct sheet. To implement field formatting procedures, select the sheet for the project level, that is, the first sheet (tab 1). The dialog box caption is set to Project – Script Code. To implement a general script locator, field extraction, or validation methods, select the sheet for the field extraction scripts. The dialog box caption is set to Field Extraction – Script Code.
Figure 6-49. Table Model Properties Dialog Available Columns This panel shows a list of all available columns by name. Table model This table shows a list of all columns in the currently selected table model. By default, a table column is visible and its default value is set to 0.00. To make a column invisible, clear its Visible attribute. To change the default value for a given column, click in its Default value field.
Un Assign Click this button to remove the currently selected column from the table model. Swap Click this button to swap the currently selected column name with the currently selected column in the table model. Up / Down arrows Use these buttons to change the position of the currently selected table column. Close Click this button to save your settings and close the dialog box. Help Click this button to view the online Help topic for this dialog box.
Use this validation method to provide predefined validation methods for invoice fields, for example, to check that the total equals the sum of all net amounts and tax amounts. Standard Validation Properties Dialog Box Use this dialog box to define validation methods for single fields. Figure 6-50. Standard Validation Dialog Box Options Set the various validation requirements. Check for minimum length Select this option to specify the minimum number of characters allowed for the result.
Check for maximum length Select this option to specify the maximum number of characters allowed for the result. Restrict allowed character set Select this option to specify the characters that are allowed in the result. Define not allowed characters Select this option to specify the characters that are not allowed in the result. Allow empty field Select this option to treat an empty field as valid.
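The Standard Validation options combine into a simple sequence of checks. The following Python sketch shows one plausible reading, assuming an empty value is decided solely by Allow empty field. The function and parameter names are hypothetical.

```python
from typing import Optional


def validate_standard(value: str, *, min_len: Optional[int] = None,
                      max_len: Optional[int] = None, allowed: Optional[str] = None,
                      forbidden: Optional[str] = None, allow_empty: bool = False) -> bool:
    """Apply the Standard Validation options to a single field value."""
    # Assumption: an empty field is decided by "Allow empty field" alone.
    if value == "":
        return allow_empty
    # Check for minimum length / Check for maximum length
    if min_len is not None and len(value) < min_len:
        return False
    if max_len is not None and len(value) > max_len:
        return False
    # Restrict allowed character set: every character must be in the allowed set
    if allowed is not None and any(ch not in allowed for ch in value):
        return False
    # Define not allowed characters: no character may be in the forbidden set
    if forbidden is not None and any(ch in forbidden for ch in value):
        return False
    return True
```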
Date Validation Properties Dialog Box Use this dialog box to define validation methods for date fields. Figure 6-51. Date Validation Dialog Date Formatter Choose a date formatter from the list. It is used to reformat the date before the validation check is performed and to provide the field’s DateValue property. Warning Select a date formatter for the validation method to ensure that the DateValue property is available; otherwise, a script error may occur.
Testing Enter a sample date and click Validate to check the validation settings. A message below the field shows the validation result. Buttons OK Click this button to save your settings. Cancel Click this button to close the dialog box and discard any changes you made. Help Click this button to view the online Help topic for this dialog box.
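The order described above (reformat first, then validate) can be sketched in Python. The input patterns stand in for a configured Date Formatter and are assumptions, as are the function names. A value that no pattern accepts yields no date, which corresponds to the missing DateValue mentioned in the warning.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical input patterns standing in for a configured Date Formatter.
INPUT_FORMATS = ("%d.%m.%Y", "%m/%d/%Y", "%Y-%m-%d")


def format_date(text: str) -> Optional[date]:
    """Reformat raw text into a date, or return None if no pattern fits."""
    cleaned = text.strip()
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue  # try the next pattern
    return None


def validate_date(text: str) -> bool:
    # Validation runs on the formatter's output: if formatting fails,
    # there is no date value to validate.
    return format_date(text) is not None
```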
Regular Expression Validation Properties Dialog Box Use this dialog box to define validation methods that make use of regular expressions. Figure 6-52. Regular Expression Validation Properties Dialog Box Formats This area is used for managing the regular expressions that make up the validation method.
Format You can define one or more formats, or choose one from a list of predefined formats. To select a predefined format, click the right arrow button next to the format field. The predefined formats are grouped into the following categories: • Numbers – Select from a variety of commonly found numeric formats. • Characters – Select from a variety of commonly found alphanumeric formats.
Add Click this button to select a predefined format. Modify Select a format from the list of available formats, change its properties, and click Modify to save the changes. Clear Click this button to remove all the settings from the Formats area. Testing Enter some sample text and click Validate to check the validation settings. A message below the field shows the validation result. Buttons OK Click this button to save your settings.
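A format-based validation method can be sketched with Python's `re` module. This assumes that the formats are alternatives, so a value is valid if it completely matches at least one of them. The two sample formats and the function name are hypothetical.

```python
import re


def validate_with_formats(value: str, formats: list[str]) -> bool:
    """Return True if the value matches at least one regular expression completely.

    Assumption: formats are alternatives (any one may match) and must match the
    whole field, not just a substring.
    """
    return any(re.fullmatch(fmt, value) for fmt in formats)


# Hypothetical formats: an invoice number such as "AB-123456" and a five-digit ZIP code.
FORMATS = [r"[A-Z]{2}-\d{6}", r"\d{5}"]
```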
Single Field Script Validation Dialog Box Use this dialog box to define single field script validation methods. Single field script validation methods are used to create validation scripts for a single field. It can be used, for example, to check whether a customer name or ID entered manually in the validation form field is a valid entry in the customer database. Figure 6-53.
Close Click this button to close the dialog box. Show Script Click this button to open the Script Code dialog box. Help Click this button to view the online Help topic for this dialog box. Multi Field Script Validation Dialog Box Use this dialog box to define multi field script validation methods. A multi field script validation method is used to create validation scripts for several fields. It can be used, for example, to check the total amount by summing the net amount and the tax. Figure 6-54.
By default, this area contains a multi field script sample that you can modify to meet your own needs. Use the Select All and Copy buttons to copy the code to an appropriate location in the script editor, where you can modify it. Testing Enter some sample text in the Test Value column and click Validate to check the validation settings. A message shows the validation result. Note For testing, you must select a field formatter for each field that is tested.
Invoice Validation Dialog Box Use this dialog box to define invoice validation methods. An invoice validation method is used to easily calculate and validate invoice fields by using predefined amount and tax calculations. It can be used, for example, to check the total amount by summing the net amount and the tax. Figure 6-55.
By default, the standard amount formatter requires a decimal separator (for example, 2.00 or 2,00); otherwise, an error message is displayed. For example, if you enter 2 for the Subtotal field, an error message is displayed. During production, the validation formatting assigned to the extraction field is used. General Tab The General tab has settings for the tax model, tax rates, and testing.
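The decimal-separator requirement can be sketched as a small parser. This assumes exactly two decimal digits and ignores thousands separators and negative amounts; it is not the product's Amount Formatter, and `parse_amount` is a hypothetical name.

```python
import re
from decimal import Decimal
from typing import Optional

# Assumed rule: digits, then "." or "," as the decimal separator, then exactly two digits.
# Thousands separators and negative amounts are deliberately left out of this sketch.
_AMOUNT = re.compile(r"(\d+)[.,](\d{2})")


def parse_amount(text: str) -> Optional[Decimal]:
    match = _AMOUNT.fullmatch(text.strip())
    if match is None:
        return None  # e.g. "2" has no decimal separator and is rejected
    return Decimal(f"{match.group(1)}.{match.group(2)}")
```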
regional boundary). Frequently, the tax rates are not printed on the invoice and therefore must be considered unknowns. This is the equation for validating an invoice with sales or use taxes:
SubTotal + Postage + Packaging – Discount + TaxAmount1 + … + TaxAmount4 = Total
Internally, the SubTotal field and the NetAmount0 field are treated identically. Note An international invoice typically does not contain taxes. It can be mixed in projects with either of the tax models.
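The sales-tax equation above translates directly into a check. The Python sketch below uses `Decimal` to avoid binary floating-point rounding. The function name and the `tolerance` parameter are assumptions, with an exact match as the default.

```python
from decimal import Decimal
from typing import Sequence

ZERO = Decimal("0")


def invoice_balances(subtotal: Decimal, total: Decimal, *, postage: Decimal = ZERO,
                     packaging: Decimal = ZERO, discount: Decimal = ZERO,
                     tax_amounts: Sequence[Decimal] = (), tolerance: Decimal = ZERO) -> bool:
    """Check SubTotal + Postage + Packaging - Discount + TaxAmount1..4 = Total.

    `tolerance` is a hypothetical parameter for rounding differences; the default
    requires an exact match.
    """
    if len(tax_amounts) > 4:
        raise ValueError("at most four tax amounts (TaxAmount1 to TaxAmount4)")
    computed = subtotal + postage + packaging - discount + sum(tax_amounts, ZERO)
    return abs(computed - total) <= tolerance
```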
Figure 6-56. Invoice Validation Properties – Advanced Tab Dialog Box General Options The following options are available: • Allow field reconstruction for redundant fields – Select this option to allow field reconstruction. Some fields can be automatically derived if they are not present on the document or not recognized correctly. The reconstruction is based on redundant information in the invoice. The tax rates can even be replaced if they contain invalid values.
NetAmount3 + NetAmount4 – Postage – Packaging + Discount
NetAmount0   No   = Total (if no tax amounts are present)
NetAmount1   No   = Subtotal + Postage + Packaging – Discount (if tax blocks 2–4 are not present)
TaxRate 1–4   Yes   = TaxAmount / NetAmount
• Fill empty fields with 0.00 – Select this option to assign 0.00 to empty fields. • Consider Total-only invoices as valid – Select this option to allow an invoice to be valid if it contains only the total amount.
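Two of the reconstruction rules listed above, together with the Fill empty fields option, can be sketched as follows. Returning the tax rate as a percentage rounded to two places, and treating a zero tax amount as “not present,” are assumptions. All function names are hypothetical.

```python
from decimal import Decimal
from typing import Optional, Sequence

TWO_PLACES = Decimal("0.01")


def reconstruct_tax_rate(tax_amount: Decimal, net_amount: Decimal) -> Optional[Decimal]:
    """TaxRate = TaxAmount / NetAmount, returned as a percentage (an assumption)."""
    if net_amount == 0:
        return None  # cannot be derived from a zero net amount
    return (tax_amount / net_amount * 100).quantize(TWO_PLACES)


def reconstruct_net_amount0(total: Decimal,
                            tax_amounts: Sequence[Optional[Decimal]]) -> Optional[Decimal]:
    """NetAmount0 = Total, applicable only when no tax amounts are present.

    Assumption: a missing (None) or zero tax amount counts as "not present".
    """
    if any(t is not None and t != 0 for t in tax_amounts):
        return None
    return total


def fill_empty(fields: dict[str, Optional[Decimal]]) -> dict[str, Decimal]:
    """Mirror the "Fill empty fields with 0.00" option."""
    return {name: Decimal("0.00") if value is None else value for name, value in fields.items()}
```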
Tax Validation This area shows the predefined calculations used for tax validation. View Table for Field Dialog Box Use this dialog box to view the table field results extracted from the current document. Figure 6-57. Table for Field Dialog Box Buttons Close Click this button to close the dialog box. Help Click this button to view the online Help topic for this dialog box.
OK Click this button to save your settings. Cancel Click this button to close the dialog box and discard any changes you made. Apply Click this button to save your settings without closing the dialog box. Adaptive Feature Classifier View Properties Dialog Box Figure 6-58. Adaptive Feature Classifier View - Properties Dialog Box Text Filtering Use digits This option controls whether the classifier considers digits as features during text filtering.
Min. word length All words shorter than this value are ignored during text filtering. Training Max. number of features Limits the maximum number of internally generated features per class. Min. feature length Specifies the minimum number of characters used for a feature. This value cannot be smaller than “Min. word length.”
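The text-filtering settings can be sketched as a word filter. This assumes that words are runs of word characters and that, when Use digits is cleared, any word containing a digit is dropped; the product may instead strip digits or handle them differently. The function names are hypothetical.

```python
import re


def filter_words(text: str, min_word_length: int = 3, use_digits: bool = False) -> list[str]:
    """Keep only words that qualify as classifier features.

    Assumptions: words are runs of word characters; when use_digits is False,
    any word containing a digit is dropped entirely.
    """
    kept = []
    for word in re.findall(r"\w+", text):
        if len(word) < min_word_length:
            continue  # shorter than "Min. word length"
        if not use_digits and any(ch.isdigit() for ch in word):
            continue  # digits are not considered as features
        kept.append(word)
    return kept


def check_training_settings(min_word_length: int, min_feature_length: int) -> None:
    # "Min. feature length" cannot be smaller than "Min. word length".
    if min_feature_length < min_word_length:
        raise ValueError("Min. feature length must be >= Min. word length")
```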
Note You must retrain the project before changes to these settings affect the Content Classifier. Layout Classifier View Properties Dialog Box Figure 6-59. Layout Classifier View - Properties Dialog Box Optimize Classification for Select Invoices to optimize layout classification for invoices; otherwise, select Forms. Invoices If this option is selected, the classifier analyzes only the upper and lower parts of the document. The remainder of the document is not used for classification.
Forms If this option is selected, the classifier uses the entire region of the image. Use it for forms and other types of documents that have a fixed layout over the entire image. Advanced Click Advanced to display the following options: Image Preparation By default, “Enable skew tolerance” is selected. When it is selected, the layout classifier can internally correct for a certain amount of skew in the image.
confidence. The probability of misclassified documents would then be much smaller, resulting in higher accuracy but more rejects. If you move the value closer to the “max. recall” side, higher confidence values are returned for documents with low contrast. However, high confidence values might then also be assigned to other classes with low contrast in the same region of the document, which might lead to a higher error rate. In most cases, the default value of 15.0 works best.
Cancel Click this button to close the dialog box and discard any changes you made. Help Click this button to view the online Help topic for this dialog box. Zone Locator Zone Settings Dialog Box The Zone Locator supports individual settings for each OCR zone that has been added to the locator. OCR Zone Settings Dialog Box Use this dialog box to configure OCR (text) zones that have been added to a zone locator.
Figure 6-61. Zone Settings Dialog Box Viewer The content of the currently selected zone is displayed in this viewer area. To create a blank out region for this zone, draw a rectangle with the mouse in the viewer area. Its coordinates are automatically added to the Blank Out Regions list. General • To change the position or size of the zone, change Left, Top, Width, or Height using the up and down buttons or by entering the values directly.
• Rotation – By default, 0° is selected. If you want to rotate the image in the viewer, select the appropriate amount. The result is shown within the viewer area. • Name – You can provide a new name for the zone. • Page – You can specify the page to which the zone should be applied by selecting a value from the list. • Type – Select Text or OMR to specify the type of content expected within the zone.
Test (Background Removal) Click this button to test the ability of the zone locator to remove background information based on the current sample information. Test (Dynamic Zone Adjustment) Click this button to test the current dynamic zone adjustment settings. The results will appear in the main viewer at the top of the dialog box. Properties (OCR) Click this button to display the OCR Profile Settings dialog box and adjust the settings of the currently selected profile.
Figure 6-62. OMR Zone Settings Dialog Box Viewer The content of the currently selected zone is displayed in this viewer area. To create a blank out region for this zone, draw a rectangle with the mouse in the viewer area. Its coordinates are automatically added to the Blank Out Regions list. Note If you want to use blank out regions for OMR zones, it is recommended that you use the Advanced Zone Locator.
General • To change the position or size of the zone, change Left, Top, Width, or Height using the up and down buttons or by entering the values directly. • Rotation – By default, 0° is selected. If you want to rotate the image in the viewer, select the appropriate amount. The result is shown within the viewer area. • Name – You can provide a new name for the zone. • Page – You can specify the page to which the zone should be applied by selecting a value from the list.
Close Click this button to save your settings and close the dialog box. Help Click this button to view the online Help topic for this dialog box. Zone Locator Zone Profile Settings Dialog Boxes The Zone Locator provides special settings for zone OCR or OMR profiles. OCR Profile Settings Dialog Box Use this dialog box to set the properties for an OCR zone profile. Note You can save changes only to profiles that are not set as a default.
Figure 6-63. Zone OCR Profile Settings Dialog Box Languages Select the type of content (numeric or alphanumeric) and the languages that the OCR engine should recognize. Types Select the types of text that may be found in the document. If you select Handprint, the Writing style list is enabled so you can select a style from the list. This language style selection applies only to handwritten characters.
Case recognition mode Select Auto Case for automatic recognition of the case, Small Case if the text contains only lowercase letters, and Capital Case if the text contains both uppercase and lowercase letters. Field marking type Select the field marking type from the list.
The following buttons are available for each properties dialog box: Close Click this button to save your settings and close the dialog box. Help Click this button to view the online Help topic for this dialog box. OMR Profile Settings Dialog Box Use this dialog box to set the properties for an OMR zone profile. Figure 6-64.
Enter a value in the Result field to define the result value for an unmarked zone, for example 1 or “Yes.” Buttons The following buttons are available for each properties dialog box: Close Click this button to save your settings and close the dialog box. Help Click this button to view the online Help topic for this dialog box. General Invoice Dialog Boxes Create Knowledge Base Dialog Box Use this dialog box to create knowledge bases from the files placed in training folders.
Figure 6-2. Create Knowledge Base Selected Training Folders Displays the training folders that can be used as part of the knowledge base. Knowledge Base Types Select the types of knowledge bases to be created. There are three knowledge base types, each of which corresponds to one of the three group locators. During training, the same sample document can be used to train the fields for different group locators, such as the fields of an invoice group locator and the fields of an amount group locator.
Select this option if you want to protect your knowledge base with a password. This prevents others from making changes to it without your permission. Knowledge Base Information Name Name of the knowledge base. Description Enter a description, such as the type of sample documents used to create the knowledge base, so that you can later identify the documents that were used. Owner Enter a name to identify the owner of the knowledge base.
Figure 6-65. Select Knowledge Base Dialog Box Look in Select the folder that contains the knowledge base. File list Displays the files and folders in the current folder. Select the knowledge base you want to import. File Name Displays the name of the file that will be opened. File of Type Specify the type of file that you want to view in the file list. Buttons Open Click this button to open the currently selected file.
Cancel Click this button to close the dialog box and discard any changes you made. Create Knowledge Base Activation Code Dialog Box Use this dialog box to create activation codes for a protected knowledge base. Figure 6-66. Create Knowledge Base Activation Code Required Information Displays the name of the current knowledge base and requests the information necessary to generate the activation code. Knowledge Base Name Name of the knowledge base.
Serial Number of License Hardware Key Enter the serial number for the hardware key. Knowledge Base License Displays important information about the knowledge base, including the newly generated activation code. You can copy this information and provide it to end users or customers. Buttons Create Activation Code Click this button to generate the activation code. The code will be displayed in the Knowledge Base License area.
Figure 6-67. Edit Document Dialog Box for a Training Set Document Menu The Edit Document dialog box has a standard, Windows-style menu bar, from which you can perform various operations. Figure 6-68. Edit Document Dialog Box Menu The menu offers access to several menu items: The File menu when adding a new document: • Validate document – checks the document for internal consistency and other types of problems, and displays any errors in the status bar.
• Add to training folder – adds the document to the training folder. • Close – closes the dialog box. The File menu when editing a document: • Save + Exit – saves the changes and closes the dialog box. • Execute OCR – performs OCR on the document. • Delete – removes the selected document from the training set. • Close – closes the dialog box. The Edit menu: • Previous Field – navigates to the previous field in the list of fields. • Next Field – navigates to the next field in the list of fields.
Toolbar The toolbar provides shortcuts to many menu items and quick access to all important features. Figure 6-69. Edit Document Dialog Box Toolbar for editing a Training Set Document Figure 6-70. Edit Document Dialog Box Toolbar for a newly inserted Training Set Document Table 6-16. Edit Document Toolbar – All Icons Button Description Save + Exit – saves the changes and closes the dialog box.
Rotate Clockwise – rotates the image of the document 90 degrees clockwise. Rotate Counter clockwise – rotates the image of the document 90 degrees counterclockwise. No Highlighting – turns off highlighting of the extracted results. Display Results – turns on highlighting of the extracted results. Zoom In – enlarges the scale of the document image. Zoom Out – reduces the scale of the document image. Best Fit – scales the document image to fit the viewer window.
Figure 6-71. Import Knowledge Base Dialog Box Available knowledge bases Displays the knowledge bases available for import. • Name – lists the name of the knowledge base file. • Protection – indicates the protection status for the knowledge base. There are three possible statuses: No license protection; License not activated; Activated for nnnn (where nnnn is the serial number for the hardware key).
• Type – indicates the type of knowledge base. • Owner – indicates the owner of the knowledge base. Selected knowledge bases Displays the knowledge bases you have selected for import. The list shows the name of the knowledge base. It provides the same information as described above. Buttons Browse Click this button to open the Select Knowledge Base dialog box. The knowledge bases you select are added to the list of available knowledge bases.
Figure 6-72. Insert Knowledge Base Activation Code Dialog Box Info Displays some helpful instructions about licenses and activation codes. Required Information Displays the name of the current knowledge base and requests the information necessary to generate the activation code. Knowledge Base Name Name of the knowledge base. Serial Number of License Hardware Key Enter the serial number for the hardware key.
Cancel Click this button to close the dialog box and discard any changes you made. Help Click this button to view the online Help topic for this dialog box. Knowledge Base Activation Dialog Box Use this dialog box to display a list of activated serial numbers. You can also use it to add new activation codes to a protected knowledge base. Figure 6-73. Knowledge Base Activation Dialog Box Activated Serial Numbers Displays a list of serial numbers currently activated for the knowledge base.
Close Click this button to exit the dialog box. Help Click this button to view the online Help topic for this dialog box. Move Training Document Dialog Box Use this dialog box to move a document to the training set for the selected class. Figure 6-74. Move Training Document Dialog Box To move a document, select the destination class from the list. OK Click this button to save your settings. Cancel Click this button to close the dialog box and discard any changes you made.
• Amount Group Locator • Barcode Locator • Classification Locator • Database Evaluator • Database Locator • Format Locator • Invoice Group Locator • Invoice Header Locator • OCR Voting Evaluator • Order Group Locator • Relation Evaluator • Script Locator • Standard Evaluator • Table Locator • Zone Locator User Interface Elements Depending on the selected locator, different properties dialog boxes are displayed. In general, one to four tabs are used to set parameters.
Click the Left button to move the provided buttons from the lower left side to the lower right side of the dialog box. Other buttons on the properties dialog box depend on the currently selected locator. Address Evaluator Properties Dialog Box The Address Evaluator Properties dialog box has the following tabs: • General • Mapping • Test Results Buttons The following buttons are available on all tabs: Close Click this button to save your settings and close the dialog box.
Figure 6-75. Address Evaluator Properties – General Tab Import file Shows the location of the text or CSV file that contains the database records. Select the path by clicking the folder icon. Import Options Ignore case Select this option to ignore the case in search and lookup strings. First line contains caption Select this option if the first record of the input file contains the column headers. These names will be used as field names in the database evaluator.
specific as possible, and in any case longer than the replace text. A typical example is normalizing street names: “Avenue,” “Aven.,” and “Ave.” are all replaced with “Av.” Add Click this button to add a new string substitution to the list of substitutions using the values from the two edit fields. Delete Click this button to delete the currently selected item in the list of substitutions.
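The string substitutions can be sketched as an ordered search-and-replace. This sketch applies longer search texts first, matches only whole tokens, and honors an Ignore case flag; these choices and all names are assumptions rather than documented behavior.

```python
import re


def apply_substitutions(text: str, substitutions: list[tuple[str, str]],
                        ignore_case: bool = True) -> str:
    """Normalize text with search/replace pairs, longest search text first."""
    flags = re.IGNORECASE if ignore_case else 0
    for search, replace in sorted(substitutions, key=lambda p: len(p[0]), reverse=True):
        if len(search) <= len(replace):
            raise ValueError(f"search text {search!r} must be longer than {replace!r}")
        # Only match whole tokens so that, for example, "Ave." does not fire inside "Dave.".
        pattern = r"(?<!\w)" + re.escape(search) + r"(?!\w)"
        # A function replacement avoids backslash interpretation in the replace text.
        text = re.sub(pattern, lambda _m, r=replace: r, text, flags=flags)
    return text


# The street-name example from the text above.
STREETS = [("Avenue", "Av."), ("Aven.", "Av."), ("Ave.", "Av.")]
```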
Figure 6-76. Address Evaluator Properties – Mapping Tab Mapping Database fields to locator results After you specify the import file, all the fields are available in the left column of the mapping grid. A list of available locators is shown in the second column. For each database field, you can have a corresponding locator field and field type. There are five field types: Zip, City, Street, Numeric, and Other text.
This area shows the first 20 lines of the referenced database. Each database field is displayed in a column. Address Evaluator Properties Dialog Box – Test Results Tab Use the Test Results tab to evaluate your settings. Figure 6-77. Address Evaluator Properties – Test Results Tab Detected Alternatives This area shows the test results, including the confidence of each alternative and the text that was extracted from the document.
• Test Results Buttons The following buttons are available on all tabs: Close Click this button to save your settings and close the dialog box. Test Click this button to test the settings. The results are shown on the Test Results tab, which is automatically selected. Help Click this button to view the online Help topic for this dialog box.
Insert Sample Click this button to add a document as a sample file. The first sample file is used as the reference document on which all zones are defined. Note For one class there is only one set of sample documents. If you define several Advanced Zone Locators, they share the same set of sample pages and reference document. Delete Sample Click this button to remove the currently selected sample document from the list.
Registration Select the type of registration to be used for a zone. Select either Automatic to use normal registration or OCR for OCR registration. Automatic registration corrects small shifts, skew, and linear stretching by adjusting the document using graphical elements such as lines and other background elements. With OCR registration, the document is aligned to the typewritten text of the document.
Figure 6-80. Advanced Zone Locator’s - SubFields Tab Subfield Definition The panel shows a list of all subfields that have been created for the locator. Subfields are fields that can be assigned to extraction fields. For each subfield, you can specify what results are to be returned by selecting Best or All from the list. If All is selected, you can specify delimiters by entering them in the Delimiter column. Buttons Add Click this button to add an additional subfield.
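How Best and All might combine the results of several zones mapped to one subfield is sketched below. It assumes that Best selects the result with the highest confidence and that All keeps the zone order. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ZoneResult:
    text: str
    confidence: float  # hypothetical 0.0-1.0 score


def subfield_value(results: list[ZoneResult], mode: str = "Best", delimiter: str = " ") -> str:
    """Combine the results of all zones mapped to one subfield.

    Assumptions: "Best" picks the highest-confidence result; "All" keeps the
    zone order and joins the texts with the delimiter.
    """
    if not results:
        return ""
    if mode == "Best":
        return max(results, key=lambda r: r.confidence).text
    if mode == "All":
        return delimiter.join(r.text for r in results)
    raise ValueError("mode must be 'Best' or 'All'")
```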
click this button. The mapping on this tab is intended to be used for zones that were added manually and where the subfields were not automatically generated. Zone/Subfield Mapping The mapping is normally done automatically when you draw a zone in the viewer. To assign a subfield to a zone manually, select a zone in the panel and select the subfield you want to use from the list in the Subfield column. Multiple zones may be mapped to the same subfield.
Zone Viewer Dialog Box The Advanced Zone Locator provides a viewer to display documents where you can define OCR and OMR zones. The zones that you draw on the document are automatically added to the list of zones for the locator. Viewer Toolbar The Document Viewer toolbar provides shortcuts to many menu items and quick access to important features. Toolbar Buttons Description Previous/Next – navigates to the previous or next page or document.
Zoom out. Best Fit – used to automatically size the document so that it fits entirely within the viewer. Document Source – used to select the type of document you want to view. Select Reference Document, Sample Documents, or Test Documents. Use Reference Document to add zones. Use Sample Documents to check the zone position for other documents. Use Test Documents to test the settings. Help – view the online Help topic for this dialog box. Figure 6-82.
Bar Code Locator Properties Dialog Box The Bar Code Locator Properties dialog box has the following tabs: • Bar Code • Regions • Test Results Buttons The following buttons are available on all tabs: Close Click this button to save your settings and close the dialog box. Test Click this button to test the settings. The results are shown on the Test Results tab, which is automatically selected. Help Click this button to view the online Help topic for this dialog box.
Figure 6-83. Bar Code Locator Properties – Bar Code tab Type Auto Detect is selected by default. If you want to use one of the other types, first clear Auto Detect. Orientation By default, the orientation of the bar code is detected automatically. Select other orientations as appropriate. Length Use this to define the minimum and/or maximum length of the bar code, that is, its number of digits. First select Restrict length and then provide appropriate minimum and maximum values.
Regions are defined manually by drawing rectangles on the document or by restricting the locator to specific pages. The coordinates of a region are shown in the list at the bottom of the dialog box. Instead of drawing a region, you can click “Add” to create a new region with default parameters. The region can then be adjusted with the mouse or by changing the values in the list. Figure 6-84.
Middle pages: If selected, the locator algorithm operates on all pages between the first and last pages. If this option is selected, recognition is only performed for documents with at least three pages. Last page: If selected, the locator algorithm operates on the last page. If the document consists of only one page, the first page is also the last page. Note The defined regions also control the number of pages to be processed by OCR in Ascent Xtrata Pro Server.
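The page options in this section follow a small set of rules: “All pages” covers every page, the last page of a one-page document is also its first page, and middle pages exist only when a document has at least three pages. The Python sketch below models those rules; the function and its parameters are hypothetical and are not part of the product.

```python
def enabled_pages(page_count, all_pages=True, first=False, middle=False, last=False):
    """Return the 1-based page numbers a locator would process.

    Hypothetical model of the "Enable locator for" options.
    """
    if page_count < 1:
        return []
    if all_pages:
        # "All pages" is the default and disables the other options
        return list(range(1, page_count + 1))
    pages = set()
    if first:
        pages.add(1)
    if middle and page_count >= 3:
        # Middle pages lie strictly between the first and the last page
        pages.update(range(2, page_count))
    if last:
        # For a one-page document, the last page is also the first page
        pages.add(page_count)
    return sorted(pages)


# A five-page document with only "Middle pages" and "Last page" enabled
print(enabled_pages(5, all_pages=False, middle=True, last=True))
```

Because these options also control which pages are sent to OCR in Ascent Xtrata Pro Server, a page that no OCR-dependent locator enables does not need to be recognized.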
Left The Left coordinate of the region in mm or as a percentage. The value can be changed manually in absolute units or by adding a percentage sign. For example, choosing 30% for Left means the left edge of the region always begins at 30% of the document width as measured from the left. Top The Top coordinate of the region in mm or as a percentage. The value can be changed manually in absolute units or by adding a percentage sign.
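A region value is either an absolute distance in millimeters or, with a trailing percent sign, a fraction of the page. The Python sketch below shows one way such an entry could be converted to millimeters; the function name is hypothetical, and the A4 page size (210 by 297 mm) is only an example.

```python
def resolve_coordinate(value, page_extent_mm):
    """Convert a Left/Top/Width/Height entry to millimeters.

    value: text as typed in the region list, e.g. "25.4" (absolute mm)
           or "30%" (percentage of the page extent).
    page_extent_mm: page width for Left and Width, page height for Top and Height.
    """
    text = value.strip()
    if text.endswith("%"):
        # Relative value: scales with the size of each document
        return float(text[:-1].strip()) / 100.0 * page_extent_mm
    # Absolute value in millimeters
    return float(text)


A4_WIDTH_MM, A4_HEIGHT_MM = 210.0, 297.0  # example page size only
left = resolve_coordinate("30%", A4_WIDTH_MM)   # about 63 mm from the left edge
top = resolve_coordinate("25.4", A4_HEIGHT_MM)  # 25.4 mm regardless of page size
```

Percentages are useful when the same class contains documents of different page sizes, because the region then scales with each page.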
Figure 6-86. Bar Code Locator – Test Results Tab Detected Alternatives This area shows the test results, including the confidence of each bar code, and the text that was extracted from the bar code. Classification Locator Properties Dialog Box The Classification Locator Properties dialog box has the following tabs: • General • Regions • Test Results Buttons The following buttons are available on all tabs: Close Click this button to save your settings and close the dialog box.
Help Click this button to view the online Help topic for this dialog box. Classification Locator Properties Dialog Box – General Tab Use the General tab to define the referenced project file, the classification mode, and other settings. Figure 6-87. Classification Locator Properties – General Tab Referenced project file: Enter or browse to the location of the referenced project file.
Line by Line Each text line is classified individually and returned as an alternative if its confidence is high enough. The results are sorted by confidence. The coordinates of each line are included with the returned alternatives and are highlighted in the document. The coordinates are accessible by scripts. Result mode Use these options to define the level of detail for the returned results.
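The Line by Line mode can be pictured as a loop over text lines that keeps each line's coordinates with its result. The Python sketch below assumes a hypothetical `classify` callable that returns a class name and a confidence between 0 and 1; the real classification engine and its threshold are configured in the referenced project, not in code.

```python
from dataclasses import dataclass


@dataclass
class LineResult:
    text: str
    class_name: str
    confidence: float  # 0.0 to 1.0
    box: tuple         # (left, top, width, height) of the text line


def classify_lines(lines, classify, threshold):
    """Classify each text line separately and keep the confident ones.

    lines: iterable of (text, box) pairs
    classify: hypothetical callable, text -> (class_name, confidence)
    Results are sorted by descending confidence and keep their coordinates.
    """
    results = []
    for text, box in lines:
        class_name, confidence = classify(text)
        if confidence >= threshold:
            results.append(LineResult(text, class_name, confidence, box))
    results.sort(key=lambda r: r.confidence, reverse=True)
    return results


def toy_classifier(text):
    # Stand-in for a trained classifier: only recognizes the word "invoice"
    if "invoice" in text.lower():
        return "Invoice", 0.9
    return "Other", 0.2


lines = [("Invoice No. 4711", (10, 10, 80, 5)), ("Thank you", (10, 200, 40, 5))]
for r in classify_lines(lines, toy_classifier, threshold=0.5):
    print(r.class_name, r.confidence, r.box)
```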
Instead of drawing a region, you can click “Add” to create a new region with default parameters. The region can then be adjusted with the mouse or by changing the values in the list. Figure 6-88. Classification Locator Properties – Regions Tab Enable locator for Locators can be restricted to specific pages of the document using this option. Each defined region can be assigned independently to one or more specific pages.
Note The defined regions also control the number of pages to be processed by OCR in Ascent Xtrata Pro Server. If you want to limit the number of pages, you should disable the appropriate page options in each locator that requires OCR results. Access Select “Outside the regions” or “Inside the regions”. If “Outside the regions” is selected, all manually entered regions are denied areas. This means the locator operates on the entire document except the regions that have been defined.
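The Access setting acts as a filter on the text the locator may use. The Python sketch below models that filter under two assumptions: a word counts as being in a region if its bounding box overlaps the region, and “Inside the regions” is the inverse of the documented “Outside the regions” behavior. With no regions defined and “Outside the regions” selected, every word passes, which matches the default of searching the entire document. All names are hypothetical.

```python
def overlaps(a, b):
    """True if two (left, top, width, height) rectangles intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def is_searchable(word_box, regions, access):
    """Decide whether a word may be used by the locator.

    access == "outside": regions are denied areas; everything else is used.
    access == "inside":  only words in a defined region are used (assumed).
    """
    hit = any(overlaps(word_box, r) for r in regions)
    if access == "outside":
        return not hit
    if access == "inside":
        return hit
    raise ValueError("access must be 'inside' or 'outside'")


header_region = [(0, 0, 210, 40)]  # hypothetical region: top 40 mm of an A4 page
print(is_searchable((20, 10, 30, 5), header_region, "outside"))  # word lies in the header
```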
Width The Width of the region in mm or as a percentage. The value can be changed manually in absolute units or by adding a percentage sign. For example, choosing 30% for Width means the width of the region is always 30% of the document width. Height The Height of the region in mm or as a percentage. The value can be changed manually in absolute units or by adding a percentage sign.
Figure 6-90. Classification Locator – Test Results Tab Detected Alternatives This area shows the test results, including the confidence of each result, and the text that was extracted from the document. Database Evaluator Properties Dialog Box The Database Evaluator Properties dialog box has the following tabs: • General • Test Results Buttons The following buttons are available on all tabs: Close Click this button to save your settings and close the dialog box.
Database Evaluator Dialog Box – General Tab Use the General tab to select the database and to map database fields to locators. Figure 6-91. Database Evaluator – General Tab Database Area Select database Select the database to be used by this locator from the list. Database settings Click this button to open the Databases tab of the Project Settings dialog box, where you can add a new database or change the properties of an existing database.
Database Evaluator Dialog Box – Test Results Tab Use the Test Results tab to evaluate your settings. Figure 6-92. Database Evaluator – Test Results Tab Detected Alternatives This area shows the test results, including the confidence of each result, and the text that was extracted from the document.
Close Click this button to save your settings and close the dialog box. Test Click this button to test the settings. The results are shown on the Test Results tab, which is automatically selected. Help Click this button to view the online Help topic for this dialog box. Database Locator Properties Dialog Box – General Tab Use the General tab to select databases, set the maximum number of alternatives, and set the confidence threshold. Figure 6-93.
Database settings Click this button to open the Databases tab of the Project Settings dialog box, where you can manage the databases in the project. Locator Algorithm properties Max. alternatives Enter a number to limit the number of alternatives that will be returned by this locator. Min. confidence Specify the threshold for the minimum confidence required for a match to be used as an alternative. Only matches with a confidence greater than the threshold will be returned. A typical value for the confidence is 50.
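The two locator algorithm properties work together: matches must exceed Min. confidence, and at most Max. alternatives of them are returned. The Python sketch below assumes confidences on a 0 to 100 scale and that the best matches are kept when the list is truncated; both points, and the function name, are assumptions for illustration.

```python
def select_alternatives(matches, max_alternatives, min_confidence=50):
    """Filter and limit database matches.

    matches: list of (value, confidence) pairs, confidence on a 0-100 scale.
    Only matches strictly above min_confidence are kept, best first,
    and at most max_alternatives of them are returned.
    """
    kept = [m for m in matches if m[1] > min_confidence]
    kept.sort(key=lambda m: m[1], reverse=True)
    return kept[:max_alternatives]


matches = [("Smith Ltd.", 40), ("Smith Inc.", 80), ("Smyth Inc.", 60), ("Smith & Co.", 50)]
print(select_alternatives(matches, max_alternatives=2))
```

A match with a confidence of exactly 50 is dropped here, because the threshold must be exceeded rather than reached.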
Group Index The group index can be used to indicate that field values should have an identifiable geometric relationship with each other. For example, some items on a document, like zip code and city name, are geometrically close to each other and can be grouped. You can define up to five groups by assigning a group index to one or more fields in the field list.
Database Locator Properties Dialog Box – Regions Tab Use the Regions tab to manage the pages and regions the locator will investigate. By default, the entire document is enabled on all pages. To restrict the locator to certain pages and areas, you can define regions that include or exclude areas. Regions are defined manually by drawing rectangles on the document or by restricting the locator to specific pages. The coordinates of a region are shown in the list at the bottom of the dialog box.
First page: If selected, the locator algorithm operates only on the first page. Keep in mind that in this case, locator regions must be defined for the first page. Middle pages: If selected, the locator algorithm operates on all pages between the first and last pages. If this option is selected, recognition is only performed for documents with at least three pages. Last page: If selected, the locator algorithm operates on the last page.
Page A region can be restricted to a page or a set of pages. Click in the “Page” column and select another value from the list. Left The Left coordinate of the region in mm or as a percentage. The value can be changed manually in absolute units or by adding a percentage sign. For example, choosing 30% for Left means the left edge of the region always begins at 30% of the document width as measured from the left. Top The Top coordinate of the region in mm or as a percentage.
Database Locator Properties Dialog Box – Test Results Tab Use the Test Results tab to evaluate your settings. Figure 6-97. Database Locator Properties – Test Results Tab Detected Alternatives This area shows the test results, including the confidence of each result, and the text that was extracted from the document. If the Replaced option is selected for a field (on the Fields tab), then the results here will show the value after the replacement has been made.
The following buttons are available on all tabs: Close Click this button to save your settings and close the dialog box. Test Click this button to test the settings. The results are shown on the Test Results tab, which is automatically selected. Help Click this button to view the online Help topic for this dialog box. Format Locator Properties Dialog Box – Formats Tab Use the Formats tab to define the formats that will be assigned to the locator.
Figure 6-98. Format Locator - Formats Tab Use results from locator If you select this option, you can select a locator from the list of available locators. To save time, this option allows you to use the results of another format locator in this format locator. For example, you might want to extract several dates that all have the same format and differ only in the keyword next to the format match, such as “Birth date” or “Application date.”
Format You can create one or more formats or choose them from a list of predefined formats. To use the predefined formats, click the right arrow button to the right of the format text field. Predefined formats are divided into the following categories: • Numbers • Characters • Dates • Amounts • Dictionaries Figure 6-99. Predefined Number Formats You can select a predefined format and modify it before you add it to the list of defined formats.
If “Whole word” is checked, a date with the 4-digit year gets a smaller confidence. If “Whole word” is unchecked, the date gets 100% confidence. Insert characters that should not be considered. Ignore blanks Select this property to force the format to ignore blanks (spaces). Search dictionary exact If this option is selected, dictionaries are searched for exact matches; otherwise the searches are fault tolerant.
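The difference between exact and fault-tolerant dictionary search, and the effect of ignoring blanks, can be illustrated with the Python sketch below. The product's own fault-tolerance method is not described here, so the sketch swaps in the similarity ratio from Python's standard `difflib` module; the function name and the 0.8 cutoff are assumptions.

```python
import difflib


def search_dictionary(word, dictionary, exact=True, ignore_blanks=False, cutoff=0.8):
    """Look up a recognized word in a dictionary.

    exact=True: only identical entries match.
    exact=False: fault-tolerant search using difflib's similarity ratio
                 (a stand-in, not the product's algorithm).
    ignore_blanks=True: spaces are removed before comparing.
    """
    def normalize(s):
        return s.replace(" ", "") if ignore_blanks else s

    entries = {normalize(e): e for e in dictionary}
    key = normalize(word)
    if exact:
        return [entries[key]] if key in entries else []
    close = difflib.get_close_matches(key, list(entries), n=3, cutoff=cutoff)
    return [entries[c] for c in close]


# An OCR error turned "i" into "1"
print(search_dictionary("Invo1ce", ["Invoice", "Order"], exact=True))
print(search_dictionary("Invo1ce", ["Invoice", "Order"], exact=False))
```

Exact search rejects the misread word, while the fault-tolerant search still finds “Invoice”.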
expression, the parts that conform are displayed in the list. In the figure below, “1234” is the first sample that matches the second regular expression, “\d{4}”. The sample “123456789” matches the first and second regular expressions. Figure 6-101. Test Values Format Locator Properties Dialog Box – Evaluation Settings Tab Use the Evaluation Settings tab to manage keywords that can be configured to improve the ability of the locator to find the desired items.
Figure 6-102. Format Locator – Evaluation Settings Tab Keywords Keyword You can enter a keyword or insert a set of keywords by selecting a dictionary. To use a dictionary, click the right arrow button. You can select an existing dictionary or add a new one by selecting Dictionary Settings. Search dictionary exact If this option is selected, the dictionary is searched for exact matches; otherwise the searches are fault tolerant.
Negative values can be assigned to use keywords as stop words (for example, “delivery date” could be weighted at -100% if you are looking for the invoice date). The value can be entered manually or with the slider. Relation The Relation parameter defines the geometric position of the keyword relative to the format match. For example, “Left” means the keyword must be to the left of the format match. Figure 6-103.
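Weight and Relation can be combined as shown in the Python sketch below: a keyword only counts when it sits in the configured position relative to the format match. Only “Left” is described in this section; the “Right”, “Above”, and “Below” relations, the overlap requirement, and the rule of taking the strongest applicable weight are assumptions for illustration, not the locator's documented scoring.

```python
def satisfies_relation(keyword_box, match_box, relation):
    """Check where a keyword lies relative to a format match.

    Boxes are (left, top, width, height). Requiring vertical overlap for
    Left/Right and horizontal overlap for Above/Below is an assumption.
    """
    kl, kt, kw, kh = keyword_box
    ml, mt, mw, mh = match_box
    v_overlap = kt < mt + mh and mt < kt + kh
    h_overlap = kl < ml + mw and ml < kl + kw
    if relation == "Left":
        return kl + kw <= ml and v_overlap
    if relation == "Right":
        return kl >= ml + mw and v_overlap
    if relation == "Above":
        return kt + kh <= mt and h_overlap
    if relation == "Below":
        return kt >= mt + mh and h_overlap
    raise ValueError(f"unknown relation: {relation}")


def keyword_weight(keywords, match_box):
    """Return the strongest applicable weight (-100..100), or 0 if none applies.

    keywords: list of (box, weight_percent, relation) tuples.
    """
    applicable = [w for box, w, rel in keywords if satisfies_relation(box, match_box, rel)]
    if not applicable:
        return 0
    return max(applicable, key=abs)


date_match = (100, 50, 40, 5)
keywords = [
    ((40, 50, 50, 5), 100, "Left"),   # "Invoice date" on the same line, to the left
    ((40, 80, 50, 5), -100, "Left"),  # "Delivery date" on another line
]
print(keyword_weight(keywords, date_match))
```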
Add Click this button to add a keyword to the list of defined keywords after you have specified the keyword and set its properties. Modify Click this button to save changes to the currently selected keyword. Clear Click this button to remove all content and selections from the Keywords area. Format Locator Properties Dialog Box – Regions Tab Use the Regions tab to manage the pages and regions the locator will investigate.
Figure 6-104. Format Locator Properties – Regions Tab Enable locator for Locators can be restricted to specific pages of the document using this option. Each defined region can be assigned independently to one or more specific pages. All pages This option is selected by default. That means the region settings apply to all pages of the document. If the “All pages” option is selected, the other page options are disabled.
Last page: If selected, the locator algorithm operates on the last page. If the document consists of only one page, the first page is also the last page. Note The defined regions also control the number of pages to be processed by OCR in Ascent Xtrata Pro Server. If you want to limit the number of pages, you should disable the appropriate page options in each locator that requires OCR results. Access Select “Outside the regions” or “Inside the regions”.
Top The Top coordinate of the region in mm or as a percentage. The value can be changed manually in absolute units or by adding a percentage sign. For example, choosing 30% for Top means the top edge of the region always begins at 30% of the document height as measured from the top. Width The Width of the region in mm or as a percentage. The value can be changed manually in absolute units or by adding a percentage sign.
Figure 6-106. Format Locator Properties – Test Tab Detected Alternatives This area shows the test results, including the confidence of each result, and the text that was extracted from the document.
Close Click this button to save your settings and close the dialog box. Test Click this button to test the settings. The results are shown on the Test Results tab, which is automatically selected. Help Click this button to view the online Help topic for this dialog box.
Figure 6-107. Invoice Header Locator Properties – General Settings Panel Format Locators Use the Format Locators panel to select the four supporting format locators for the invoice header locator.
Figure 6-108. Invoice Header Locator Properties – Format Locators Panel Invoice number locator Select a format locator that finds all values on a document that are possible invoice numbers. Order number locator Select a format locator that finds all values on a document that are possible order numbers. Date locator Select a format locator that finds all values on a document that are possible dates.
Figure 6-109. Invoice Header Locator Properties – Taxes Panel Percent field Enter the tax rate here. List of defined tax rates This area contains all the tax rates that have been defined for the Invoice Header locator. Buttons Add Click this button to insert a regional tax in the list of tax rates. Delete Click this button to remove the currently selected tax rate.
Currencies Use the Currencies panel to enter all valid abbreviations that are used for currencies expected on the invoices. Figure 6-110. Invoice Header Locator Properties – Currencies Panel Abbreviation Use this to add abbreviations or symbols for each type of currency. For example, you can add USD or $ for U.S. dollars. Result Enter the desired output value. For example, if the locator finds the abbreviation USD, you can define “U.S. Dollars” as the output result.
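The Abbreviation and Result columns form a lookup table from what appears on the invoice to the value you want as output. The Python sketch below models that table; the entries beyond the documented USD and $ examples, the case-insensitive matching, and the function name are assumptions.

```python
# Hypothetical currency table: abbreviation or symbol -> desired output value
CURRENCIES = {
    "USD": "U.S. Dollars",
    "$": "U.S. Dollars",
    "EUR": "Euro",
}


def normalize_currency(token, table=CURRENCIES):
    """Return the configured output value for a currency abbreviation, or None."""
    # Case-insensitive matching is an assumption of this sketch
    return table.get(token.strip().upper())


print(normalize_currency("usd"))
print(normalize_currency("$"))
```

Mapping several abbreviations to the same Result value means downstream systems receive one consistent spelling regardless of how the invoice writes the currency.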
Highlighting Use the Highlighting panel to set the highlighting colors. Figure 6-111. Invoice Header Locator Properties – Highlighting Panel The invoice header locator retrieves 13 subfields, such as Order Date, Invoice Number, and Tax. You can assign a highlight color for each type of subfield when it is displayed in the viewer during testing. Select the color you want from the list next to each subfield name.
Figure 6-112. Invoice Header Locator Properties – Invoice Number Keywords Tab You can enter keywords for invoice numbers and assign a weight to each keyword. For example, “invoice number” would likely have a high weight of 100%, whereas “number” alone might have a lower weight of around 50%. You can also assign negative weights to words like “customer number.” Instead of keywords, dictionaries can be used. Dictionaries are defined on the Dictionaries tab of the Project Settings dialog box.
Weight A keyword can be assigned a weight between -100% and 100%. Choose a value that reflects the relevance of the keyword. If a keyword is unique and can never lead to a wrong result, assign 100%. If the keyword must be found in combination with other keywords or might indicate a wrong result, use a lower weight.
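The invoice number example above raises a practical point: “customer number” also contains “number,” so the weaker keyword must not win. The Python sketch below resolves this by letting the longest matching keyword decide; the longest-match rule, the -100% weight for “customer number,” and the function name are assumptions for illustration.

```python
# Hypothetical keyword list for invoice numbers, weights in percent (-100..100)
INVOICE_NUMBER_KEYWORDS = {
    "invoice number": 100,
    "number": 50,
    "customer number": -100,
}


def label_weight(label, keywords=INVOICE_NUMBER_KEYWORDS):
    """Weight of the label text found next to a candidate value.

    The longest keyword contained in the label wins, so "customer number"
    is not mistaken for the weaker "number".
    """
    # Lowercase and collapse repeated whitespace before comparing
    text = " ".join(label.lower().split())
    found = [k for k in keywords if k in text]
    if not found:
        return 0
    return keywords[max(found, key=len)]


for label in ["Invoice Number:", "Customer  Number", "Number", "Page"]:
    print(label, label_weight(label))
```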
Figure 6-113. Invoice Header Locator Properties – Invoice Date Keywords Tab You can enter keywords for invoice dates and assign a weight to each keyword. For example, “invoice date” would likely have a high weight of 100%, whereas “date” alone might have a lower weight of around 50%. You can also assign negative weights to words like “delivery date.” Instead of keywords, dictionaries can be used. Dictionaries are defined on the Dictionaries tab of the Project Settings dialog box.
Weight A keyword can be assigned a weight between -100% and 100%. Choose a value that reflects the relevance of the keyword. If a keyword is unique and can never lead to a wrong result, assign 100%. If the keyword must be found in combination with other keywords or might indicate a wrong result, use a lower weight.
Figure 6-114. Invoice Header Locator Properties – Order Number Keywords Tab You can enter keywords for order numbers and assign a weight to each keyword. For example, “order number” would likely have a high weight of 100%, whereas “number” alone might have a lower weight of around 50%. You can also assign negative weights to words like “customer number.” Instead of keywords, dictionaries can be used. Dictionaries are defined on the Dictionaries tab of the Project Settings dialog box.
Weight A keyword can be assigned a weight between -100% and 100%. Choose a value that reflects the relevance of the keyword. If a keyword is unique and can never lead to a wrong result, assign 100%. If the keyword must be found in combination with other keywords or might indicate a wrong result, use a lower weight.
Figure 6-115. Invoice Header Locator Properties – Order Date Keywords Tab You can enter keywords for order dates and assign a weight to each keyword. For example, “order date” would likely have a high weight of 100%, whereas “date” alone might have a lower weight of around 50%. You can also assign negative weights to words like “delivery date.” Instead of keywords, dictionaries can be used. Dictionaries are defined on the Dictionaries tab of the Project Settings dialog box.
Weight A keyword can be assigned a weight between -100% and 100%. Choose a value that reflects the relevance of the keyword. If a keyword is unique and can never lead to a wrong result, assign 100%. If the keyword must be found in combination with other keywords or might indicate a wrong result, use a lower weight.
Figure 6-116. Invoice Header Locator Properties - Test Result Tab Detected Alternatives This area shows the test results, including the confidence of each result, and the text that was extracted from the document. OCR Voting Evaluator Properties Dialog Box The OCR Voting Evaluator Properties dialog box has the following tabs: • General • Test Results Buttons The following buttons are available on all tabs: Close Click this button to save your settings and close the dialog box.
Help Click this button to view the online Help topic for this dialog box. OCR Voting Evaluator Dialog Box – General Tab Use the General tab to add, edit, and delete subfields from a pair of locators so that they can be compared for voting. Figure 6-117. OCR Voting Evaluator – General Tab Assign Locators for Voting This list contains the locators that are to be compared by the OCR Voting Evaluator. The list contains three columns.
Input Locator 1 Select a subfield of the first locator that is to be used for comparison. Click the entry to open a list of all available subfields. Input Locator 2 Select a subfield of the second locator that is to be used for comparison. Click the entry to open a list of all available subfields. Field Name Name of the new subfield used to hold the voting results. Click the entry to change the name.
Figure 6-118. OCR Voting Evaluator – Test Results Tab OCR Test Results This section displays the final voting results for each voting pair. Field Name Name of the subfield as defined in the General tab. Result String Final result of the vote for each subfield. Details for <> This section shows the character by character voting results for the currently selected field name.
Input Locator 1 This shows the OCR result by character of the first locator. Rest the mouse pointer over a character to see its confidence level. If the OCR engine returns more than one character above the confidence threshold, you will see them in additional rows for the input locator. Input Locator 2 This shows the OCR result by character of the second locator. Rest the mouse pointer over a character to see its confidence level.
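Character-by-character voting can be pictured as follows: at each position, both locators contribute one or more candidate characters with confidences, and the candidate with the best combined score becomes part of the Result String. The Python sketch below assumes the two results are already aligned position by position and that confidences are summed; neither the alignment step nor the summing rule is documented here, and the function name is hypothetical.

```python
def vote(chars1, chars2):
    """Character-by-character voting between two OCR results.

    Each input is a list of positions; each position is a list of
    (character, confidence) candidates, as shown in the extra rows of
    the Test Results tab. Positions are assumed to be aligned already.
    """
    if len(chars1) != len(chars2):
        raise ValueError("this sketch expects aligned results of equal length")
    result = []
    for cands1, cands2 in zip(chars1, chars2):
        scores = {}
        for ch, conf in cands1 + cands2:
            # Summing rewards characters on which both locators agree
            scores[ch] = scores.get(ch, 0) + conf
        result.append(max(scores, key=scores.get))
    return "".join(result)


locator1 = [[("1", 90)], [("0", 40), ("O", 35)], [("5", 80)]]
locator2 = [[("1", 85)], [("0", 70)], [("S", 60)]]
print(vote(locator1, locator2))
```

In the second position, both locators support “0”, which outweighs the single vote for the letter “O”.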
Figure 6-119. Relation Evaluator – Settings Tab Locators Specify two locators whose results have a geometric relationship. First select a locator from the Find all alternatives of locator list, then specify the relationship, and finally select another locator from the alternatives of locator list. Settings Specify the number of alternatives the evaluator should return.
Select this option if you want the confidence level to be adjusted as a function of the distance between the two alternatives. This means that alternatives separated by a greater distance will have a lower confidence. Combine original confidence with distance Select this option if you want to sum the original confidence and the distance.
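The two distance options can be sketched in Python as shown below. The linear fall-off with a hypothetical `max_distance`, the 0 to 1 confidence scale, and the halving of the combined sum (so that the result stays in range) are all assumptions of this sketch; the guide only states that greater distance lowers confidence and that the combined option sums the original confidence with the distance-based value.

```python
import math


def distance_confidence(a_center, b_center, max_distance):
    """Map the distance between two alternatives to 0..1.

    1.0 means the centers coincide; 0.0 means they are max_distance or
    more apart. The linear fall-off is an assumption.
    """
    d = math.dist(a_center, b_center)
    return max(0.0, 1.0 - d / max_distance)


def relation_confidence(original, a_center, b_center, max_distance, combine=False):
    """Confidence of a related pair of alternatives.

    combine=False: confidence depends only on the distance.
    combine=True:  original confidence and distance confidence are summed,
                   then halved to stay within 0..1 (the halving is an assumption).
    """
    dist_conf = distance_confidence(a_center, b_center, max_distance)
    if not combine:
        return dist_conf
    return (original + dist_conf) / 2


# Two alternatives 50 mm apart, with 100 mm treated as "too far"
print(relation_confidence(0.9, (0, 0), (30, 40), 100))
print(relation_confidence(0.9, (0, 0), (30, 40), 100, combine=True))
```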
Figure 6-120. Relation Evaluator – Test Tab Detected Alternatives This area shows the test results, including the confidence of each result, and the text that was extracted from the document.
Close Click this button to save your settings and close the dialog box. Show Script Click this button to open the Sax Basic script editor. See Extraction – Script Locator for details. Test Click this button to test the settings. The results are shown on the Test Results tab, which is automatically selected. Help Click this button to view the online Help topic for this dialog box.
Simple field: Select this option to define a simple field as the locator result. The alternatives returned as results will consist of one field. Group field: Select this option to define a structure with subfields as the locator result. The alternatives returned as results will consist of the subfields in the group. The assignment of values to the subfields must be handled by the script. Buttons Add Click this button to add another subfield to the group.
Figure 6-122. Script Locator Properties – Regions Tab Enable locator for Locators can be restricted to specific pages of the document using this option. All pages This option is selected by default. That means the settings apply to all pages of the document. If the “All pages” option is selected, the other page options are disabled. First page: If selected, the locator algorithm operates only on the first page. Keep in mind that locator regions must then be defined for the first page.
Note The settings also control the number of pages to be processed by OCR in Ascent Xtrata Pro Server. If you want to limit the number of pages, you should disable the appropriate page options in each locator that requires OCR results. Within the script, the defined regions are used to restrict the area in which the script operates. Script Locator Properties Dialog Box – Test Results Tab Use the Test Results tab to evaluate your settings. Figure 6-123.
Standard Evaluator Properties Dialog Box The Standard Evaluator Properties dialog box has the following tabs: • Settings • Test Buttons The following buttons are available on all tabs: Close Click this button to save your settings and close the dialog box. Test Click this button to test the settings. The results are shown on the Test Results tab, which is automatically selected. Help Click this button to view the online Help topic for this dialog box.
Locators This area shows all the available and selected locators. Available Locators This is a list of all available locators. Select any locator from this list and click the right arrow button to move the locator to the selected locators list. Selected Locators This is a list of all the locators that have been selected. Select any locator from the list and click the left arrow button to move the locator back to the Available Locators list.
Figure 6-125. Standard Evaluator – Test Tab Detected Alternatives This area shows the test results, including the confidence of each result, and the text that was extracted from the document. Table Locator Properties Dialog Box The Table Locator Properties dialog box has the following tabs: • Settings • Master Item • Cells • Test Results Buttons The following buttons are available on all tabs: Close Click this button to save your settings and close the dialog box.
Test Click this button to test the settings. The results are shown on the Test Results tab, which is automatically selected. Help Click this button to view the online Help topic for this dialog box. Table Locator Properties Dialog Box – Settings Tab Use the Settings tab to make general settings and configure the types of extraction. Figure 6-126.
Table models Click this button to open the Tables tab of the Project Settings dialog box, where you can define or edit table models. For more information on how to set up a table, see Extraction – Table Locator. Select detection method There are two methods that can be used to locate tables on a document: Automatic or Manual. Depending on the method, different tabs are enabled. Automatic Select this method if you want the Table Locator to search the document for tables and automatically identify them.
Select language package Select one or more language packages from this list. The contents of this list are determined by the language packages defined on the Tables tab of the Project Settings dialog box. Manual Table Extraction This area is only enabled if you have selected the Manual detection method. Select a sample image Select a sample image from the test folder. The master line item will be defined (on the Master Item tab) using this image.
Figure 6-127. Table Locator Properties – Master Item Tab Line Settings Optional Lines Sometimes a row in a table may take up several lines on the document, in which case the second and subsequent lines may be optional. This field shows the number of optional lines in the row, based on which lines were flagged as optional in the Lines area below. Many comments per item Select this option if your tables might have large gaps caused by comments or other items between the line items.
To use this feature properly, you should use the line item in the table that has the most rows as the master item. Then flag as optional those rows which are not present in the other line items in the table. Table Locator Properties Dialog Box – Cells Tab Use the Cells tab to divide the master item into individual columns (cells). This tab is only enabled if you have selected the Manual detection method. Figure 6-128.
If you select a cell, its settings will be shown here. Model column Shows which column this cell is assigned to. The assignment can be changed here. The name of the column is also displayed below each cell in the viewer. Optional Check this if the cell may not be present in all line items. Fixed number of lines Currently not used. Startline This value shows in which text line the cell begins. If the value is ‘-1’, adjust the cell using the mouse until the value becomes positive.
Table Locator Properties Dialog Box – Test Results Tab Use the Test Results tab to evaluate your settings. Figure 6-129. Table Locator Properties – Settings Tab This area shows what the table locator finds. On the left of this tab there are four buttons. Cells This mode highlights each cell individually in the document viewer. All Rows This mode highlights each line item separately in the viewer. This is the default display mode.
Zone Locator Properties Dialog Box The Zone Locator Properties dialog box has the following tabs: • General • Zones • SubFields • OCR Profiles • Test Results Buttons Close Click this button to save your settings and close the dialog box. Test Click this button to test the settings. The results are shown on the Test Results tab, which is automatically selected. Help Click this button to view the online Help topic for this dialog box.
Figure 6-130. Zone Locator’s - General Tab Buttons Insert Sample Click this button to add a document as a sample file. The first sample file is used as the reference document on which all zones are defined. Delete Sample Click this button to remove the currently selected sample document from the list. Background Removal If at least five samples of the same class of document have been added, you can take advantage of the background removal feature.
Figure 6-131. Advanced Zone Locator’s - Zones Tab Registration Select the type of registration to be used for a zone. Select either Automatic to use normal registration or OCR for OCR registration. Automatic registration corrects small shifts, skew, and linear stretching by adjusting the document using graphical elements such as lines and other background elements. With OCR registration, the document is aligned to the typewritten text of the document.
Delete – Click this button to remove the currently selected zone.
Rename – Click this button to change the zone’s name.
Properties – Click this button to open the zone’s properties dialog box.
Zone Locator Properties Dialog Box – SubFields Tab
Use the SubFields tab to create subfields and map them to the zones.
Figure 6-132. Zone Locator Properties – SubFields Tab
Subfield Definition – This panel shows a list of all subfields that have been created for the locator.
Add – Click this button to add an additional subfield.
Delete – Click this button to remove the currently selected subfield.
Rename – Click this button to change the subfield’s name.
Auto Mapping – By default, a subfield is automatically created and mapped to the zones. If you have deleted the subfield and want a new subfield to be created automatically, click this button.
Figure 6-133. Zone Locator Properties – OCR Profiles Tab
List of Profiles – Shows all the profiles that have been defined for the zone locator.
Buttons
Add – Click this button to add a new profile to the list of profiles.
Delete – Click this button to delete the currently selected profile.
Rename – Click this button to rename the currently selected profile.
Copy – Click this button to copy the currently selected profile.
Export – Click this button to export the currently selected profile to a file with the extension “.opr”.
Import – Click this button to import a previously exported profile into the project. Exported profiles are stored in files with the extension “.opr” and help you reuse recognition settings in different zone locators. The profile is imported with a default name.
Zone Locator Properties Dialog Box – Test Results Tab
Use the Test Results tab to evaluate your settings.
Figure 6-134.
Zone Viewer Dialog Box
The Zone Locator provides a viewer that displays documents on which you can define OCR and OMR zones. The zones that you draw on the document are automatically added to the list of zones for the locator.
Figure 6-135. Zone Viewer Dialog Box
Viewer Toolbar
The Document Viewer toolbar provides shortcuts to many menu items and quick access to important features.
Toolbar buttons:
Previous/Next – navigates to the previous or next page or document.
Selection Mode – used to select zones in the viewer window. If you double-click a zone, its properties dialog box is displayed.
Add Text Zone – used to draw an OCR text zone on the reference document.
Add OMR Zone – used to draw an OMR zone on the reference document.
Remove Background – used to perform background removal for the document. Note that besides the reference document, at least four additional sample documents are required before you can perform background removal.
Zoom In
Zoom Out
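The sample-count rule for background removal can be expressed as a small check. The rule is the reference document plus at least four further samples, five in total. This is a minimal sketch; the function names and the constant are hypothetical and are not part of Ascent Xtrata Pro.

```python
from __future__ import annotations

# Five samples in total: the reference document (the first sample file)
# plus at least four additional samples of the same document class.
MIN_SAMPLES_FOR_BACKGROUND_REMOVAL = 5


def samples_still_needed(sample_files: list[str]) -> int:
    """Return how many more samples must be added before background removal."""
    return max(0, MIN_SAMPLES_FOR_BACKGROUND_REMOVAL - len(sample_files))


def can_remove_background(sample_files: list[str]) -> bool:
    """True once the reference document and four more samples are present."""
    return samples_still_needed(sample_files) == 0
```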
• Order Group Locator
Amount Group Locator Properties Dialog Box
The Amount Group Locator Properties dialog box has the following tabs:
• General
• Document Type
• Validation
• Advanced
• Test Results
Amount Group Locator Properties Dialog Box – General Tab
Use the General tab to set and configure the formatters, the confidence value, and the VAT-based amount location.
Figure 6-136.
Formatting – Use this panel to select an amount formatter and a percentage formatter using predefined profiles. For more information on formatters, see the Project Settings dialog box – Formatting tab.
Results – Set the minimum confidence threshold for the locator. Results with a confidence below the threshold are not returned.
VAT-based amount location – The Amount Group Locator searches for amounts using the following equations for up to two tax rates.
Figure 6-137. Amount Group Locator Properties – Document Type Tab
Keywords – You can enter keywords that uniquely identify the document as either an invoice or a letter of credit.
Invoice – Specify the value that is to be used if the document has been identified as an invoice.
Credit – Specify the value that is to be used if the document has been identified as a letter of credit.
This area contains a list of all the defined keywords. To edit a keyword, select it and make the changes. A keyword can be assigned a weight between -100 and 100. Choose a value that reflects the relevance of the keyword. If a keyword is unique and can never lead to a wrong result, assign 100. If the keyword must be found in combination with other keywords, or might indicate a wrong result, use a lower weight.
Figure 6-138. Amount Group Locator Properties – Validation Tab
Validation Method – Select an invoice validation method from the list.
Field Mapping – Map the extraction fields to the validation method fields. You can either use the default invoice validation method or define your own invoice validation method using the Project Settings dialog box – Validation tab.
Figure 6-139. Amount Group Locator Properties – Advanced Tab
Currencies
Currency abbreviation – Add abbreviations for currencies as they are expected to appear on the document.
Result – Enter the currency as it should be returned in the results.
Search Confidence – Set the threshold for the search confidence. The environment of each possible candidate is matched to all trained samples and, if the confidence threshold is reached, the candidate is added to the list of possible results.
one candidate. If this candidate reaches the evaluation threshold, it is copied to the result list.
Amount Group Locator Properties Dialog Box – Test Results Tab
Use the Test Results tab to evaluate your settings.
Figure 6-140. Amount Group Locator Properties – Test Results Tab
Detected Alternatives – The test results are listed, showing the confidence and text of each alternative.
Invoice Group Locator Properties Dialog Box – General Tab
Use the General tab to set and configure the date formatter, the confidence value, and the validation methods.
Figure 6-141. Invoice Group Locator Properties – General Tab
Date detection – Select a date formatter from the list. This formatter is used to format the invoice date. If necessary, you can change the settings for the default date formatter or create a new date formatter.
Invoice Group Locator Properties Dialog Box – Advanced Tab
Use this tab to set the search and evaluation confidence thresholds.
Figure 6-142. Invoice Group Locator Properties – Advanced Tab
Search Confidence – Set the threshold for the search confidence. The environment of each possible candidate is matched to all trained samples and, if the confidence threshold is reached, the candidate is added to the list of possible results.
Figure 6-143. Invoice Group Locator Properties – Test Results Tab
Detected Alternatives – The test results are listed, showing the confidence and text of each alternative.
Order Group Locator Properties Dialog Box
The Order Group Locator Properties dialog box has the following tabs:
• General
• Advanced
• Test Results
Order Group Locator Properties Dialog Box – General Tab
Use the General tab to set and configure the date formatter, the confidence value, and the validation methods.
Figure 6-144. Order Group Locator Properties – General Tab
Date detection – Select a date formatter from the list. This formatter is used to format the order date. If necessary, you can change the settings for the default date formatter or create a new date formatter. For further details, see the Project Settings dialog box – Formatting tab.
Results – Set the minimum confidence threshold for the locator. Results with a confidence below the threshold are not returned.
Figure 6-145. Order Group Locator Properties – Advanced Tab
Search Confidence – Set the threshold for the search confidence. The environment of each possible candidate is matched to all trained samples and, if the confidence threshold is reached, the candidate is added to the list of possible results.
Evaluation Confidence – Set the threshold for the evaluation confidence. In the first step, the subfields that are distributed over the whole document are searched separately.
Figure 6-146. Order Group Locator Properties – Test Results Tab
Detected Alternatives – The test results are listed, showing the confidence and text of each alternative.
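The group locators use two thresholds. Candidates whose search confidence reaches the search threshold become possible results. A single combined candidate is copied to the result list only if it reaches the evaluation threshold. The sketch below is a simplified, hypothetical model of that two-stage filter. It assumes confidences on a 0.0 to 1.0 scale. The guide does not describe how subfield candidates are merged, so `combine` is a placeholder; `best_of` simply picks the most confident candidate.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    text: str
    confidence: float  # hypothetical 0.0 .. 1.0 scale


def locate(
    candidates: list[Candidate],
    search_threshold: float,
    evaluation_threshold: float,
    combine: Callable[[list[Candidate]], Candidate],
) -> list[Candidate]:
    """Two-stage threshold filter (hypothetical model of a group locator)."""
    # Stage 1: only candidates that reach the search threshold become
    # possible results.
    possible = [c for c in candidates if c.confidence >= search_threshold]
    if not possible:
        return []
    # Stage 2: the possible results are merged into one candidate, which is
    # copied to the result list only if it reaches the evaluation threshold.
    combined = combine(possible)
    return [combined] if combined.confidence >= evaluation_threshold else []


def best_of(candidates: list[Candidate]) -> Candidate:
    """Placeholder combine step: pick the most confident candidate."""
    return max(candidates, key=lambda c: c.confidence)
```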
Chapter 6 516 Ascent Xtrata Pro User's Guide
Chapter 7
Setup a Batch Class in Ascent Capture
Introduction
Ascent Xtrata Pro is a tool for processing unstructured and structured documents by classifying them and extracting items from them. The classification process results in a category (class) to which the document is assigned. Classification can be hierarchical, with every node in the hierarchy being a possible result. The extraction process results in the locations and values of items on the document.
When the synchronized batch class is published, the classification and extraction project data is processed in the same way as all other batch class data to provide a stable set of settings for existing batches.
Adding Ascent Xtrata Pro to a Batch Class
Ascent Xtrata Pro Server can be added to any Ascent Capture batch class. Normally, Ascent Xtrata Pro Server should be positioned right after the Scan module in the list of queues. It can be used in place of the Ascent Capture Recognition Server.
using the Ascent Capture Custom Module Manager. See the Installation Guide for Ascent Xtrata Pro for details.
6 Click Apply to save your settings without closing the Batch Class Properties dialog box. Click OK to save your settings and close the dialog box.
Batch Class Considerations
The following sections describe the requirements for Ascent Capture batch classes that include Ascent Xtrata Pro modules.
Publishing Batch Classes
Ascent Capture batch classes must be published before they can be used for creating batches. In addition, they must be republished whenever batch class settings change.
Note You cannot publish a batch class that includes Ascent Xtrata Pro Server unless the batch class has been synchronized with an Ascent Xtrata Pro project. Attempting to do so results in a publishing error.
Importing/Exporting Batch Classes
Batch classes that include Ascent Xtrata Pro modules can be exported with the Ascent Capture Import/Export feature. If you import an exported batch class that includes Ascent Xtrata Pro Server, you must synchronize the project with the batch class again. Otherwise, special flags that are needed to publish the batch class are missing.
Open Synchronization Tool
You can open the Synchronization tool from the context menu of any batch class that includes Ascent Xtrata Pro Server.
To open the Synchronization tool
1 From the Administration module’s Definitions panel, select the Batch tab.
2 Right-click a batch class that includes Ascent Xtrata Pro Server to display a context menu.
3 Select “Synchronize Xtrata Pro Project” to open the Synchronization tool.
Click Yes to load the existing mappings, or No to load the project without using any mappings. To exit the Synchronization tool, click Cancel.
4 Continue the synchronization process. For more details, see Assigning Classes to Form Types on page 525, Assigning Extraction Fields to Index Fields of Document Classes on page 531, and Perform Synchronization on page 536.
For testing on a local system, it may be useful to deactivate this option. This is especially true if you have large databases and publish the project often, because otherwise a lot of disk space is wasted.
Allow Batch Editing
Select this option to allow batch editing for a batch. If this option is selected, batch editing features, such as managing documents and folder structures, can be used.
Assigning Classes to Form Types
The Synchronization tool’s first screen lets you assign classes to Ascent Capture form types. Once a project is loaded, the project tree is shown in the left panel. All classes that are not yet assigned are marked with question marks.
Figure 7-2. Synchronization Tool – New Project
Document classes and form types that are already available from the Ascent Capture batch class are displayed in the right panel.
relationship between Ascent Xtrata Pro classes and Ascent Capture document class/form type pairs.
• Create Form Type for each selected class: For each selected class, a new form type is created for the document class that is currently selected in the right panel. If no document class is available or selected, the button is disabled. This establishes a one-to-one relationship between Ascent Xtrata Pro classes and Ascent Capture form types.
• Document class selected: When a document class is selected in the list of Ascent Capture document classes (the right panel), all Ascent Xtrata Pro classes that are assigned to any form type of this document class are highlighted in yellow in the class tree.
• Form type selected: When a form type is selected in the list of Ascent Capture document classes, all Ascent Xtrata Pro classes that are assigned to that form type are highlighted in yellow in the class tree.
Figure 7-4. Creating Document Classes and Form Types from a Class Tree
If you click Yes, a separate and unique document class and form type are created for each class in the tree below the selected class.
When you use “Create Form Type for each selected class” for a class that contains sub-nodes, a warning is displayed.
Figure 7-5. Creating Form Types from a Class Tree
If you click Yes, a form type is created under the current document class for each class in the tree below the selected class.
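Creating one form type per class "in the tree below the selected class" amounts to a depth-first walk of the class tree. The sketch below is a simplified, hypothetical model and is not the Synchronization tool’s code. It assumes that the selected class itself also receives a form type, and that each new form type is named after its class.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ClassNode:
    """A class in the Ascent Xtrata Pro class tree (simplified, hypothetical)."""
    name: str
    children: list[ClassNode] = field(default_factory=list)


def classes_in_subtree(node: ClassNode) -> list[str]:
    """Depth-first list of the selected class and every class below it."""
    names = [node.name]
    for child in node.children:
        names.extend(classes_in_subtree(child))
    return names


def plan_form_types(selected: ClassNode, document_class: str) -> list[tuple[str, str]]:
    """One (document class, new form type) pair per class in the subtree.

    Assumption: each new form type is named after its class.
    """
    return [(document_class, name) for name in classes_in_subtree(selected)]
```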
If you click Yes, the currently selected form type is assigned to each class in the tree below the selected class.
Changing Existing Projects
Any changes that are made to the original project after synchronization, for example newly added classes, are displayed in the tree of the classification project. You must assign these classes to form types and map their fields to index fields, and then run the synchronization again.
Figure 7-8. Synchronization Tool – Edit Menu
• Add Form Type: Adds a new form type to the selected document class. The form type is added with a default name (NewFormType), but you can rename it as desired: right-click the new form type, select Rename, and enter the new name. The icons for new form types are marked with an asterisk until synchronization occurs (the last step in the Synchronization tool).
• Rename: Renames the Ascent Capture form type. Only items that have not yet been saved to Ascent Capture can be renamed from the Synchronization tool. (If desired, saved items can be renamed from the Ascent Capture batch class tree.)
• Unassign Form Type: Removes the mapping of the assigned form type. If there are any derived classes, a message box is shown. If you click Yes, the mapping is removed recursively.
Folder and document classes that are already available from the Ascent Capture batch class, or that have been created in the previous step, are displayed in the list on the left side. Extraction fields and their associated index fields for the selected document class are displayed in the list on the right side. Additional index fields can be created from within the Synchronization tool if needed.
Figure 7-11. Batch Tree Document Listing in Validation
In the Validation module, the field’s content is appended to the document name in the batch tree. This makes it easier to identify a document, for example by adding the customer name or customer ID to the document name. If more than one field is selected, their contents are concatenated.
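The following minimal sketch shows how a batch tree label could be built from display fields. The guide only states that field contents are appended to the document name and concatenated. The separator, the field order, and the skipping of empty fields are assumptions, and `display_name` is a hypothetical helper.

```python
from __future__ import annotations


def display_name(document_name: str, field_values: list[str], separator: str = " ") -> str:
    """Append the contents of the selected display fields to the document name.

    Assumptions: fields are concatenated in the given order, joined by
    `separator`, and empty fields are skipped.
    """
    contents = separator.join(value for value in field_values if value)
    return f"{document_name}{separator}{contents}" if contents else document_name
```

For example, a document named "Invoice (3)" with the display fields "ACME Corp" and "C-1042" would appear as "Invoice (3) ACME Corp C-1042".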
Figure 7-12. Batch Tree Document Listing in Validation
Creating Index Fields and Assigning Them to Extraction Fields
The three buttons in the center of the Synchronization tool are used for creating and mapping index fields.
• Create new index field: Creates a new index field with a name that you enter in a dialog box.
Figure 7-13. Create New Index Field Dialog Box
The new field is added to the list of available fields for that document class.
automatically assigns it. Multi-selection is not supported. The new field is also added to the list of available fields. This button is only available when the selected document class is assigned to an Ascent Xtrata Pro class that contains extraction fields.
• Create new index field for all extraction fields and assign it: Creates a new index field for each extraction field and automatically assigns them.
Note that if an index field is already assigned to another extraction field, the assignment is changed and the other extraction field is left without an index field. An index field can be assigned to at most one extraction field.
For index fields that are created inside the Ascent Xtrata Pro synchronization dialog (either manually or automatically), a predefined Ascent Capture field type called “ExtractionFieldType” is used.
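The rule that an index field serves at most one extraction field behaves like a one-to-one mapping with reassignment. The class below is a hypothetical model of that bookkeeping, not the Synchronization tool’s implementation. Assigning an index field that is already in use removes it from the extraction field that held it before.

```python
from __future__ import annotations

from typing import Dict, Optional


class FieldMapping:
    """Hypothetical model of extraction-field-to-index-field assignments."""

    def __init__(self) -> None:
        # extraction field name -> assigned index field name (None = unassigned)
        self.assignments: Dict[str, Optional[str]] = {}

    def assign(self, extraction_field: str, index_field: str) -> None:
        # An index field may serve at most one extraction field, so any other
        # extraction field currently using it loses its assignment.
        for other, current in list(self.assignments.items()):
            if current == index_field and other != extraction_field:
                self.assignments[other] = None
        self.assignments[extraction_field] = index_field
```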
Figure 7-15. Start Synchronization
Click Synchronize to start the synchronization process. A progress bar is displayed while synchronization runs.
Figure 7-16. Running Synchronization
When synchronization is finished, the Synchronization tool closes automatically and returns to the Ascent Capture Administration module. The batch class must be published before the settings take effect for new batches.
Adding Ascent Xtrata Pro Validation to a Batch Class
Ascent Xtrata Pro Validation can be added to any Ascent Capture batch class. Normally, Ascent Xtrata Pro Validation should be positioned right after Ascent Xtrata Pro Server.
The procedure below describes how to add Ascent Xtrata Pro Validation to an existing batch class. Refer to your Ascent Capture documentation for details about creating batch classes.
To add Ascent Xtrata Pro Validation to an existing batch class
1 From the Administration module’s Definitions panel, select the Batch tab.
2 Right-click the batch class to which you want to add Ascent Xtrata Pro Validation.
Statistics” options during installation. If not, run the setup again to install the missing features. For details, see the Installation Guide for Ascent Xtrata Pro.
To collect statistics or use online learning, the Release module must be added to the queues in a batch class, and the Ascent Xtrata Pro release script must be activated for each document class.
a. From the Administration module’s Definitions panel, select the Batch tab and right-click the document class for which you want to activate the release script.
b. From the context menu, select Release Scripts to open the Release Scripts dialog box.
c. Select Xtrata Pro Statistics from the list of Available Release Scripts.
d. Click Add to add it to the list of Assigned Release Scripts. The Ascent Xtrata Pro Release Setup dialog box – Statistic Release tab is displayed.
f. To configure online learning, select the New Samples tab and enter the path for the online learning database. If necessary, you can also change the default group value. The group value can be used as a tag to facilitate sorting and filtering; for example, you may use the name of a supplier as the group value. The value of this field is saved in an additional data field.
Figure 7-18. Ascent Xtrata Pro Release Script Setup Dialog Box – New Samples Tab
g.
Chapter 8
Processing Batches
Introduction
Ascent Xtrata Pro Server is a custom module that can be added to the Ascent Capture batch class queue list. Ascent Xtrata Pro Server is typically placed just after the Scan module and before the Ascent Xtrata Pro Validation module in the workflow.
Ascent Xtrata Pro Server runs as an unattended module. It performs classification and extraction of documents, and then passes the batch to the next queue in the workflow.
High Availability Support
Ascent Xtrata Pro Server supports Ascent Capture’s high availability features. If the option is activated in Ascent Capture and file access fails, the server tries to restore file access. If file access cannot be restored within a certain period, an error is returned. For information about how to enable high availability, see your Ascent Capture documentation.
dynamically binarized by Ascent Xtrata Pro Server. For multi-page documents, each color page is binarized and saved as a bitonal image.
Ascent Xtrata Pro Server can also be used for classification only. In this case, no locator methods are defined for the class. The classification result is passed either as a form type or in a special field that stores the class name. Ascent Xtrata Pro classifies all documents in the batch and passes the results to the next queue in the workflow.
5 Change the Startup type to “Automatic.” To disable the automatic startup of the service, change the Startup type back to Manual.
To process files with Xtrata Pro Server from a network share
1 From the Windows taskbar, select Start | Settings | Control Panel | Administrative Tools | Services to display the Services utility.
Note This sequence may differ according to the version of your operating system.
To start performance monitoring
1 Select Start | Settings | Control Panel | Administrative Tools | Performance to display the Microsoft Windows performance monitor.
Note This sequence may differ according to the version of your operating system. For example, in Windows XP, select Start | All Programs | Administrative Tools | Performance.
2 Right-click the graph and select “Add Counters” from the context menu. The “Add Counters” dialog box is displayed.
5 Once you have selected the counters, click Add to include them in the performance monitor.
Note You can change the appearance of the graph by selecting Properties from the context menu.
Quick Tour of the Ascent Xtrata Pro Server User Interface
Ascent Xtrata Pro Server can be started from the Windows Start menu or from the Ascent Capture Batch Manager.
If run from the Start menu, Ascent Xtrata Pro Server immediately begins polling for batches to process. Batches that are ready for Ascent Xtrata Pro Server are processed in order of priority and age. The batch with the highest priority is processed first, and if two batches have the same priority, the older batch is processed first. While a batch is being processed, its progress and certain statistics are displayed.
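The processing order can be expressed as a two-part sort key: highest priority first, and among equal priorities the older batch first. This is a minimal sketch with a hypothetical `Batch` record. It assumes that a larger number means a higher priority. If your Ascent Capture installation treats the smallest number as the highest priority, sort on `b.priority` instead of `-b.priority`.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class Batch:
    name: str
    priority: int       # assumption: a larger number means a higher priority
    created: datetime   # creation time, used as the batch's age


def processing_order(batches: list[Batch]) -> list[Batch]:
    """Highest priority first; among equal priorities, the older batch first."""
    return sorted(batches, key=lambda b: (-b.priority, b.created))
```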
Interface Language
The language of the graphical user interface depends on the language set in Project Builder. If Project Builder is not available, the default language (English) is used. To change the language, open Project Builder, select Tools | Select Language, and then choose a language from the list in the Application Language dialog box.
Polling Interval
As an unattended module, Ascent Xtrata Pro Server requires no user intervention to process batches.
Log files are created in the Ascent Capture log folder, with a new log file for each day. Every time Ascent Xtrata Pro Server opens a new batch, it checks whether a log file for the current day exists; if not, the file is created. If Ascent Xtrata Pro Server runs past midnight, the first batch opened after midnight causes a new log file to be created. This keeps all log data for one batch together in a single file.
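The daily log rule works as follows: on every batch open, check for today’s log file and create it if it is missing. A batch opened after midnight therefore starts a new file, and each batch’s entries stay in the file of the day the batch was opened. The following minimal sketch models that rule. The file name pattern `XtrataServer_YYYYMMDD.log` and the function name are hypothetical; the guide does not document the actual naming scheme.

```python
from __future__ import annotations

from datetime import date
from pathlib import Path
from typing import Optional


def log_file_for_batch(log_folder: Path, today: Optional[date] = None) -> Path:
    """Called whenever a batch is opened: return the current day's log file,
    creating it first if it does not exist yet."""
    today = today or date.today()
    # Hypothetical naming scheme; the real file name is not documented here.
    path = log_folder / f"XtrataServer_{today:%Y%m%d}.log"
    if not path.exists():
        path.touch()
    return path
```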
Chapter 9
Ascent Xtrata Pro Validation
Introduction
Ascent Xtrata Pro Validation is another custom module that can be added to the list of selected queues for an Ascent Capture batch class. Ascent Xtrata Pro Validation is typically placed just after Ascent Xtrata Pro Server in the workflow and replaces the Ascent Capture Validation module. Validation operators use it to manually correct misclassified documents and/or invalid extraction results.
Figure 9-1. Ascent Xtrata Pro Validation Module Interface
User Interface Elements
The Ascent Xtrata Pro Validation module has a main menu and a toolbar for quick access to various features and commands. Below these, the interface is divided into three main areas: a navigation tree on the left, where the batch contents are displayed; a field editing pane in the center, where the validation form is displayed; and an image viewer on the right, where the document page is displayed.
Menu Bar
Ascent Xtrata Pro supports a standard, Windows-style menu bar from which you can perform various operations.
Figure 9-2. Menu Bar
The menu bar offers access to several menus:
The Batch menu:
• Open – displays the list of all available batches so you can select one to open.
• Close – closes the current batch.
• Suspend – suspends the current batch.
• Edit Batch – switches to edit batch mode.
• Create Child Folder – creates a child folder in the batch root.
• Re-classification – displays the classification result on the validation form, making it possible to reclassify the document.
• Viewer – select Floating to display the panel as a separate window that can be moved anywhere on the screen, or select Left, Right, Top, or Bottom to dock the panel on that side of the Validation user interface.
• Previous – navigates to the previous document in the batch.
• Next – navigates to the next document in the batch.
• Last – navigates to the last document in the batch.
• Go To – navigates to the given document number in the batch.
The Page menu:
• Create Document – creates a document and places all selected pages in it.
• Split – splits the current document before the currently selected page.
• Reject – rejects the currently selected page.
The Options menu:
• Show Script – shows the script window if script debugging is enabled.
• Select Language – lets you select one of the available languages for the application interface.
• Settings – shows the Settings dialog box, which is used to make user-specific settings.
The Help menu:
• Contents – opens the online Help for the Validation module.
• About – shows information about the Validation module.
Use the Document toolbar to navigate the documents in the current batch. Additionally, there is an option to make the current document available for online learning. This capability must be enabled for the batch class using the Synchronization tool in the Ascent Capture Administration module.
Figure 9-5. Document Toolbar
Use the Page toolbar to navigate the pages in the current document and to rotate the pages. Note that any existing field data is lost during rotation.
Validation Form Panel
Validation forms for document classes are defined for a project in the Ascent Xtrata Pro Project Builder. The form defined for a project class is used for documents classified to that class. By default, the validation form panel consists of three areas:
• The Classification and Extraction Result Fields area displays the classification result and the fields that need to be validated.
Add Row – adds an empty row at the end of the table.
Insert Row – inserts an empty row into the table. The row is inserted above the selected cells or table row.
Interpolate rows – tries to identify the remaining rows of the table after you have defined the first row by marking its individual cells on the document in the viewer with the mouse. This button is only enabled when exactly one row has been added to the table.
Use this tab to define the behavior when you validate the last field of a document and when you finish validating the batch.
• Colors – Use this tab to set the display colors for valid and invalid fields.
• Miscellaneous – Use this tab to set editing and scripting options.
Batch Settings Tab
Figure 9-11. Settings Dialog Box – Batch Settings Tab
Prompt before closing document
By default, a message is displayed after the last field of a document is validated.
Figure 9-12. Save Current Document Message
Prompt before closing batch
By default, a message is displayed after the last field of the last document is validated. If you do not want to be prompted, clear this option. The batch is then closed automatically without prompting.
Figure 9-13. Close Batch Message
Open next batch automatically
By default, this option is enabled so that when one batch closes, the next batch opens automatically.
Figure 9-14. Settings Dialog Box – Colors Tab
Miscellaneous Tab
This tab includes options for editing and scripting.
Figure 9-15.
Editing
By default, the word-click pointer is enabled. This pointer allows the validation operator to copy an OCR result from the document in the Document Viewer to a field on the validation form. Click a word in the Document Viewer with the word-click pointer to insert its OCR result into the current field. To append the result to the contents of the selected field, hold down Ctrl while clicking.
Figure 9-16.
Figure 9-17. Select Folder Class Dialog Box
Select a folder class from the list of available classes and click OK. Click Cancel to close the dialog box without selecting a folder class.
Note Use the Ascent Capture Administration module to create, rename, or delete folder classes. If no folder class is available, the list of available folder classes is empty.
Application Language Dialog Box
Use this dialog box to change the language of the application’s graphical user interface.
Select the desired language from the list. When you start Project Builder for the first time, the language is determined by the operating system.
Buttons
OK – Click this button to save your settings.
Cancel – Click this button to close the dialog box and discard any changes you made.
Help – Click this button to view the online Help topic for this dialog box.
InPlace Editor Panel
Figure 9-18. Floating InPlace Editor
Context Menu
Right-click the InPlace Editor area to show the context menu. The following options are available:
• Float InPlace Editor
• Dock
  – Bottom
  – Top
Current Error
By default, the Current Error area, which shows the error description for an invalid field, is part of the InPlace Editor. However, the InPlace Editor is optional and need not be displayed.
If started manually, Ascent Xtrata Pro Validation can be set to “Open next batch automatically.” In this case, after the current batch is finished, the next waiting batch is opened automatically. You can also set the Ascent Xtrata Pro Validation polling interval so that it automatically checks for batches that are waiting to be validated. For more information about these settings, refer to Batch Settings Tab.
Figure 9-20. Ascent Xtrata Pro Validation Showing First Invalid Document
Validate a Document
When you open a batch in Validation, the first document having invalid fields is displayed and the invalid field is selected. Enter the correct value and press Enter to validate the field. When the field is valid, its color turns green and the next invalid field is displayed.
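Finding where validation starts, and where it continues after each correction, amounts to a search for the first invalid field in batch order. The following minimal sketch uses a hypothetical data model in which each document is a list of (field name, is valid) pairs. It does not reflect how the Validation module actually stores fields.

```python
from __future__ import annotations

from typing import List, Optional, Tuple

# Each document is a list of (field name, is_valid) pairs (hypothetical model).
Document = List[Tuple[str, bool]]


def first_invalid(documents: list[Document]) -> Optional[tuple[int, str]]:
    """Return (document index, field name) of the first invalid field, or None."""
    for doc_index, fields in enumerate(documents):
        for name, is_valid in fields:
            if not is_valid:
                return doc_index, name
    return None
```

Calling the function again after a field has been corrected yields the next invalid field. A result of `None` means every field in the batch is valid.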
To navigate within the fields, pages, or documents, select the corresponding items from the menu or click the corresponding toolbar button. Shortcuts are also provided; for example, press Tab to navigate to the next field or Shift+Tab to navigate to the previous field. For details, see Shortcuts on page 582.
Click Yes to close the valid batch and select another batch to open. Click No to open the batch.
If no batches are ready for validation, the following message is displayed:
Figure 9-23. No Waiting Batches
Batch Editing
During validation, the batch content is displayed as a tree. The batch tree can be used for navigation. It can also be used to edit the batch and to manage its documents, folders, and structure.
Figure 9-24. Graphical User Interface for Editing a Batch
Page Operations
The following operations can be performed on pages:
• Create Document – the selected page(s) are used to create a new document. The document is placed at the end of the document list in the root of the selected page(s)' parent, and the pages are added to the document in the order in which they were selected.
• Split – the current document is split into two documents before the selected page.
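A minimal sketch of the two page operations, assuming a deliberately simplified model in which a document is a list of page identifiers and the batch is a list of documents. Two behaviors are assumptions not stated in this guide: selected pages are removed from their previous documents, and a document emptied by Create Document is left in place. All names are hypothetical.

```python
from typing import List

Doc = List[str]  # hypothetical: a document is a list of page identifiers


def split_document(documents: List[Doc], doc_index: int, page_index: int) -> None:
    """Split documents[doc_index] into two documents before page_index."""
    doc = documents[doc_index]
    if not 0 < page_index < len(doc):
        raise ValueError("the split must leave at least one page in each part")
    documents[doc_index:doc_index + 1] = [doc[:page_index], doc[page_index:]]


def create_document(documents: List[Doc], selected_pages: List[str]) -> Doc:
    """Build a new document from the selected pages, in selection order,
    and append it to the end of the document list."""
    selected = set(selected_pages)
    for doc in documents:
        doc[:] = [p for p in doc if p not in selected]  # assumption: pages move
    new_doc = list(selected_pages)
    documents.append(new_doc)
    return new_doc
```

Keeping the selection order in `create_document` (rather than the pages' original order) mirrors the "selection sequence" rule stated above.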
Document Operations
The following operations can be performed on documents:
• Create Folder – a folder is inserted at the same hierarchy level as the selected document, and the document is added to it. This option is only available if at least one folder class is available in the batch class. A dialog box is used to select the Ascent Capture folder class for the new folder. For details, see Select Folder Class Dialog Box.
Note You cannot multi-select when performing drag and drop operations.
Table 9-1. Drag and Drop Functionality for Batches, Folders, Documents, and Pages
Dropped on Batch:
• Folder dragged – the folder is added at the end of the batch's folder list.
• Document dragged – the document is inserted at the end of the batch's document list.
Dropped on Folder:
• Folder dragged – drop it on a parent folder and the folder is inserted at the end of the folder list.
Figure 9-25. Page Properties Dialog Box
Show Field Contents in Batch Tree
The contents of a field, for example a customer name or customer ID, can be displayed in the batch tree instead of the document or folder class name. This makes it easier to identify specific documents or folders in the batch tree. To see the contents of a field in the batch tree, the "Use As Display Field" option must be set in the Synchronization tool for the corresponding document or folder class.
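The labeling rule can be sketched as follows. `tree_label` and its parameters are hypothetical, and falling back to the class name when the display field is empty is an assumption rather than documented behavior.

```python
from typing import Optional


def tree_label(class_name: str, display_field_value: Optional[str]) -> str:
    """Label for a document or folder node in the batch tree (sketch).

    display_field_value is None when no field of the class has
    "Use As Display Field" set (hypothetical parameter).
    """
    if display_field_value:  # assumption: an empty value also falls back
        return display_field_value
    return class_name
```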
Figure 9-26. Mark Document for Online Learning Dialog Box
Under Send Back To Administration For Optimization Of, select Classification and/or Extraction, and add a comment if desired.
Character Level Editing
Character level editing is only possible in fields that have been extracted by the Advanced Zone Locator.
Shortcut Keys
Many buttons have shortcut keys, ranging from F6 to F12, including all combinations with Ctrl and Shift.
Read-Only Fields
Fields in the validation form may be set to read-only. This option is defined when designing the validation form.
Force Valid Field
You can manually force the status of an invalid field to "valid" by pressing Ctrl+Enter. The field is given a valid status, but it is marked with the forced validation symbol.
Figure 9-27.
Figure 9-28. Assign Document Class to an Unclassified Document
If you reassign a class to a document that has already been classified, the validation form of the newly selected class may display extraction results from the original class, but only for fields whose names are identical in both classes. For fields whose names differ, you must provide values manually, using either the keyboard or the Word Pointer.
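The carry-over rule (values survive only where field names match) amounts to an intersection of field names. The sketch below is a hypothetical illustration; `carry_over` is not an Ascent Xtrata Pro function.

```python
from typing import Dict, List, Tuple


def carry_over(old_values: Dict[str, str],
               new_field_names: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Return (values kept for the new class, fields that need manual input)."""
    kept = {name: old_values[name] for name in new_field_names if name in old_values}
    missing = [name for name in new_field_names if name not in old_values]
    return kept, missing
```

With old values for InvoiceNumber, Total, and Vendor and a new class with InvoiceNumber, OrderDate, and Total, the sketch keeps InvoiceNumber and Total and reports OrderDate as needing manual input.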
The InPlace Editor is activated automatically when you click in it. If you press Tab, the InPlace Editor updates with the value (if any) of the next field, and focus remains in the InPlace Editor. Note that pressing Tab does not validate a field; you must press Enter to validate it. Also, when you click or lasso words in the image viewer, focus returns to the InPlace Editor if it was there initially. For details, see InPlace Editor Panel.
Table Indexing
There are two ways to index a table manually.
• Typing – if a table was not extracted completely, you can add missing cells during validation by clicking a cell in the table and typing the text. In this case, no geometric information about the table can be gathered, and only the values will be available if you mark this document for online learning.
Additionally, the Security Boost user needs full permission for the "..\Ascent\Local\Logs" folder.
Shortcuts
The following table lists the shortcut keys that can be used in the Ascent Xtrata Pro Validation module.
Table 9-2. Key Shortcuts for Ascent Xtrata Pro Validation
Keystrokes          Description
Ctrl + O            Open batch.
Ctrl + S            Suspend batch.
Ctrl + Z            Undo.
Ctrl + Y            Redo.
Ctrl + P            Next document.
Ctrl + Shift + F    Last document.
Ctrl + G            Go to document.
Ctrl + R            Reject document.
Word-click Pointer          Inserts the selected extraction result from the document viewer into the current validation form field.
Ctrl + Word-click Pointer   Concatenates the currently selected extraction result from the document viewer with any existing value in the current field.
Chapter 9 584 Ascent Xtrata Pro User's Guide
Chapter 10 Statistics Viewer
Introduction
Document classification and extraction is not a deterministic process: it deals with varying input. Therefore, its results depend on the input data and, by definition, are not predetermined. The quality of the process is defined by how accurately the document class is assigned and how accurately items on the document are recognized. This quality is measured by two values called Recall and Precision.
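This section does not define Recall and Precision formally. In their usual information-retrieval sense they are computed from counts of true positives (TP, correct results the system produced), false positives (FP, results the system produced that were wrong), and false negatives (FN, correct results the system missed). The Statistics Viewer's exact counting rules may differ from this general definition:

```latex
\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}
```

For example, with TP = 90, FP = 10, and FN = 30, Precision = 90/100 = 0.9 and Recall = 90/120 = 0.75.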
• Recognition time
• Original recognition value
• Value changed via manual correction
By comparing the original value with the final value, it can be determined whether a field was initially correct. If the value has been changed, the initial value is presumed to have been wrong. During runtime, statistical data is gathered for each document and stored in the document (XDocument).
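The correctness test described above can be sketched in Python. The `FieldStat` record is a hypothetical stand-in for the per-field statistics stored in the XDocument, and exact string comparison is an assumption; the product may normalize values before comparing them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FieldStat:
    name: str
    original_value: str  # value produced by recognition
    final_value: str     # value after validation (possibly corrected)


def initially_correct(stat: FieldStat) -> bool:
    """An unchanged value means the recognition result is presumed correct."""
    return stat.original_value == stat.final_value


def initial_accuracy(stats: List[FieldStat]) -> float:
    """Fraction of fields that needed no manual correction."""
    if not stats:
        return 0.0
    return sum(initially_correct(s) for s in stats) / len(stats)
```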
Statistics Viewer
Figure 10-1. Ascent Xtrata Pro Statistics Viewer
Elements
The Ascent Xtrata Pro Statistics Viewer has a main menu and a toolbar for quick access. Below these, the window has a layout similar to that of Microsoft Outlook and is divided into two main panes: a navigation pane on the left and a report pane on the right. At the bottom, a status bar shows additional information.
Toolbars
The toolbar provides shortcuts to menu items and gives you quick access to all important features.
Figure 10-3. Main Toolbar
Navigation Pane
With the navigation pane, you can choose from a variety of actual and historical reports.
Figure 10-4. Navigation Panel
Reports
The main part of the window displays the statistical reports.
Figure 10-5. Report Pane
Status Bar
The status bar provides information about the current statistics folder.
Figure 10-6. Report pane
Reports
The available reports are grouped into Actual and History Reports. The actual reports use detailed, relatively recent data. Because the system generates a large volume of data, database records are combined into periodic summaries once they reach a certain age (this age can be configured in the release script).
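The aging of detailed records into summaries can be illustrated with a short sketch. The record schema `(day, pages)`, the per-day summary period, and the `max_age_days` parameter are all assumptions; the guide only states that the threshold is configured in the release script.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, List, Tuple

Record = Tuple[date, int]  # hypothetical: (processing day, page count) per document


def summarize_old_records(records: List[Record], today: date,
                          max_age_days: int) -> Tuple[List[Record], Dict[date, Dict[str, int]]]:
    """Keep recent records in detail; fold older ones into per-day summaries."""
    cutoff = today - timedelta(days=max_age_days)
    detailed: List[Record] = []
    summary: Dict[date, Dict[str, int]] = defaultdict(lambda: {"documents": 0, "pages": 0})
    for day, pages in records:
        if day >= cutoff:
            detailed.append((day, pages))
        else:
            summary[day]["documents"] += 1
            summary[day]["pages"] += pages
    return detailed, dict(summary)
```

The trade-off is the one the text describes: summaries keep the database small, but per-document detail is no longer available for aged data, which is why the history reports work only with aggregated values.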
Daily Statistics
The daily statistics report contains an overview of the recognition accuracy, the document volume, and the number of pages for the selected day. The day can be selected from a combo box listing all days available in the database. If no data is available, only today is listed. The document volume is calculated by aggregating the values for the selected day and for the number of preceding days set in "Number of previous days." The results are displayed as a table.
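Assuming the window covers the selected day plus the N days before it, the volume calculation reduces to a sum over a date range. `daily_counts` and `document_volume` are hypothetical names used only for illustration.

```python
from datetime import date, timedelta
from typing import Dict


def document_volume(daily_counts: Dict[date, int], selected_day: date,
                    previous_days: int) -> int:
    """Documents processed on selected_day plus the previous_days days before it."""
    return sum(daily_counts.get(selected_day - timedelta(days=i), 0)
               for i in range(previous_days + 1))
```

Days without data contribute zero, so a window that extends past the oldest record in the database still returns a valid total.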
• Incorrect (as percentage of Field Count)
• Field Count
For details concerning the selection of the field, refer to Selection of Field.
Recognition Timing By Batch
Displays the OCR recognition time per page and the extraction time per document in seconds, rounded to the nearest tenth of a second.
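The two timing values are ratios rounded to one decimal place. In the sketch below, the inputs are assumed to be per-batch totals; the function name and parameters are hypothetical. Python's `round` uses round-half-to-even, which may differ from the viewer's rounding for exact ties.

```python
from typing import Tuple


def batch_timing(ocr_seconds_total: float, pages: int,
                 extraction_seconds_total: float, documents: int) -> Tuple[float, float]:
    """Return (OCR seconds per page, extraction seconds per document), to 0.1 s."""
    ocr_per_page = ocr_seconds_total / pages if pages else 0.0
    extract_per_doc = extraction_seconds_total / documents if documents else 0.0
    return round(ocr_per_page, 1), round(extract_per_doc, 1)
```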
Historical Reports
The following reports are available for the aggregated statistical data.
Recognition Accuracy By Field
Displays the field recognition accuracy, given as the percentage of correct, incorrect, and rejected occurrences of the field. The report can be restricted to a certain field or can display all fields.
Document Recognition Timing Per Day
Displays the average recognition speed of the documents grouped by day. The average OCR time per page and the average extraction time per document are displayed in seconds. Additionally, the time span can be set by selecting a date range (to the nearest day). The report contains the following information:
• Batch Class Name (grouped in report)
• Date
• OCR Time (avg s/Page)
• Extract.
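A per-day average grouped by batch class could be computed as below. The row layout is hypothetical, and the sketch computes a weighted average (total time divided by total pages or documents), which is an assumption; the viewer might instead average per-batch values.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical row: (batch class, day, OCR seconds, pages, extraction seconds, documents)
Row = Tuple[str, str, float, int, float, int]


def timing_per_day(rows: List[Row]) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Average OCR s/page and extraction s/document per (batch class, day)."""
    totals: Dict[Tuple[str, str], List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for batch_class, day, ocr_s, pages, ext_s, docs in rows:
        t = totals[(batch_class, day)]
        t[0] += ocr_s
        t[1] += pages
        t[2] += ext_s
        t[3] += docs
    return {
        key: (t[0] / t[1] if t[1] else 0.0, t[2] / t[3] if t[3] else 0.0)
        for key, t in totals.items()
    }
```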
The report contains the following information:
• Batch Class Name (grouped in report)
• Date (grouped in report)
• Correct (percent of correct fields)
• Rejected (percent of rejected fields)
• Incorrect (percent of incorrect fields)
• Total Field Count (sum of correct, incorrect, and rejected fields)
• Doc Count
• Page Count
For details concerning the selection of the time period, refer to Selection of Time Period.
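The percentage columns follow directly from the three counts, since Total Field Count is their sum. A hypothetical helper that reproduces these columns:

```python
from typing import Dict


def accuracy_row(correct: int, rejected: int, incorrect: int) -> Dict[str, float]:
    """Percentages as listed above; Total Field Count is the sum of the counts."""
    total = correct + rejected + incorrect

    def pct(n: int) -> float:
        return 100.0 * n / total if total else 0.0

    return {
        "Correct": pct(correct),
        "Rejected": pct(rejected),
        "Incorrect": pct(incorrect),
        "Total Field Count": total,
    }
```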
Additionally, the time span can be reduced by setting a date range (to the nearest day).
Figure 10-8. Selection of Time Period
Selection of Month and Year
You can specify a date range (to the nearest month) for the reports by selecting the From and To options and specifying the dates in the Select Time Period dialog box. After the dates are changed, click OK to refresh the report with the new data selection. To specify both a starting and an ending month, select both "From" and "To." To include all the data up to and including a specific month, select only "To."
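The From/To semantics can be expressed as an inclusive month-range filter. In this sketch, a `None` bound means the option is not selected; treating From alone as "this month and later" is an assumption by symmetry with the documented To case. The function name is hypothetical.

```python
from datetime import date
from typing import Optional, Tuple

Month = Tuple[int, int]  # (year, month)


def in_period(day: date, start: Optional[Month], end: Optional[Month]) -> bool:
    """True if day falls within the selected month range (both ends inclusive)."""
    month = (day.year, day.month)
    if start is not None and month < start:
        return False
    if end is not None and month > end:
        return False
    return True
```

Comparing `(year, month)` tuples orders months chronologically, so no day arithmetic is needed to honor the "to the nearest month" granularity.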
Index A ACIS Support 544 Activation Code 14 Adaptive Feature Classifier 53 optimization tool 91 overview 22, 84 properties 86 set up 44, 84 Add Classification View dialog box 289 Adding classes 54 Adding fields 108 Adding table fields 109 Adding the Server to a batch class 518 Adding Validation to a batch class 538 Address Evaluator Add 141 Address Evaluator Properties dialog box 413 Address Locator overview 119 Administration Test and optimize a project 42 Advanced Zone Locator overview 119 Advanced Zone L
properties 147 Batch class considerations External Server 58, 299 extraction field type 536 importing/exporting 521 overview 519 publishing 520 Recognition Server 519 synchronizing with projects 519 Validation module 216 Batch classes adding modules to 518 considerations 519 publishing 520 synchronizing project with 521 C Class Based Precision and Recall dialog box 295 Class Properties 56 Document separation 60 Extract this class with external server 58 OCR 62 Reclassification 59 Subtree classificat
Ascent Xtrata Pro Validation module 7 overview 1 Synchronization tool 522 Validation 553 Content classifiers See Adaptive Feature Classifier and Instruction Classifier Creating invoice projects 36 Creating projects from directory 24 manually 24, 27 D Database Evaluator Add 152 overview 119, 151 Database Evaluator Properties dialog box 436 Database Locator adding 155 adding databases 153 example use cases 156 overview 119, 153 speed considerations 159 Database Locator Properties dialog box 438 Date F
Document Viewer drawing regions 127 quick tour 286 testing classification 102 validation form 230, 237 DoubleValue property 112 E Exporting batch classes 521 Exporting locators 122 Extraction confidence 110 design mode 269 dictionaries 167 evaluators 10, 118 fields 107, 108 formatters 112 locators 9, 107, 118 online learning 10 overview 9, 22, 107 regions 126 set up 46, 107 with External Server 58, 299 with Recognition Server 518, 519 Extraction Design panel overview 269 Extraction Results panel qui
Instruction Classifier 53 overview 22, 96 set up 45, 97 with Adaptive Feature Classifier 101 Instruction Properties dialog box 328 Invoice Group Locator overview 120 Invoice Group Locator 132 Invoice Header Locator overview 119 Invoice Header Locator adding 177 Invoice Header Locator Properties dialog box 457 Invoice Validation dialog box 376 K Kadmos OCR settings See Kadmos User Guide documentation Keywords 165, 172 Knowledge Base Activation Code 14 overview 13 Protection 14 Knowledge Base Admini
integration 10 XDoc file (xdc) 10 OCR Profiles 62 OCR Substitution 138 OCR Voting Evaluator overview 120 OCR Voting Evaluator Properties dialog box 472 OCR Voting Evaluator overview 183 Open Test Folder dialog box 333 Optimize a project 35, 91 Optimize classification forms 79, 385 invoices 79, 384 Order Group Locator 132 overview 120 P Parent represents competing children rule 67 Percentage Formatter DoubleValue property 112 overview 112, 219 Percentage Formatting dialog box 310 Polling interval 550
Script Formatter DateValue property 112, 113 DoubleValue property 113 overview 112, 219 set up 114 Script Formatting dialog box 312 Script Locator adding 185, 187 overview 120, 187 properties 185, 187 Script Locator Properties dialog box 479 Scripts integration 11 locator 120 Sax Basic script editor 138 validation events 232 Security Boost Permissions 581 Server module log file 550 overview 1, 543 polling for batches 550 processing batches 543 quick tour 548 Set up Validation 215 Single child wins ov
U User Interface 251 V Validate Project 28 Validation field formatters 218 field properties 218 forms 12, 225 methods 12, 219 overview 11, 23, 215 processing sequence 217 rules 12, 221, 273 sample project 244 script events 232 sequence 225 set up 48, 215 Test Validation Rules 230 testing 230 Validation Design General Dialog Boxes 242 Validation Design Panel defining tab sequence 242 font settings 243 form elements 236 user interface 234 Validation forms defining script events 227, 232 overview 12, 2