INDICIUS 6.0 for KOFAX Capture Getting Started Guide (Classification and Separation) 10300776-000 Rev 1.
© 2006-2008 Kofax, Inc., 16245 Laguna Canyon Road, Irvine, California 92618, U.S.A. All rights reserved. Portions, copyright 1997-2006 Kofax Development UK Ltd. All Rights Reserved. Use is subject to license terms. Third-party software is copyrighted and licensed from Kofax’s suppliers. This product is protected by U.S. Patent No. 5,159,667. THIS SOFTWARE CONTAINS CONFIDENTIAL INFORMATION AND TRADE SECRETS OF KOFAX, INC.
Contents
How to Use This Guide
Introduction
Related Documentation
Installation Guide (.pdf)
Standard Installation
Licensing
3 Processing
Introduction
Step 3: Initial Analysis
Step 4: Select Sample Documents for Configuration
Step 5: Read Page Content
Step 6: Cleanup Documents
Step 7: Select Documents for Testing
How to Use This Guide
Introduction
This guide introduces INDICIUS and describes how it is used to automatically separate pages into documents and to classify documents. It starts with brief installation instructions, which are followed by a tutorial. The tutorial guides you through processing batches using the pre-installed Mortgage Applications example.
Related Documentation
The following documentation is included with INDICIUS. To open a PDF guide, click Start on the taskbar to display the menu, and select All Programs | INDICIUS | Documentation. The INDICIUS Help can be opened from the same menu, or from the Help menu within the tools. Pressing F1 within Definer or Script Editor opens the topic for the feature being used.
Installation Guide (.
Getting Started Guides
These guides are written for people who need an introduction to INDICIUS. The guides are useful as a starting point for those who will be configuring or administering INDICIUS, or those using the keying modules. The guides are self-contained; however, each focuses on configuring a different document processing solution.
Getting Started (Fixed-Form) (.pdf)
The Getting Started Guide (Fixed-Form) (.
INDICIUS Help
The INDICIUS Help is written for those configuring a solution and for system administrators, and assumes those reading it have read the Getting Started Guides or attended an INDICIUS training course. This assumption is made so that the INDICIUS Help can provide the most accurate and detailed information across every aspect of the product. The INDICIUS Help explains:
How to configure the INDICIUS modules to process a document set.
Chapter 1 Overview
Introduction
This chapter introduces some of the concepts of data capture and key points of INDICIUS.
What Does INDICIUS Add to Kofax Capture?
INDICIUS is a set of modules that provide additional automatic recognition (classification, separation and extraction) as well as advanced keying (indexing and validation) functionality to Kofax Capture. Kofax Capture scans paper-based documents, creating a series of scanned image files.
Classification Methods
Classification can be done using one or more of the following methods:
Image Classification: Classification based on the overall layout and structure of a page, including lines, boxes, logos and placement of text.
Text Classification: Classification based on detailed analysis of the text content of a page or document.
Rules-Based Classification: Classification performed by searching for specific data or keywords, independent of layout.
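As a rough illustration of the rules-based method, the following Python sketch assigns a document type when every keyword in a rule appears in the page text. It is not how INDICIUS implements rules-based classification; the RULES table, the keywords and the classify_by_rules function are hypothetical names invented for this example.

```python
from typing import Optional

# Hypothetical keyword rules: document type -> phrases that must all appear.
RULES = {
    "Truth In Lending": ["truth in lending"],
    "Appraisal Report": ["appraisal", "report"],
    "Loan Application": ["loan application"],
}


def classify_by_rules(page_text: str) -> Optional[str]:
    """Return the first document type whose keywords all occur in the text."""
    text = page_text.lower()
    for doc_type, keywords in RULES.items():
        if all(keyword in text for keyword in keywords):
            return doc_type
    return None  # no rule matched; leave the page for another method or for review


if __name__ == "__main__":
    print(classify_by_rules("UNIFORM RESIDENTIAL APPRAISAL REPORT"))
```

Because the match depends only on the words found, the same rule applies regardless of where the text sits on the page, which is the sense in which this method is independent of layout.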
Page Classification and Separation (resulting in document classification)
If extraction is also being done as part of the queue, an additional instance of the Recognition module named INDICIUS Recognition (Classification and Separation) is used for classification and separation. This leaves the standard instance of Recognition available for extraction.
Low Volume, Single Station Environment
In lower volume environments it is possible to run batches through all the modules on a single station, using Kofax Capture Batch Manager.
Configuring a Classification and Separation Solution
Configuration (that is, setting up the INDICIUS modules to process particular documents) is a two-step process:
Configure the INDICIUS modules using the INDICIUS configuration tools and a set of sample documents.
Chapter 2 Installing INDICIUS
Introduction
This chapter provides instructions for installing INDICIUS using the installation wizard (standard installation). To install INDICIUS the following items are required:
1 A computer satisfying the system requirements as described in the Installation Guide (.pdf).
2 An INDICIUS installation CD.
3 A Kofax Capture license hardware key with INDICIUS features enabled.
Note Kofax Capture must be pre-installed and licensed.
Installing INDICIUS for the First Time
Standard Installation
To install INDICIUS
1 Place the INDICIUS installation CD into the CD-ROM drive. The main installation screen will display.
2 Select INDICIUS and follow the on-screen instructions.
3 To install Document Review, select INDICIUS Document Review and follow the on-screen instructions.
4 To install Transformation Studio, select Transformation Studio and follow the on-screen instructions.
Chapter 3 Processing
Introduction
This chapter will introduce you to the INDICIUS modules as they are used in production. You will use a pre-configured example solution to experience how the modules run.
The Mortgage Applications Example
The INDICIUS installation includes an example configuration that demonstrates some of the processing features of classification and separation in INDICIUS.
Setting Up the Classification and Separation Instance of Recognition
This section describes how to set up the classification and separation instance of Recognition.
Registering the Additional Instance
The following steps will guide you through registering an additional instance of Recognition to be used for classification and separation.
2 Right-click the “Recognition” shortcut and drag it onto the desktop (copying, not moving, the shortcut).
3 Rename the shortcut “Recognition (Classification and Separation).”
4 Right-click the shortcut and select Properties to open the properties window.
5 Select the Shortcut tab.
6 Move to the end of the text in the Target box and add a space after the current text.
Installing the Example Batch Classes
Installation
To install the example batch class
1 Start Administration by clicking Start on the taskbar to display the menu, and selecting: All Programs | Kofax Capture 8.0 | Administration.
2 Select File | Import to display a file selection window.
3 Select the following batch class: \examples\Mortgage Applications\Mortgage Apps.cab.
4 Click Open to unpack the batch class.
The Publish window will display.
11 Press and hold Ctrl and click to select the following batch classes:
Mortgage Apps
Mortgage Apps with Separation
12 Click Publish. The progress of the publishing operation will be logged in the Results panel.
Note It is normal for a warning to be generated when the batch class is published.
Mortgage Apps
Mortgage Apps with Separation
9 Click Publish. The progress of the publishing operation will be logged in the Results panel.
10 When publishing has been completed, click Close.
Viewing the Modules
To view the modules included in the example batch classes
1 On the Batch panel, select the “Mortgage Apps” batch class.
2 Right-click the selection to display the menu, and select Properties.
3 The Batch Class Properties window is displayed.
4 Select the Queues tab. The modules included in the batch class are displayed in the Selected Queues list.
5 Click OK.
6 Optionally, repeat the previous steps for the “Mortgage Apps with Separation” batch class.
Document Classification
In the document classification example, INDICIUS Recognition (Classification and Separation) and INDICIUS Document Review are used to classify mortgage applications. Document boundaries are established prior to INDICIUS Recognition (Classification and Separation) using patch code separators in Kofax Capture Scan.
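The idea of separator-based document boundaries can be sketched in a few lines of Python. This is only an illustration of the concept, not Kofax Capture Scan's implementation; the SEPARATOR marker and the split_at_separators function are hypothetical, and pages are represented by simple strings.

```python
from typing import List

SEPARATOR = "PATCH"  # hypothetical marker standing in for a patch code separator sheet


def split_at_separators(pages: List[str]) -> List[List[str]]:
    """Group a scanned page stream into documents, dropping the separator sheets."""
    documents: List[List[str]] = []
    current: List[str] = []
    for page in pages:
        if page == SEPARATOR:
            if current:  # close the document that was being built
                documents.append(current)
            current = []
        else:
            current.append(page)
    if current:  # the last document has no trailing separator
        documents.append(current)
    return documents


if __name__ == "__main__":
    print(split_at_separators(["PATCH", "p1", "p2", "PATCH", "p3"]))
```

In this arrangement the boundaries are fixed before classification runs, so the classifier only has to decide the type of each already-formed document.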
Figure 3-1. Create Batch Window
4 Enter a name for the new batch in the “Name:” box, for example “Mortgage Applications 1”.
5 Click Save.
6 Click Close.
Your batch is displayed in the list. The Queue column indicates that the batch is ready to be processed by Kofax Capture Scan.
\examples\Mortgage Applications\Images\Document Images
4 Click Open to import the images and establish document boundaries. Kofax Capture Scan detects the patch code separators placed between the documents and uses them to create the full batch structure (shown in the tree view).
5 Select Batch | Close and click Yes in the message box.
Figure 3-2. The Batch Loaded in Document Review
3 Select Document Classification | Override Problem to ignore the problem for now. A message will display stating that there are no more problem documents to display in the Document Classification view, so the batch will now be shown in Review.
Note You can also press F7 to override a problem.
Figure 3-3. Transition from Document Classification view to Review
4 Click OK on the message. The batch will open in Review, with the overridden document displayed. It has failed a validation rule (shown in the yellow message), a problem that must be fixed before the batch can be closed.
Figure 3-4. The Batch in Review
In Review, you can see all the documents in the batch. Although the best classification result is “Appraisal Report,” the document is of poor quality and appears to be upside down.
5 Press F3 to rotate the image by 180°. You can see that this is a very poorly scanned Truth In Lending document. The next document in the batch is also a Truth In Lending document.
6 Click the + buttons to the left of the two documents to expand them and display the thumbnails.
Figure 3-5. The Two Documents with Thumbnail Images
7 Compare the two documents. They appear to be the same document (they have the same loan number in the top left). The first of the two documents can therefore be deleted.
Figure 3-6. Deleting a Document
9 Select Delete and click Yes. As there are no further problems in the batch you will be prompted to close the batch.
10 Click Yes.
In Batch Manager, the Queue column indicates that the batch is ready to be processed by INDICIUS Recognition.
Conditionally Extract Data
Having classified the documents and reviewed the results of the classification in Document Review, extraction can now take place.
Recognition will automatically begin processing the batch. Information messages will be displayed and the “Docs Processed” count should increment. When “Docs Processed” reaches 11, Recognition will close. In Batch Manager, the Queue column indicates that the batch is ready to be processed by INDICIUS Completion.
Review the Extraction Results
Use INDICIUS Completion to review the fields extracted for each document.
To review the data
1 Click Process Batch on the toolbar in Batch Manager.
3 Press F12 to move to the next document. Again, no data has been extracted for the “Tax Escrow” document type.
4 Press F12 to move to the next document.
5 Use Tab/Shift+Tab to navigate around the fields on Document 3.
Figure 3-8. Completion Window displaying Data Extracted for Loan Application Documents
These fields have been specifically extracted for the “Loan Application” document type.
7 Click Exit Completion.
Release the Documents
Kofax Capture Release runs after the last INDICIUS module in the queue. The data stored in index fields within the Kofax Capture document classes (each of which corresponds to an INDICIUS document type) is copied to the destination as configured in the release script. In a production system this would be a back-end system. In this example, a text file is output containing the data from the documents.
2 Make sure the “Mortgage Apps” batch class is selected.
3 Enter a name for the new batch in the “Name:” box, for example “Mortgage Applications 2”.
4 Click Scan to display an Import window.
5 Select the images in the following location: \examples\Mortgage Applications\Images\Document Images.
6 Click Open to import the images and establish document boundaries.
Note Rather than selecting a single batch in Recognition, the module would normally be started in Wait for any Batch mode to automatically process batches as they become available. Alternatively, Recognition would be installed as a Windows service and would process batches automatically.
Review the Classification Results
Use INDICIUS Document Review to confirm any document types that Recognition is unsure about or that fail validation rules.
All Programs | INDICIUS | Recognition.
2 Select Session | Select Batch.
3 Select the batch created in Scan from the list.
4 Click OK. Recognition will begin processing the batch. Information messages will be displayed and the “Docs Processed” count should increment. When “Docs Processed” reaches 11, the status bar will display “Idle.”
5 Select Session | Exit to close Recognition.
Review the Extraction Results
Use INDICIUS Completion to review the fields extracted for each document.
To review the data
1 Open Kofax Capture Release by clicking Start on the taskbar to display the menu, and selecting: All Programs | Kofax Capture 8.0 | Release. Kofax Capture Release will automatically begin processing the batch. Information messages will be displayed and the progress is displayed in the “Current Batch Progress” panel. When the text “Document 11 of 11” is displayed, Kofax Capture Release will stop processing.
2 Select Batch | Exit to close Kofax Capture Release.
Page Classification and Separation
In the page classification and separation solution, INDICIUS Recognition (Classification and Separation) and INDICIUS Document Review are used to classify and separate documents. Document boundaries are established from the classification of pages in INDICIUS Recognition (Classification and Separation) and used by the later INDICIUS modules.
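To make the idea of deriving boundaries from page classification concrete, here is a deliberately simplified Python sketch. It assumes that each run of consecutive pages with the same classified type forms one document; this guide does not describe the actual separation logic used by INDICIUS, and the separate_by_page_type function is a hypothetical name. A known limitation of this simplification is that two adjacent documents of the same type would be merged.

```python
from itertools import groupby
from typing import List, Tuple


def separate_by_page_type(page_types: List[str]) -> List[Tuple[str, int]]:
    """Return (document type, page count) for each run of identically classified pages."""
    return [(doc_type, len(list(run))) for doc_type, run in groupby(page_types)]


if __name__ == "__main__":
    pages = ["Header", "Loan Application", "Loan Application", "Tax Escrow"]
    for doc_type, page_count in separate_by_page_type(pages):
        print(doc_type, page_count)
```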
Figure 3-9. Create Batch Window
4 Enter a name for the new batch in the “Name:” box, for example “Mortgage Applications 3”.
5 Click Save.
6 Click Close.
Your batch is displayed in the list. The Queue column indicates that the batch is ready to be processed by Kofax Capture Scan.
Import Images
To import images
1 Make sure the name of the new batch is highlighted and select File | Process Batch or click Process Batch on the toolbar. The batch is opened in Kofax Capture Scan.
4 Click Open to import the images. Kofax Capture Scan imports all the pages and places them in a single temporary document. This document will be replaced later by INDICIUS, after page classification and separation has run.
5 Select Batch | Close and click Yes in the message box.
In Batch Manager, the Queue column indicates that the batch is ready to be processed by the Classification and Separation instance of Recognition.
Figure 3-10. Problem in the Document Classification View
You can see, however, that the document is correctly classified. A simple confirmation is required.
3 Press Enter to confirm the document type.
4 Click OK on the message to open the Review view.
5 Expand the documents to check that the automatic separation has been successful.
6 Select Session | Close Batch and click Yes to close the batch.
To extract the data, click Process Batch on the toolbar in Batch Manager.
Recognition will automatically begin processing the batch. Information messages will be displayed and the “Docs Processed” count should increment. When “Docs Processed” reaches 14, Recognition will close. In Batch Manager, the Queue column indicates that the batch is ready to be processed by INDICIUS Completion.
Review the Extraction Results
Use INDICIUS Completion to review the fields extracted for each document.
“Current Batch Progress” panel. When the text “Document 14 of 14” is displayed, Kofax Capture Release will close. In Batch Manager, you can see that the batch has been deleted.
Chapter 4 Configuration
Overview
Introduction
To create an INDICIUS solution, you first need to configure the INDICIUS modules using the INDICIUS configuration tools and a set of sample documents. Once you have created and tested this configuration, you need to assign it to a batch class in Kofax Capture Administration. In these tutorials you will replicate the classification and separation elements of the Mortgage Applications example configuration.
Representative Documents
It is important that the sample documents are scanned using the production scanner and represent the variations that are seen in production, for example faxes and photocopies. If extraction (indexing) is being implemented as well as classification and separation, it is recommended that the documents are scanned at 300 dpi.
Document Set Management Steps
The following steps are used to create two accurate document sets, which are then used to configure and test a solution.
The advanced document separator is created automatically from the document types assigned to a set of sample documents and, when run in production, takes into account the confidence of the page classification results. Rules-based separation is manually defined using a set of separation rules. Transformation Studio is used to create the learn-by-example classifiers and the advanced document separator.
Figure 4-11. Recognition Configuration Steps (Document Classification)
Page Classification and Separation Configuration Steps
The following steps are used to create and test a Recognition page classification and separation configuration using the two accurate document sets.
Step 1: Create Configuration: Create a default configuration from the Page Classification and Separation template.
Figure 4-12. Recognition Configuration Steps (Page Classification and Separation)
Document Review
The Document Review module is configured using a Document Review project file. This project file contains reasons for displaying documents, validation rules and window options (for example shortcut keys and text labels). The Document Review project file is configured using the Document Review Project Editor.
Page Classification and Separation Configuration Steps
The page classification and separation tutorial will use the Document Review configuration you created for the document classification tutorial.
Integrate the Configuration with Kofax Capture
Having built the configuration, a batch class must be created in Kofax Capture Administration. The configuration can then be assigned to the batch class and a batch can be processed.
Step 5: Assign Configuration to the Standard Instance of Recognition
Step 6: Assign Configuration to Completion
Step 7: Configure Kofax Capture Release
Step 8: Publish Batch Class
Step 9: Process Batch
Document Classification Tutorial
Document Set Management
Step 1: Create Project
When using Transformation Studio, you will work in a project. Within this project you can import and organize your sample documents and create one or more configurations.
To create a project
1 Open Transformation Studio by clicking Start on the taskbar to display the menu, and selecting All Programs | INDICIUS | Tools | Transformation Studio.
2 Click New to open the New Project window.
Figure 4-13. Transformation Studio after Create New Project
1 Project Explorer showing current document sets and configurations
2 Document Types panel displaying the document types in the current document set
3 Status bar showing the current state of Transformation Studio
4 Tab area, currently showing the Import Documents tab
Step 2: Import Documents
Transformation Studio includes a wizard for importing documents into the current project.
The Import Documents Wizard is launched automatically when a new project is created.
Note To launch the Import Documents Wizard manually, select File | Import Documents, press CTRL+SHIFT+I or click the corresponding toolbar button.
The Example
The example mortgage applications have been exported from an archive system, and have the following folder architecture:
Each document type is in a folder, named with the document type.
Each document is in a folder.
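The folder architecture described above (one folder per document type, containing one folder per document) can be explored with a short Python sketch before importing. This is only an aid for checking your own sample folders and is not part of Transformation Studio; the collect_documents function and the root folder you pass to it are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def collect_documents(root: Path) -> Dict[str, List[Path]]:
    """Map each document type folder name to the document folders it contains."""
    documents: Dict[str, List[Path]] = {}
    for type_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Every subfolder of a type folder is treated as one document.
        documents[type_dir.name] = sorted(d for d in type_dir.iterdir() if d.is_dir())
    return documents


if __name__ == "__main__":
    # Replace "." with the folder that holds your document type folders.
    for doc_type, docs in collect_documents(Path(".")).items():
        print(f"{doc_type}: {len(docs)} document(s)")
```

Listing the counts per type in this way gives an early indication of whether any document type is short of samples.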
Figure 4-14. Folders to Select for Import
c Select all the folders (there is one for each document type).
d Click Open.
Figure 4-15. Folders to be Imported
e Click Next to display Step 2.
2 Specify the document structure and values to import. Transformation Studio has already split (parsed) the filenames and paths into values, as displayed using the example at the top of the tab. For this example, there is no need to modify the parsing options.
a On the Structure panel, select “Every imported folder is a document.” The preview on the right will update to show how the files will be imported into documents.
Note If you have not installed to the default location, the number may be different. Ensure you select the last item in the list.
c On the Document Properties panel, select “6 – Appraisal Report” from the “Document type” list. This specifies that the sixth value in the filename/path is to be imported as the document type. For the currently selected example document, the value of this sixth property is “Appraisal Report.
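The notion of picking a numbered value out of a parsed path can be illustrated with a small Python sketch. It assumes that values are simply the path components counted from 1; how Transformation Studio actually splits and numbers values is not specified here, and both the path_value function and the example path are hypothetical.

```python
from pathlib import PureWindowsPath


def path_value(path: str, position: int) -> str:
    """Return the path component at a 1-based position (assumed numbering)."""
    parts = PureWindowsPath(path).parts
    return parts[position - 1]


if __name__ == "__main__":
    # Hypothetical path; the sixth component is the document type folder.
    example = r"C:\Samples\Mortgage\Images\Docs\Appraisal Report\0001\page1.tif"
    print(path_value(example, 6))
```

This also shows why the number can differ when the files live in another location: a deeper or shallower folder path shifts the position of the document type folder.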
a Select “Copy document files into project folder,” rather than referencing the images in their current location. This will copy them into the project folder, making it easier to move your project at a later time and ensuring no dependency on the images remaining in their current location.
Note Using this option will slow the import process and require more disk space.
Assigning Document Types
In the example, the imported documents already have document types. If you import documents without document types, you need to do the following:
To assign document types to completely unclassified documents
1 Read the page content for all documents.
2 Assign document types to 5-10 samples of each type and confirm these documents.
3 Use Auto Classify.
Figure 4-18. Overview
You can see how many document types you have and the distribution of documents across those types. From the Overview you may realize that some types occur rarely and don’t need configuration, or that you have more or fewer document types or documents than you expected.
To review the mortgage applications
1 Review the chart for the number of document types (x-axis) and the number of documents (y-axis).
2 Review the Header documents to see whether you should get more examples.
a Double-click the Header bar in the chart to open Browse Documents with a filter to show just the Header documents.
b Scroll through the documents, looking at the amount of variation between each example. In fact, all of these documents contain barcodes that will be used to classify the documents.
Test Documents: Documents to use when testing the configuration (not used for training).
Unused Documents: Documents that are not currently being used. These may be additional documents that are not required for configuration or documents that have not yet been classified.
Table 4-1 gives guidelines for the number of documents required for the different classification methods (per document type).
deletions of pages or documents, reordering of pages within documents, or the addition of Confirmed or Extra Page attributes.
Note All documents are always present in the All Documents set. Documents can be added to another set from All Documents, but cannot be moved to another set from All Documents.
To select sample documents to use for configuration
1 Select Document Sets | Select Sample Documents or click the corresponding toolbar button to display the Select Sample Documents window.
Figure 4-19.
Figure 4-21. Project Explorer after Select Sample Documents
Step 5: Read Page Content
At this point you need to read (OCR) each page of the documents in your sample set. Using these reads, Transformation Studio can help you analyze the documents with the aim of finding any that are misclassified or poor quality. These reads will also be used when you build text classifiers and configure additional classification methods.
You only need to read a small section of each page (which will speed up processing time).
You want to use the read for extraction as well as classification and need a higher read accuracy.
Note For information on setting custom read parameters refer to the INDICIUS Help.
To read the pages in the sample document set
1 Select Tools | Read Page Content.
Note As it is the currently open document set, “Sample Documents” will automatically be selected in the Document Set list.
Step 6.1: Analysis using the Overview tab
The Overview tab was first used in Step 3: Initial Analysis and displays statistical information on a document set. Now that the pages have been read in Step 5: Read Page Content, the Overview chart is updated to indicate how clean (accurate) Transformation Studio judges the set to be. Each document type in the chart is color-coded according to the following criteria:
Table 4-2.
Cleaning up Extra Pages
Cleaning up Document Types
Transformation Studio analyzes the document set and identifies pages it suspects are extra. These may be blank pages, fax cover sheets, pages with text that isn't found on other documents in the type or pages that cannot be read properly.
Cleaning up Extra Pages: Within this step you will confirm whether or not each of the marked pages (those suspected of being extra) is an extra page.
3 Clean up the documents that are displayed by following the on-screen instructions and answering the questions. This step will vary depending on which documents were randomly selected by Transformation Studio as Sample Documents. However, the same process is always used:
Suspected extra pages are displayed for each document type in the set.
Documents needing their type confirmed are displayed for each document type in the set.
Additional suspected extra pages are displayed.
Figure 4-22. Confirming Document Types
Only documents that will significantly affect the confidence of the documents in the set will be displayed. These documents are continually reassessed as you confirm or remove document types. The documents are color-coded as described in Table 4-3.
Table 4-3.
Table 4-3. Color Coding of Documents
Color | Confidence State | Description
Blue | Confirmed | The document type has been manually confirmed.
a Look at the currently displayed document using the thumbnails and Image Viewer and decide whether it has the correct document type.
Note The message above the document (and the color coding in the title bar) indicates how confident Transformation Studio is about the document.
Loan Application
Some of the loan applications have lots of pages. To wrap the pages so they all display in the thumbnail viewer without scrolling, click the Wrap Pages button above the thumbnail viewer.
Note Documents without a type become “Unknown” and can be automatically classified later. When configuring a real solution, there are alternatives to having to click “No” to this many documents during Cleanup.
Figure 4-23. Cleaning Up Suspected Extra Pages
a Look at the currently marked page using the Thumbnail Viewer and the Image Viewer and decide whether or not it is an extra page.
b Confirm or clear the extra page mark:
Click Yes (or press ENTER or Y) to confirm the page is extra to the document and will not occur in production.
Click No (or press N) to clear the suspected extra page.
Each document type in the graph (except Unknown Documents) will be green as it will contain only confirmed and confident documents.
5 Review the number of documents in each type. You need at least 100 documents of each type (except the Header) in order to create a configuration. You will not have enough documents in the Tax Escrow type (as the Initial Escrow documents were mixed into this type but were reclassified as Unknown during cleanup).
Step 6.
The document will now have the type “Initial Escrow” assigned.
h Find the next Initial Escrow document.
i Right-click the document and select “Change Document Type” from the context menu.
j Select “Initial Escrow” from the list of document types.
k Click OK.
l Assign the “Initial Escrow” type to three more documents, including a two-page Initial Escrow.
a Double-click the Unused Documents set on the Project Explorer panel. The Overview chart and the Document Types panel will be updated to show the composition of this set. There are documents in the Tax Escrow type that are not currently being used.
b Right-click Tax Escrow in the Document Types panel to display the context menu.
c Select “Move documents to another document set” to open the Move Documents to Document Set window.
d From the “Move documents to” list, select Sample Documents.
9 Review the chart. There should now be at least 100 documents for each document type (except for Header) and each bar should be green.
Note It is possible to review the documents that have been automatically classified, using Browse Documents. For more information refer to the INDICIUS Help.
Step 7: Select Documents for Testing
The Test Documents set is used to store a subset of the clean documents for use in testing.
To select documents for testing
1 Select Document Sets | Select Test Documents or click the corresponding toolbar button to display the Select Test Documents window.
Figure 4-24. Select Test Documents window
2 Read the warning message; optionally click Show Warnings to see more details.
3 As the percentage of documents to move is already at 30%, click OK to move the documents. Once the documents have been successfully moved, a message will display.
Figure 4-25.
Figure 4-26. Project Explorer after Selecting Test Documents
Note If you look at the quality of the Test Documents set in isolation, you will see that it appears to be of lower quality than the Sample Documents set before and after the split. This is because Confirmed documents contribute most to the information used to judge the quality of a document set, and none of these were moved to Test Documents.
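The effect of the split can be sketched in Python as follows. This is only an illustration under stated assumptions: it holds out roughly 30% of the documents at random and never selects confirmed documents, mirroring the behavior described in the note above. The split_for_testing function, its parameters and the fixed random seed are hypothetical and do not represent Transformation Studio's own selection logic.

```python
import random
from typing import Iterable, List, Set, Tuple


def split_for_testing(
    documents: List[str],
    confirmed: Iterable[str],
    fraction: float = 0.30,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Return (sample documents, test documents), keeping confirmed documents in the sample set."""
    confirmed_set: Set[str] = set(confirmed)
    eligible = [doc for doc in documents if doc not in confirmed_set]
    # Aim for the requested share of all documents, limited by what is eligible.
    count = min(round(len(documents) * fraction), len(eligible))
    rng = random.Random(seed)  # seeded so the example is repeatable
    test = set(rng.sample(eligible, count))
    sample = [doc for doc in documents if doc not in test]
    return sample, sorted(test)


if __name__ == "__main__":
    docs = [f"doc{i}" for i in range(10)]
    sample_docs, test_docs = split_for_testing(docs, confirmed={"doc0", "doc1"})
    print(len(sample_docs), len(test_docs))
```

Holding the test documents out of training in this way is what allows them to give an unbiased indication of how the configuration will behave on unseen documents.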
Extraction (assigned to the standard instance).
A Recognition configuration is always based on a configuration template. A template is a set of resources that form the foundation of your configuration.
Note The resources created will vary depending on the type of template selected.
To create a document classification configuration
1 Select Configuration | Create Configuration... to display the New Configuration window.
2 Select “Document Classification.
Chapter 4 Figure 4-28. Project Explorer showing Configuration Resources Step 2: Configure Text Classification Document Text Classifier The classifier is created using the Build Document Text Classifier tab. Typically the text classifier is trained on the documents in the Sample Documents set (after it has been cleaned during document set management). Training options are selected before the build process is started.
Configuration 2 Within the table, clear the “Include” check box for the Header document type, so these documents are not used in training the classifier. This document type will be accounted for later by configuring templated (barcode) classification. 3 Click Build. 4 Once the classifier has been built, click Finish. Integrate Classifier In production, Recognition runs a Recognition script, which uses the classifier. The Recognition script (named Document Classification.
Chapter 4 X To test the configuration 1 Export the Test Documents set from Transformation Studio. a Select File | Export Documents to display the Export Documents tab. b Select Test Documents from the “Document Set” list. c Click Browse and navigate to the following location: My Documents\Transformation Studio Projects\Tutorial. d Create a new folder called “Exported Document Sets”. e In the new folder, create a new subfolder called “Test Documents (Document Classification).
Configuration h Select the batch file you exported from Transformation Studio: My Documents\Transformation Studio Projects\Tutorial\Exported Document Sets\Test Documents (Document Classification)\All Document Types.ibf. i Click Open. j Press F8 or click the Run Test button to test the configuration. The batch file will not be altered during this process. k Select the Summary tab to view the Test Documents set with document types assigned.
Chapter 4 Export Documents for Use in Other Tools Templated and rules-based classification are configured in Definer. As with text classification, the configuration is based on the Sample Documents. In order to use these sample documents easily in Definer, they need to be exported from Transformation Studio. You can export the whole Sample Documents set or, by creating additional custom sets, just the samples for the document types that you need to use in Definer.
Configuration 7 Select “Sample Header Documents” from the “Document Set” list. 8 Click Browse and navigate to the following folder: My Documents\Transformation Studio Projects\Tutorial\Exported Document Sets. 9 Create a new folder called “Sample Header Documents”. 10 Select the folder Sample Header Documents and click Open. 11 Make sure the “Create one image file for each document” option is selected. 12 Make sure the “Export text files” and “Export recognition output files” options are clear.
Chapter 4 Figure 4-30. Barcode Field 7 On the Properties panel on the right, select the Name property and replace the default value by entering “Barcode” for the field name. 8 Select File | Save Definition to open the Save As window. 9 Navigate to the location of your Recognition configuration: My Documents\Transformation Studio Projects\Tutorial\Configurations\Document Classification\Resources. 10 Enter “Header” as the file name.
Configuration 20 Check the field shows the message “Barcode found at ” and that the data matches the value above the barcode. 21 Click Run Step to test the next document. 22 Repeat the last two steps until all images have been tested. 23 Click Close to exit the Test Mode window. The barcode should have been found on every document. If needed, resize the field and retest until all the barcodes are found. 24 In the main Definer view, select the Definition File tab below the image.
Chapter 4 Integrate Definition file As with the text classifier, the definition file is called by the Recognition script in production. The script will not call a definition file by default, but this can easily be modified. The name of the definition file must also be updated. X To integrate the definition file into the script 1 In Windows Explorer, navigate to your configuration’s resources folder: My Documents\Transformation Studio Projects\Tutorial\Configurations\Document Classification\Resources.
Configuration 3 Click Run Test. 4 Once the test has finished, select the Header tab. 5 Select one of the documents and click Script Messages in the bottom left panel. If templated classification has run, the message will read “Document classified by template as Header.” Do not close Recognition Test Tool. Step 4: Test Performance You have already tested the configuration when you added each classification method.
Chapter 4 5 Select File | Exit to close the Results Analysis window. 6 Select File | Exit to close Recognition Test Tool. Create Document Review Configuration Step 1: Configure a Document Review Project File In this step you will create and configure a Document Review project file using Document Review Project Editor. In a document classification solution, Document Review is used to ensure document types are correctly assigned.
Configuration Redemption Request for Tax Form Tax Escrow Truth In Lending Note It is important that the spelling and case of the document types is exactly as written here, so that the types match those assigned in Transformation Studio. You will now specify a validation rule that states that all documents in the batch must have a type specified in the list you just created. If a document fails this rule, a problem will display in the Document Review module. 13 Select the Validation tab.
Chapter 4 Step 1: Create Batch Class X To create a batch class 1 Start Kofax Capture Administration by clicking Start on the taskbar to display the menu, and selecting: All Programs | Kofax Capture 8.0 | Administration. 2 Select File | New | Batch Class. 3 In the “Name:” box, enter “My Mortgage Apps”. 4 Select the Queues tab.
Request for Tax Form
Tax Escrow
Truth In Lending
Unknown
5 Click OK. The document classes from the installed Mortgage Applications example, along with their folder classes, are inserted into the batch class you just created.
Note: Batch classes should always contain an “Unknown” document class and form type. These account for any documents that could not be classified, since every INDICIUS document type must correspond to a Kofax Capture form type.
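The role of the “Unknown” form type can be illustrated with a small Python sketch. The function name and the hard-coded set are assumptions made for illustration (the names are the document classes used in this tutorial); this is not how Kofax Capture or INDICIUS is implemented. The match is deliberately exact, including case, to reflect the earlier note about spelling and case.

```python
# Form types assumed to exist in the batch class (names from this tutorial).
FORM_TYPES = {
    "Appraisal Report",
    "Funding Transmittal",
    "Header",
    "Initial Escrow",
    "Loan Application",
    "Redemption",
    "Request for Tax Form",
    "Tax Escrow",
    "Truth In Lending",
    "Unknown",
}


def form_type_for(document_type):
    """Return the matching form type, or "Unknown" when there is none.

    Hypothetical helper: the comparison is case-sensitive, so a type that
    differs only in case does not match.
    """
    if document_type in FORM_TYPES:
        return document_type
    return "Unknown"
```

In this sketch, "Tax Escrow" maps to itself, while "tax escrow" or an unclassified document (represented here as None) falls back to "Unknown".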
Chapter 4 Step 4: Assign Configuration to Document Review X To assign the configuration for Document Review 1 On the Batch panel, select the “My Mortgage Apps” batch class. 2 Right click on the selection to display the menu, and select INDICIUS Document Review Setup. The Document Review setup dialog is displayed. The Document Review project file is specified (and can be changed) here. 3 Click Select... to display a file selection window.
Configuration 7 Click OK. Step 6: Assign Configuration to Completion X To assign the configuration for Completion 1 On the Batch panel, select the “My Mortgage Apps” batch class. 2 Right click on the selection to display the menu, and select INDICIUS Completion Setup. The setup dialog for Completion is displayed. The Completion configuration files are specified (and can be changed) here. 3 Click Add Template.
Chapter 4 Kofax Capture comes with pre-installed release scripts, which control the method and final location of the data you have captured. We will use the “Kofax Capture Text” release script to release the data to a text file. In production, the data would be released to a database or back-end system. 4 From the “Available Release Scripts:” list, select “Kofax Capture Text.” 5 Click Add. The Text Release Setup window is displayed.
Configuration Step 9: Process Batch Create a new batch using Kofax Capture Batch Manager and then process the images through the modules (if necessary refer to the Processing chapter for instructions).
Page Classification and Separation Tutorial
Summary
In this section, you will modify the current solution to use automatic document separation. Automatic document separation can save significant time and cost by removing the need for patch code separators.
Configuration 5 Click Add. The configuration will be added to the Configurations list on the Project Explorer panel. Step 2: Configure Text Classification Build Page Text Classifier The classifier is created on the Build Page Text Classifier tab, where training options are selected before the build process is started. Typically the text classifier is trained on the documents in the Sample Documents set (after it has been cleaned during document set management).
Chapter 4 Figure 4-31. Building the Page Text Classifier 3 Click Build. Once the page text classifier has been built it will display on the Project Explorer panel. Figure 4-32.
4 Click Finish.
Integrate Classifier
As for the document classification solution, Recognition calls a Recognition script, which in turn calls the classifier. The Recognition script (called Page Classification and Separation.ifv) is created automatically when the configuration is created. One change may be needed in this script: the name of the classifier. The script will, by default, call a classifier named “Page text classifier.mod.
Chapter 4 d Create a new subfolder called “Test Documents (Page Classification and Separation).” e Select the folder Test Documents (Page Classification and Separation) and click Open. f Make sure the “Create one image file for each page” option is selected. g Make sure the “Export recognition output files” option is selected and the “Export text files” option is cleared. h Click Export.
Note: Recognition Test Tool displays only the most confident page type for each page. Page classification returns multiple classification results for each page, sorted in order of confidence. When advanced document separation is performed, it takes all possible results into account.
j Save the project as: \Test Projects\Page Classification and Separation.rtp.
k Select File | Exit to close Recognition Test Tool.
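The difference between what the test tool shows and what page classification returns can be sketched in Python. The PageResult record, the function names, and the 0 to 1 confidence scale are all assumptions made for illustration; they are not part of the INDICIUS scripting interface.

```python
from dataclasses import dataclass


@dataclass
class PageResult:
    """Hypothetical classification result for a single page."""
    page_type: str
    confidence: float  # assumed scale: 0.0 (lowest) to 1.0 (highest)


def rank_results(results):
    """Return all results for a page, most confident first."""
    return sorted(results, key=lambda r: r.confidence, reverse=True)


def displayed_type(results):
    """Return only the top-ranked page type, or None if there are no results."""
    ranked = rank_results(results)
    return ranked[0].page_type if ranked else None
```

A separator working from rank_results sees every candidate type for a page, whereas a viewer that uses displayed_type sees only the first one.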
Chapter 4 Figure 4-33. Building the Advanced Document Separator 4 Click Build. When the separator has been built, it will display on the Project Explorer panel.
Configuration Figure 4-34. Project Explorer Displaying the New Separator 5 Click Finish. Integrate Advanced Document Separator In production, Recognition first calls the script file (which calls the page classification methods) and then runs a separation project file, which in turn references the separator. The separation project file (called Separation.drp) is created automatically when the configuration is created. Two changes may be needed to this project: The name of the separator.
Chapter 4 1 Open Recognition Test Tool by clicking Start on the taskbar to display the menu, and selecting All Programs | INDICIUS | Tools | Recognition Test Tool. 2 Open the project used to test page text classification. Note You can open the project from the recent projects on the File menu. By default it will be: My Documents\Test Projects\Page Classification and Separation.rtp 3 Select File | Project Properties to open the Project Properties window.
Text classification
Image classification
Templated classification (including barcodes)
Rules-based classification
For more information on these methods, refer to Classification Methods or the INDICIUS Help.
Definition File for Templated Classification
The definition file that was used for classifying the document type in the Document Classification solution can be used again.
Chapter 4 My Documents\Transformation Studio Projects\Tutorial\Configurations\Page Classification and Separation\Resources. 2 Double-click the file “Page Classification and Separation.ifv” to open it in Script Editor.
Configuration 6 Click Script Messages in the bottom left panel. If the first message is “Running classification by template (definition file),” page templated classification has run. Note More detailed analysis of performance will be done in Step 5: Test and Evaluate Performance. 7 Select File | Exit to close Recognition Test Tool. Build Page Image Classifier The classifier is created on the Build Page Image Classifier tab, where training options are selected before the build process is started.
Figure 4-35. Building the Page Image Classifier
Note: Some warning triangles may display. Hover the mouse over a specific triangle to display the warning. Two types of warning appear in the tutorial: one for too few examples of a page type and the other for too many. In a project, if you have too few examples, you need to ask the customer for more. If you have too many examples, you need to look at the variation within the page type.
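The two kinds of warning can be pictured as a simple count check, sketched here in Python. The function name and both limits are hypothetical: this guide does not state the numbers that Transformation Studio uses, so the limits are passed in by the caller rather than given default values.

```python
from collections import Counter


def example_warnings(page_types, min_examples, max_examples):
    """Map each page type to a warning when its example count is out of range.

    `page_types` holds one entry per training page. Both limits are
    caller-supplied assumptions, not values taken from the product.
    """
    counts = Counter(page_types)
    warnings = {}
    for page_type, count in counts.items():
        if count < min_examples:
            warnings[page_type] = "too few examples"
        elif count > max_examples:
            warnings[page_type] = "too many examples"
    return warnings
```

Page types whose counts fall within the limits do not appear in the returned dictionary.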
Figure 4-36. Project Explorer Displaying the New Classifier
4 Click Finish.
Integrate Classifier
The Recognition script now needs updating to call page image classification. The template script will not call a page image classifier by default, so you need to integrate the file by:
Specifying that you are using image classification.
Specifying the name of your classifier (not required for this tutorial as the classifier you built has the default name).
Chapter 4 5 Select File | Exit to close Script Editor. Test Classification The configuration is again tested on the Test Documents set in Recognition Test Tool. X To test the configuration 1 Open Recognition Test Tool. 2 Open the project used to test page classification and separation. Note You can open the project from the recent projects on the File menu. By default it will be: My Documents\Test Projects\Page Classification and Separation.rtp 3 Click Run Test.
Configuration Testing in Recognition Test Tool Testing has been done throughout the tutorial using the Recognition Test Tool. In this step, the results of the test are analyzed in a little more detail, with the aim of spotting any significant problems before going into the more detailed analysis using BatchCompare. X To analyze the test results (Analyse Results).
The first, a “reference batch,” is created during export of the Test Documents from Transformation Studio. It contains accurate document types and structure for all the Test Documents. The second, a “comparison batch,” contains the same documents. However, the document types and structure are exported from Recognition Test Tool, after a test has been run using the new configuration.
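The idea behind comparing the two batches can be sketched in Python. This is a simplified illustration rather than a description of BatchCompare: it assumes each batch has been reduced to a dictionary from a document identifier to a document type, it ignores document structure, and the function name is hypothetical.

```python
from collections import Counter


def classification_rates(reference, comparison):
    """Return, per document type, the fraction of documents typed correctly.

    `reference` maps document id -> true type (the reference batch).
    `comparison` maps document id -> assigned type (the comparison batch).
    A document missing from `comparison` counts as incorrect.
    """
    totals = Counter()
    correct = Counter()
    for doc_id, true_type in reference.items():
        totals[true_type] += 1
        if comparison.get(doc_id) == true_type:
            correct[true_type] += 1
    return {doc_type: correct[doc_type] / totals[doc_type] for doc_type in totals}
```

A rate of 1.0 for a type means every reference document of that type received the same type in the comparison batch.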
Configuration In the new window, navigate to the location of the offline batch you imported into Recognition Test Tool (the reference batch): My Documents\Transformation Studio Projects\Tutorial\Exported Document Sets\Test Documents (Page Classification and Separation). b For the File name, enter “All Document Types Automatic Results”. c Click Save. d Select File | Exit to close Recognition Test Tool. 2 Use the BatchCompare utility.
My Documents\Transformation Studio Projects\Tutorial\Exported Document Sets\Test Documents (Page Classification and Separation)\Unknown Batch Statistics.xls.
This workbook contains raw comparison data and macros to generate statistics from the data. For detailed information on the data, refer to the INDICIUS Help.
b On the Control Sheet, click Recalculate All.
Note: Macro security in Excel must be set to medium or low in order to generate statistics.
a For the Per-Document Results, change the confidence threshold for each document type in the table on the far right. For any document type with a classification rate of less than 95%, change its confidence threshold to 85%. For any document type with a classification rate of less than 60%, change its confidence threshold to 80%.
Figure 4-37. Thresholds Set
b On the Control Sheet, select “Use user-defined category thresholds.”
c Click Recalculate All.
Figure 4-38.
f Double-click the separation project file Separation.drp to open it in Document Review Project Editor.
g Select the Document Separation tab.
h Click Edit thresholds next to the advanced document separator.
i Change the thresholds so they match those determined in Excel.
Figure 4-39. Edit Thresholds window in Project Editor
j Click Save Thresholds.
k Click Close.
l Select File | Exit to close Document Review Project Editor.
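The threshold rule applied in Excel before this procedure (85% for document types classified at under 95%, 80% for those under 60%) can be written as a short Python sketch. The function name is hypothetical, values are expressed as percentages, and the sketch assumes that the 60% rule takes precedence when both conditions hold, since a rate below 60% is also below 95%.

```python
def suggested_threshold(classification_rate, current_threshold):
    """Return the confidence threshold (in percent) for one document type.

    `classification_rate` and `current_threshold` are percentages.
    Types at or above 95% keep their current threshold.
    """
    if classification_rate < 60:
        return 80
    if classification_rate < 95:
        return 85
    return current_threshold
```

For example, a type classified at 90% would get a threshold of 85, a type at 50% would get 80, and a type at 97% would keep whatever threshold it already had.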
Configuration Integrate the Configuration with Kofax Capture Once the configuration has been created and tested, it needs to be assigned to a batch class in Kofax Capture Administration. Step 1: Create Batch Class X To create a batch class 1 Start Kofax Capture Administration by clicking Start on the taskbar to display the menu, and selecting: All Programs | Kofax Capture 8.0 | Administration. 2 Select File | New | Batch Class. 3 In the “Name:” box, enter “My Mortgage Apps with Separation”.
Appraisal Report
Funding Transmittal
Header
Initial Escrow
Loan Application
Redemption
Request for Tax Form
Tax Escrow
Truth In Lending
Unknown
5 Click OK. The document classes from the installed Mortgage Applications example, along with their folder classes, are inserted into the batch class you just created.
Note: Batch classes should always contain an “Unknown” document class and form type.
Configuration 6 On the Document Review Project File panel, click Select... to display a file selection window. 7 Select the following file: My Documents\Transformation Studio Projects\Tutorial\Configurations\Page Classification and Separation\Resources\Separation.drp. 8 Click Open. 9 On the General panel, use the drop down list for the “Processing Level” option to select the value “Page.
Chapter 4 Step 5: Assign Configuration to the Standard Instance of Recognition X To assign the configuration for the standard instance of Recognition 1 On the Batch panel, select the “My Mortgage Apps with Separation” batch class. 2 Right click on the selection to display the menu, and select INDICIUS Recognition Setup. The setup dialog for this instance of Recognition is displayed. The Recognition configuration files are specified (and can be changed) here.
Configuration We will not be configuring Completion in this tutorial, so we will assign the pre-installed configuration files. 5 Select all eight templates in the folder. 6 Click Open. 7 On the left hand panel, select “Input/Output” to display the Input/Output view. 8 For the “Load document type from:” dropdown list, select “System Document Type.” 9 For the “Write data to:” options, select both “File” and “Index Fields.” 10 Select the “Display all documents to user” option.
Chapter 4 10 Click Open. 11 Copy the path in the “File name:” box to the clipboard. 12 On the Text Release Setup window, select the Document Storage tab. 13 On the Document Storage panel, clear the “Release image files” option. 14 Click OK on the Text Release Setup window. 15 Click Close on the Release Scripts window. 16 Repeat the previous steps for the remaining document classes, pasting the path into the “File name:” box.