OmniPage Pro for Macintosh Users Manual CAERE CORPORATION 100 Cooper Court Los Gatos, California 95032-3321 USA
Caere GmbH, Innere Wiener Strasse, München, Germany
Caere UK Information Centre, Abbey House, Abbey Orchard Street, Westminster, London
Centre d’informations Caere, rue des Archives, Paris, France
Please Note To use this program, you should know how to work in the Macintosh environment. Please refer to your Macintosh documentation if you have questions about how to use menus, dialog boxes, or scroll bars.
Welcome Welcome to OmniPage Pro, and thank you for buying our software! The following documentation has been provided to help you learn about OmniPage Pro. This Users Manual This manual provides information on features and procedures. It includes an introduction to OmniPage Pro, installation and setup instructions, task-oriented instructions, ways to customize tools, settings guidelines, and technical information. OmniPage Pro Guide This provides online information on features and procedures.
Using This Manual This manual is written with the assumption that you know how to work in the Macintosh environment. Please refer to your Macintosh user’s manual if you have questions about how to use dialog boxes, menus, scroll bars, and so on. The following conventions are used in this manual.
Convention: Italicized text
Purpose: Emphasizes menu commands, dialog box options, labeled buttons, and file names. For example: “Choose Open... in the File menu.”
Chapter 1 Introduction to OmniPage Pro You probably do most of your business correspondence and other written projects on your computer. However, certain sources of information may not be immediately usable on a computer. For example, if you want to incorporate information from a magazine article into a document in your word processor, you somehow have to get the text from the article into your computer. Painstakingly retyping the article is not an appealing solution.
What Is Optical Character Recognition (OCR)? Optical character recognition (OCR) is the process of turning an image into computer-editable text. An image is an electronic picture of text such as a scanned paper document or an electronic fax file. Images do not have editable text characters; they have many tiny dots (pixels) that together form a picture of text. During OCR, OmniPage Pro analyzes an image and defines characters to produce editable text.
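The difference between a picture of text and editable text can be illustrated with a toy sketch in Python. This is not how OmniPage Pro recognizes characters; the 5-by-5 templates, the `recognize` function, and the "fewest differing pixels" rule are all assumptions made for illustration.

```python
# Toy 5x5 glyph templates: "#" is a black pixel, "." is a white pixel.
TEMPLATES = {
    "I": ["..#..", "..#..", "..#..", "..#..", "..#.."],
    "L": ["#....", "#....", "#....", "#....", "#####"],
    "T": ["#####", "..#..", "..#..", "..#..", "..#.."],
}


def pixel_distance(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def recognize(glyph):
    """Return the template character whose bitmap differs least from glyph."""
    return min(TEMPLATES, key=lambda ch: pixel_distance(glyph, TEMPLATES[ch]))


# A slightly damaged "T": one pixel of the top bar is missing.
noisy_t = ["####.", "..#..", "..#..", "..#..", "..#.."]
print(recognize(noisy_t))  # prints "T"
```

The input is only a grid of dots; the output is a character that a word processor can edit. Real OCR engines use far more elaborate analysis than this nearest-template comparison.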
Basic Steps of OmniPage Pro OCR These are the basic steps of OmniPage Pro’s OCR process:
1 Bring a document image into OmniPage Pro. You can scan a paper document or load an image file. The resulting image appears in the Image View. See “Bringing Document Images into OmniPage Pro” on page 27 for more information.
2 Create zones to identify the parts of the document you want to recognize as text or retain as graphics.
The OmniPage Pro Interface The main parts of OmniPage Pro’s user interface include:
• The AutoOCR Toolbar
• The Document Window
• The Thumbnail Window
• Zone Info and Tool Palettes
• The Settings Panel
[Figure: the OmniPage Pro interface, showing the AutoOCR Toolbar, the Tool palette, the Thumbnail window, the Zone Info palette, and the Document window with its Image View and Text View]
The AutoOCR Toolbar The AutoOCR Toolbar contains buttons that can activate each step of the OCR process. Choose Show Toolbar in the Window menu to open the AutoOCR Toolbar if it is closed.
[Figure: the AutoOCR Toolbar, showing the AUTO, Image, Zone, OCR, and Export buttons, the Proofread OCR button, and the Settings Panel button]
The status line reports the current operation or the operation you can do next. Click the small arrow to show or hide the status line.
Choose Text View in the Window menu (or ⌘J) to display a document’s Text View and make it active. Drag the splitter between the Image View and the Text View to the left or right to resize a view. You can select options in the Document section of the Settings Panel to specify how views in the Document window are displayed. See “Document Display Settings” on page 74 for more information.
The OmniPage Pro Interface The Thumbnail Window The Thumbnail window displays miniature pictures (thumbnails) of page images in the current document. You can use thumbnails to change pages, rearrange pages, and drag copies of images into other applications. Choose Show Thumbnails in the Window menu to open the Thumbnail window if it is closed. The bars beneath each thumbnail indicate what has been done to the image. Three bars indicate the image has been recognized.
Zone Info and Tool Palettes The Zone Info and Tool palettes are displayed when the Image View of a document is active. Choose Show Tool Palette in the Window menu if the Tool palette does not appear when the Image View is active. Use the Tool palette to draw zones, modify zones, reorder zones, erase parts of the image, zoom in or out, rotate, or straighten an image.
Getting Online Help The Settings Panel The Settings Panel is the central location of OmniPage Pro settings. You can click the Settings Panel button or choose Settings Panel in the Settings menu to open it. The Settings Panel has six different sections of settings. Each section can be displayed by clicking its icon on the left. Click each icon to view and select different settings. Scroll to see more options.
Getting Online Help Balloon Help Balloon help consists of “balloons” that pop up on screen to explain the function of icons, menus, commands, dialog box options, and other items in an application interface. To turn balloons on, choose Show Balloons in the Guide menu. Different balloons appear as you move the mouse pointer over items in the interface. Choose Hide Balloons in the Guide menu when you want to turn off balloon help.
Product Support For the fastest and easiest way to get help, please look for solutions in this manual or in the OmniPage Pro Guide. If you need additional help, product support and information are also available to registered users through the services listed in this table. Service How to Contact World Wide Web site http://www.caere.
Chapter 2 Installation and Setup This chapter provides information on installing OmniPage Pro and selecting a scanner to use with it. Please also read the Release Notes and the Scanner Setup Notes included in your OmniPage Pro package. These provide the most up-to-date information concerning installation and setup issues.
Installing the Software Before you install OmniPage Pro:
• Make sure your scanner is working on your system by using the scanning software supplied by the manufacturer.
• Turn off any virus-protection software. This is often a Control Panel device. Refer to your virus-protection software manual.
• Some versions of OmniPage Pro are designed only for customers upgrading from previous versions of Caere OCR software.
To reuse your OmniPage Pro user dictionary:
1 From the Settings menu, select Edit User Dictionary...
2 In the dialog box that appears, select the user dictionary you want to keep for use with the new version of OmniPage Pro and click Open.
3 Save your dictionary to a location outside the OmniPage folder.
4 Once you have successfully installed the new OmniPage Pro, select Edit User Dictionary... from the Settings menu.
5 Click Import...
If you just want to install individual components of OmniPage Pro, click the Custom button and select the items that you want to install in the Installer dialog box. To select more than one item, hold down the Command key (⌘) as you click each item.
[Figure callout: This must be selected to install just the application.]
5 Click Install to proceed with installation. A dialog box appears that enables you to choose where the OmniPage Pro files will be installed.
Installing the Software prompting you to choose the manufacturer settings for the scanners you will use with OmniPage Pro. Click to select one (or more) manufacturer settings, and then click OK to proceed with the installation. 8 If you are performing a standard installation or if you picked Languages during a Custom installation, a dialog appears, prompting you to select the languages you wish OmniPage Pro to recognize.
Starting OmniPage Pro 9 Enter the serial number, if you are prompted to do so, and click OK. The serial number will be on the back of the OmniPage Pro CD jewel case in the lower right-hand corner under the Caere logo. 10 Select your country and click OK. 11 Insert the other installation disks as instructed (if you are installing from disks). OmniPage Pro continues with installation and notifies you when it is complete. Restart your Macintosh if you are prompted to do so after installation.
Selecting Your Scanner If you have access to the World Wide Web, you can register your copy of OmniPage Pro at Caere's Web site. To do so, go to www.caere.com and click the Support tab. Click Online Product Registration and follow the onscreen instructions. Selecting Your Scanner To use a supported scanner with OmniPage Pro, you select one (or more) scanner manufacturers during software installation.
Selecting Your Scanner For a list of supported scanners, see the Scanner Setup Notes. 4 The SCSI ID number of your scanner may appear in the Scanner Connection side of the Select Scanner dialog. Click Verify to confirm that your scanner is properly connected and recognized by OmniPage Pro. 5 On the Verification window, click OK, then click OK to close the Select Scanner dialog and confirm your settings. Scanner selection is now complete.
Chapter 3 Processing Documents This chapter describes how to process documents in OmniPage Pro from start to finish. It explains the basic steps of OCR and provides instructions for other tasks you can do with your documents. There are different ways to accomplish the same tasks in OmniPage Pro. For example, you can use toolbar buttons or menu commands to start certain procedures. You can also have OmniPage Pro do certain OCR jobs automatically, or you can step through the jobs manually.
Basic Steps of OmniPage Pro OCR These are the basic steps of OmniPage Pro OCR:
1 Bring a document image into OmniPage Pro. See page 27 for more information.
2 Create zones to identify the parts of the document you want to recognize as text or retain as graphics. See page 29 for more information.
3 Perform OCR to convert text information into editable text characters. See page 37 for more information.
4 Export the document to the desired location.
Automatic Processing You can use the AUTO button to process a new document from start to finish or finish processing an open document. The operations that occur when you click AUTO depend on the currently set Image, Zone, OCR, and Export commands. For example, OmniPage Pro can scan a stack of pages in a scanner’s automatic document feeder (ADF), create zones on all pages, recognize the pages, and then save them as a file.
Bringing Document Images into OmniPage Pro • If a document is open, each unfinished page is finished in order. OmniPage Pro creates zones on any unzoned pages automatically or with a currently selected zone template. It then continues with the selected OCR operation. Auto Save and Auto Paste are the only Export commands that can be activated automatically. (Auto Paste is only available in Direct Input mode.
Bringing Document Images into OmniPage Pro document if a document is not currently open. If a document is currently open, the page images are added as new pages. Loading Image Files You can load TIFF and PICT image files into OmniPage Pro. An image file is an electronic picture of text, such as a fax or scanned image, that is saved in an image file format. After you load an image file into OmniPage Pro, it appears in the Image View.
Creating Zones on a Page An OmniPage Document is a file that is saved in OmniPage Pro’s proprietary format. OmniPage Documents can be saved with original page images, zones, and recognized text. You can continue to reopen an OmniPage Document in OmniPage Pro, make edits to it, and save it in other supported file formats. If an OmniPage Document is saved with its original page images (the default setting), you can retain graphics, compare recognized text with the original image, and rerecognize pages.
Creating Zones on a Page graphics. Any part of a page not enclosed by a zone is ignored during OCR. There is only one zone on this page image. All other areas will be ignored during OCR. You can create zone templates to use when you process documents with the same zoning requirements. Zone templates remember the shape, position, order, type, contents, and style of zones. For more information, see “Creating Zone Templates” on page 102.
Creating Zones Automatically OmniPage Pro can create zones automatically for you. To do so, it uses the selected page layout to analyze the page and break it into ordered sections. To create zones automatically: 1 Choose a setting in the Zone button’s pop-up menu that most closely matches the format of your document. You can choose One Column, Multi-column, Tables, Mixed, or a template of your own. See “Zone Button Commands” on page 4-10 for more information on these settings.
Creating Zones on a Page from the top of the first column, going down the column, and then back up to the next column). Automatic zones have purple borders. Text zone type: OmniPage Pro treats all contents as one block of text; it does not detect graphics. Tabs are inserted between any side-by-side columns detected within a zone, so this zone type is recommended only for zones that contain tables or single columns of text. Text zones have blue borders.
Drawing Zones Manually You can draw and modify zones using tools in the Tool palette. If the Tool palette does not appear when the Image View is active, choose Show Tool Palette in the Window menu. The Tool palette contains the Draw/Select Zones tool, the Polygon tool, the Modify Zones tool, the Order Zones tool, the Erase Image tool, the Zoom tool (Option-click to zoom out), the Straighten button, and the Rotate buttons. You can use the Tab key to cycle through zone tools when the Image View is active.
Creating Zones on a Page 5 Repeat steps 2–4 until you have finished drawing zones around each area that you want to process. You can draw up to 64 separate zones. A number appears within each zone indicating the order in which it will be recognized. Overlapping Zones. When you draw a zone over an existing zone, the borders of the new zone will wrap around the boundaries of the existing zone. The zones will not overlap. You can use the Polygon tool to draw a zone one side at a time.
Creating Zones on a Page You will not be allowed to draw a line if it constitutes a restricted shape. The following zone shapes are restricted: Indented along the bottom Indented along the top Hole in the middle Modifying Zones Zones can always be modified before OCR takes place. You can move, copy, resize, reorder, extend, connect, divide, and delete zones. You can also reverse the black and white elements on a page image. See “Inverting an Image” on page 54 for more information.
Creating Zones on a Page To reorder zones: 1 Click the Order Zones tool in the Tool palette. The numbers in the zones disappear. 2 Click within the zone you want to recognize first. The number 1 appears in the zone. 3 Click within the next zone you want recognized. The number 2 appears in the zone. 4 Continue until all the zones are appropriately ordered. If you do not number all the zones, they will be automatically numbered for you when you select another tool or start OCR.
To remove an area of a zone, hold down the Command key (⌘) while using the Modify Zones tool.
To connect two or more zones:
1 Click the Modify Zones tool in the Tool palette.
2 Position the mouse pointer in one of the zones you want to connect.
3 Hold the mouse button down and drag the mouse pointer onto the zones you want to connect.
4 Release the mouse button when you are done. The zone border changes to display the modified zone area.
Converting Images to Text This section describes the following procedures: • Performing OCR • Proofreading OCR Results • Verifying Recognized Text • Displaying Color Markers • Getting Page Information Performing OCR Before performing OCR, make sure the current zones and settings are appropriate for your document. For example, to retain graphic zones during OCR, you must select Retain Graphics in the OCR section of the Settings Panel. See “Settings Guidelines” on page 79 for more information.
Converting Images to Text You can select dictionaries and other error checking options in the Spelling section of the Settings Panel. See “Spelling Settings” on page 72 for more information. To check and correct errors in recognized text: 1 Click the OCR Proofreader button in the AutoOCR Toolbar or choose Proofread OCR... in the Edit menu.
Converting Images to Text • Click Change & Add to replace the word with the word in the Change to edit box and to add the word to the current user dictionary. OmniPage Pro will still stop at future instances of the word in the current document if the word contains a suspect character or a Language Analyst correction. After you select an option for the word, OmniPage Pro automatically continues to find the next possible spelling error. 3 Click Done to save all changes and exit the operation.
Converting Images to Text Displaying Color Markers After OCR, certain text in the recognized document might be marked with color in the Text View. These include: • Reject characters (red) • Suspect words (green) • Language Analyst replacements (blue) The Text View must be active to hide, show, or clear markers. To permanently remove color markers, choose Clear Markers in the Edit menu. All text reverts to black. You can also temporarily hide color markers by choosing Hide Markers in the Edit menu.
• Number of words on the page
• Recognition time in minutes and seconds. This does not count scanning time, the time it takes to draw manual zones, or the time spent writing data to disk.
• Number of reject (unrecognizable) characters
• Number of suspect (questionable) characters that OmniPage Pro made an attempt to recognize
• Recognition rate in characters per second and words per minute
Scheduling OCR OmniPage Pro can perform OCR on documents while you are away from your computer.
Scheduling OCR it recognizes scheduled jobs. Pages in a document that have already been recognized will not be rerecognized. Setting Up an Automatic Input/Output System If you regularly receive documents that need to be converted to text, such as fax files, you can set up an input/output system to facilitate OCR processing. You can specify an input folder that OmniPage Pro will check every 30 seconds.
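The idea of an input folder that is checked every 30 seconds can be sketched as a simple polling loop. This is only an illustration of the technique, not OmniPage Pro's implementation; the function names `new_files` and `watch`, the `cycles` parameter, and the sample file name are hypothetical.

```python
import os
import tempfile
import time

POLL_INTERVAL_SECONDS = 30  # the interval stated in the manual


def new_files(folder, seen):
    """Return names of files in folder that have not been seen yet."""
    names = sorted(
        name for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    )
    fresh = [name for name in names if name not in seen]
    seen.update(fresh)
    return fresh


def watch(folder, handle, cycles=None, sleep=time.sleep):
    """Check folder repeatedly and pass each new file path to handle().

    cycles limits the number of checks (None means run until interrupted).
    """
    seen = set()
    checks = 0
    while cycles is None or checks < cycles:
        for name in new_files(folder, seen):
            handle(os.path.join(folder, name))
        checks += 1
        if cycles is None or checks < cycles:
            sleep(POLL_INTERVAL_SECONDS)


# Demonstration with a temporary folder and a single check (no waiting).
with tempfile.TemporaryDirectory() as inbox:
    with open(os.path.join(inbox, "fax-001.tiff"), "wb") as f:
        f.write(b"placeholder image data")
    queued = []
    watch(inbox, queued.append, cycles=1)
    print([os.path.basename(path) for path in queued])  # ['fax-001.tiff']
```

Remembering which files have already been seen keeps a file from being queued twice, which mirrors the manual's statement that already recognized pages are not rerecognized.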
Scheduling OCR 5 Click OK in the Schedule OCR dialog box to save your settings as specified. Adding Individual Documents to the Schedule If you have documents that need to be converted to text, you can manually add them to the processing schedule. Files will be recognized after the specified time. Recognized files are then placed in the designated output folder. To add individual documents: 1 Choose Schedule OCR... in the Process menu. The Schedule OCR dialog box appears.
Scheduling OCR Settings for Scheduled Files The following settings in the Schedule OCR dialog box are used for all files in the processing queue. When to Perform OCR Files in the processing queue are recognized in order after the specified time. • Select Immediately to start recognizing scheduled jobs as soon as you click OK in the Schedule OCR dialog box. If OmniPage Pro is watching an input folder, it tries to recognize new files as soon as it detects them.
Direct Input: Pasting Text into Other Applications Default Output Options All newly scheduled files have the same default output folder and file format assigned to them. Click Set Output... to change the default options. The default file name is always the original file name with the word Output appended. You can change the output folder, output file format, and output file name for any scheduled document. To do so, select a file in the Input File List and click Modify.
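The default naming rule (the original file name with the word Output appended) can be sketched as follows. The manual does not say how the word is joined to the name or what happens to long names, so the space separator and the trimming step are assumptions; the 31-character figure is the file name limit of classic Mac OS (HFS) volumes, and `default_output_name` is a hypothetical helper.

```python
MAX_NAME_LENGTH = 31  # file name limit on classic Mac OS (HFS) volumes


def default_output_name(input_name, suffix="Output", separator=" "):
    """Append the word Output to input_name.

    Assumption: if the result would exceed the file name limit, the
    original name is trimmed so that the suffix always survives.
    """
    tail = separator + suffix
    stem = input_name[: MAX_NAME_LENGTH - len(tail)].rstrip()
    return stem + tail


print(default_output_name("Fax 12"))  # Fax 12 Output
print(default_output_name("Quarterly sales report, page 1 of 9"))
```

Because the name can be changed for any scheduled document with the Modify button, a rule like this only needs to produce a reasonable starting point.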
Direct Input: Pasting Text into Other Applications must have enough memory to run OmniPage Pro and the application at the same time. Text formatting, such as bold and italics, is retained if you are pasting into an application that supports RTF information. Otherwise, only plain text will be pasted. Direct Input works best when you need to process just a few pages because some applications may not be able to paste very large amounts of text.
Working With Documents 4 Choose OmniPage Direct Input in the Apple menu. OmniPage Pro opens in Direct Input mode. This adds a special Auto Paste command to the Export button of the AutoOCR Toolbar. Auto Paste is only available in Direct Input mode. It is automatically selected when you activate Direct Input. Automatic processing begins immediately if Begin Processing Automatically on Launch was selected in the Direct Input section of the Settings Panel.
Choose Image View in the Window menu (or ⌘M) to display the Image View and make it active. Choose Text View in the Window menu (or ⌘J) to display the Text View and make it active. The current page number appears along the bottom of the Document window. Drag the splitter between the Image View and the Text View to the left or right to resize a view.
Working With Documents You can select a setting in the Document section of the Settings Panel that determines how the Text and Image Views are displayed. See page 65 for more information. To resize a page view: 1 Click the view (Text or Image) that you want to resize to make that the active view. 2 Use one of the following methods to zoom in or out: • Choose Zoom In, Zoom Out, Zoom to Width, or Zoom to View in the Window menu.
Working With Documents Changing Pages You can change pages in a document in the following ways. • Click the thumbnail of the page you want to display. Choose Show Thumbnails in the Window menu to open the Thumbnail window if it is closed. The thumbnail of the currently displayed page has a shaded background. • Click the forward or backward arrow buttons next to the current page number located along the bottom of the Document window. • Choose Go to Page...
Working With Documents Reordering Pages You can reorder pages in a document by dragging their thumbnails to different positions in the Thumbnail window. Choose Show Thumbnails in the Window menu to open the Thumbnail window if it is closed. Click the thumbnail of the page you want to move and drag it above the desired page number. Deleting a Page You can delete a page from a document that has at least two pages. For example, you may want to delete a page that was poorly scanned.
Modifying Images You can modify an image when the Image View is active. Choose Image View in the Window menu (or ⌘M) to display the Image View and make it active. Rotating an Image You can rotate a page image when the Image View is active. For example, if a page is accidentally scanned upside down, you can correct the orientation by rotating it. If you need to rotate a page, be sure to do so before you create zones. All zones are deleted during page rotation.
If you do not want to permanently erase parts of the actual image, but want to omit areas of a page during OCR, identify the areas as Ignore zone types or do not include them in any zones at all. Inverting an Image OmniPage Pro cannot perform OCR properly on white text on a black background. To remedy this, you can invert an image (reverse the black and white elements) before OCR. However, if you invert an image with a color depth of 256 colors (8-bit), you cannot re-invert it.
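Inversion of a black-and-white image amounts to swapping every pixel value. The sketch below shows this on a tiny 1-bit bitmap; the list-of-rows representation and the `invert` and `show` helpers are illustrative assumptions, not OmniPage Pro's internal format.

```python
def invert(bitmap):
    """Swap black (1) and white (0) in a 1-bit bitmap given as rows of ints."""
    return [[1 - pixel for pixel in row] for row in bitmap]


def show(bitmap):
    """Render the bitmap as text: '#' for black pixels, '.' for white."""
    return "\n".join("".join("#" if p else "." for p in row) for row in bitmap)


# White "T" on a black background (1 = black, 0 = white).
white_on_black = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],
]
black_on_white = invert(white_on_black)
print(show(black_on_white))
```

In this 1-bit sketch, inverting twice restores the original bitmap. The sketch says nothing about the 8-bit case, where the manual states that an inverted image cannot be re-inverted.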
Selecting All Text To apply formatting, such as a particular font, to all text on a page, you can select the entire page by choosing Select All in the Edit menu (or ⌘A). The entire contents of a recognized page are selected when the Text View is active. To deselect the page, click anywhere within it. Formatting Text Use commands in the Format menu to apply font, font style, and font size formatting to selected text in your recognized document.
Working With Documents The options available in the Page Setup dialog box depend on your printer. 2 Select the desired options and then click OK. To print pages: 1 Make the view (Text or Image) from which you want to print active. 2 Choose Print Text... (or Print Images...) in the File menu. The dialog box that appears depends on your printer. 3 Select print options for your document. If you are printing from the Image View, the dialog box displays the Scale Images to Fit Page option.
Exporting Documents You can export original images or recognized text for use in other applications by:
• Saving a Document
• Copying a Document to the Clipboard
• Using Drag and Drop Functionality
Saving a Document You can save recognized text, retained graphics, and original images to disk in a variety of file formats. Save your document as an OmniPage Document file or as an image file if you want to reopen it in OmniPage Pro again.
Exporting Documents The available file formats depend on the particular document you are saving. For example, if you are saving an unrecognized image, you can only save it as an OmniPage Document or an image file. See “Supported File Formats” on page 120 for more information. The True Page style set should only be used if your target application supports frame formatting. File formats that support frame formatting have a TP in front of their names in the Save As dialog box.
Exporting Documents Copying a Document to the Clipboard You can copy every page of recognized text to the Clipboard. The text can then be pasted directly into another application. You can also copy zones in the Image View to the Clipboard. Copying text to the Clipboard works best when you are copying just a few pages because some applications may not be able to paste very large amounts of text.
Exporting Documents Using Drag and Drop Functionality OmniPage Pro supports drag-and-drop functionality on System 7.5 (or later) and on systems that have it installed as a separate extension. Dragging Thumbnails You can drag a thumbnail from the Thumbnail window to the desktop or to another application that supports drag-and-drop functionality. The contents of a thumbnail is converted to a line-art PICT file with the same resolution as the original image.
Chapter 4 OmniPage Pro Settings This chapter describes the settings you can select in OmniPage Pro. Make sure that settings are appropriate for your document before you start processing it. You may have to experiment with different settings to get the results you want.
AutoOCR Toolbar Settings The AutoOCR Toolbar buttons allow you to take a document through each step of the OCR process. You can set various commands in the pop-up menus beneath the Image, Zone, OCR, and Export buttons. Or, you can choose Process Settings in the Process menu and choose commands in the submenu. Pictures in the AutoOCR Toolbar buttons and menu commands in the Process menu change as you set different commands.
AutoOCR Toolbar Settings Multi-column Select Multi-column to have OmniPage Pro automatically draw and order zones on multiple-column document images such as magazine or newspaper articles. For more information, see “Creating Zones Automatically” on page 31. Tables Select Tables to have OmniPage Pro automatically draw and order zones on table format document images such as spreadsheets, or any page that contains a table. For more information, see “Creating Zones Automatically” on page 31.
AutoOCR Toolbar Settings Defer OCR Select Defer OCR to tell OmniPage Pro to delay text recognition during automatic processing. When you click AUTO, OmniPage Pro does the selected Image and Zone operations, but stops before OCR. You can then save the document as an OmniPage Document and process it later. Or, you can change the OCR command and activate another OCR operation. Train OCR Select Train OCR to teach OmniPage Pro how to recognize special characters.
Selecting Settings Auto Paste (Direct Input mode only) Select Auto Paste to paste recognized text into another application when you are using the Direct Input feature. If no application is open, text is placed on the Clipboard. Selecting Settings The Settings Panel is the central location of OmniPage Pro settings. To open it, click the Settings Panel button in the AutoOCR Toolbar or choose Settings Panel in the Settings menu. The Settings Panel has six sections of options.
Scanner Settings Click the Scanner icon in the Settings Panel to select options that control the way your scanner scans a page. To automatically open the Settings Panel to the Scanner section, Option-click the Image button in the AutoOCR Toolbar when it is set to Scan Image. Page Size Options Select the dimensions of the pages you plan to scan in the Size pop-up menu.
• Select Letter for 8.5 by 11 inch pages.
• Select A4 for 21 by 29.7 cm pages.
• Select Legal for 8.
Scanner Settings ADF Options If you use a scanner with an automatic document feeder (ADF), you can use the following options. • Select Scan until Empty (the default setting) to scan every page in your scanner’s ADF. This setting is useful when you want to scan a stack of pages at once. If Scan until Empty is not selected, OmniPage Pro only scans the first page in your ADF and you must click the Image button to scan each subsequent page.
Scanner Settings Brightness The Brightness option for scanning a page is like the brightness setting used on a copy machine. This setting can compensate for variations in paper and print quality, so it can have a big influence on OCR accuracy. Click the Brightness check box to activate the adjustment (lighten or darken) for the brightness of the entire page. This is the only available Scanner option if you have a black-and-white (Monochrome) scanner.
OCR Settings Click the OCR icon in the Settings Panel to select input and output options that assist OmniPage Pro during recognition. To automatically open the Settings Panel to the OCR section, Option-click the OCR button in the AutoOCR Toolbar. (A document must be open for the button to be active.
OCR Settings This feature is only used for documents on which zones have been created automatically (and not manually modified). The Automatically Correct Page Orientation feature takes extra processing time. To increase processing speed, deselect this setting and make sure your page image is properly oriented in the Image View before performing OCR. To manually correct the orientation of a page, see “Rotating an Image” on page 53.
Direct Input Settings • Make sure Save Page Image in OmniPage Document and the desired resolution are selected in the Document section of the Settings Panel. • Make sure that graphics on a page image are identified as Graphic zone types. These have green borders and display a graphic icon. See “Specifying Zone Types” on page 31 for more information. For additional guidelines, see “Do you want to retain graphics in your document?” on page 89.
Spelling Settings • Select Begin Processing Automatically on Launch if you want OmniPage Pro to trigger the AUTO button as soon as you activate the Direct Input operation. Text will be recognized automatically and pasted into your application. Deselect Begin Processing Automatically on Launch if you want to control when to start recognition. This is recommended if you want to check settings first or draw zones manually on the page image.
Additional Language(s) In addition to the Main Language for recognition, you may select one (or more) additional (secondary) languages for use with OCR. Because considering additional language sets takes extra processing time, you should only activate this feature if your documents contain more than one language. To select a secondary language and dictionary: 1 Click the Select button in the Spelling area of the Settings Panel. The Select Secondary Languages dialog appears.
Document Settings Spell Checking Options Select any of these spell checking options for checking recognition or using the Language Analyst. Use Language Analyst Select Use Language Analyst to have the Language Analyst replace unknown words with words most likely to be correct during OCR. The Language Analyst uses the current dictionaries and information about language context and usage rules to evaluate words, compute likely errors, and determine replacement words.
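The replacement step described above can be pictured with a small sketch. This is not Caere's method, which the manual does not describe. It assumes a plain word list and uses the similarity matching in Python's standard difflib module in place of the Language Analyst's context and usage rules. The function name suggest_replacement and the sample word list are hypothetical.

```python
import difflib


def suggest_replacement(word, dictionary, cutoff=0.6):
    """Return the word if it is known, else the closest dictionary word,
    else the word unchanged (hypothetical helper, not an OmniPage Pro API)."""
    lowered = word.lower()
    if lowered in dictionary:
        return word
    # get_close_matches ranks candidates by a similarity ratio between 0 and 1.
    matches = difflib.get_close_matches(lowered, sorted(dictionary), n=1, cutoff=cutoff)
    return matches[0] if matches else word


if __name__ == "__main__":
    sample_dictionary = {"modern", "model", "memo"}
    for ocr_word in ["modern", "rnodern", "zzzz"]:
        print(ocr_word, "->", suggest_replacement(ocr_word, sample_dictionary))
```

In this sketch a word with no sufficiently close match is left alone, which mirrors the idea that only likely errors are replaced.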
Image View as you work. OmniPage Pro will activate and enlarge a view according to the current task.
• Select Show Selected View Only if you want OmniPage Pro to display the active view and hide the other view. This is recommended for small monitors. OmniPage Pro determines which view should be visible according to the current task. To switch between views, you can choose Image View ( m) or Text View ( j) in the Window menu.
Preference Settings The Thumbnail window displays miniature pictures (thumbnails) of page images in the current document. You can use thumbnails to change pages, rearrange pages, and drag copies of images into other applications. Save Page Image in OmniPage Document Select Save Page Image in OmniPage Document to retain original images in OmniPage Documents. An image is the picture of a page that appears in the Image View when you scan a page or open an image file.
General Preferences
The General Preferences settings control how unrecognized scanned pages are handled and whether you are prompted before pages are deleted.
AUTO Button Finishes All Unrecognized Pages
Select AUTO Button Finishes All Unrecognized Pages if you want OmniPage Pro to finish all pages in a document when you click the AUTO button. If this is deselected, the AUTO button will only finish the current page.
The Colors settings enable you to define the color depth of the scanned image. You can choose 256 colors (8-bit) or Thousands of colors (16-bit), depending on your needs for the scanned image. The memory requirements for the selected page size, dpi resolution, and color depth appear at the bottom of the Preferences Settings Panel. These settings are ignored if you have selected Monochrome or Gray Scale in the Scanner section of the Settings Panel.
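To see why page size, resolution, and color depth drive the memory figure, here is a rough sketch of the usual uncompressed bitmap estimate. The manual does not state the formula OmniPage Pro uses, so this is an assumption: pixels across times pixels down times bytes per pixel, with no allowance for program overhead. The function name estimate_image_bytes is hypothetical.

```python
def estimate_image_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Estimate the size of an uncompressed page image in bytes.

    Assumes the image is stored as a plain bitmap; the real requirement
    reported by the application may differ.
    """
    pixels_wide = round(width_in * dpi)
    pixels_high = round(height_in * dpi)
    return pixels_wide * pixels_high * bits_per_pixel // 8


if __name__ == "__main__":
    # US Letter page at 300 dpi in the two color depths named above.
    for label, bits in [("256 colors (8-bit)", 8), ("Thousands of colors (16-bit)", 16)]:
        size = estimate_image_bytes(8.5, 11, 300, bits)
        print(f"{label}: about {size / 1_000_000:.1f} million bytes")
```

Doubling the color depth doubles the estimate, and doubling the dpi quadruples it, which is why lower settings are worth considering when memory is tight.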
Settings Guidelines
The settings you select in OmniPage Pro can greatly affect OCR results. Make sure that settings are appropriate for your document before you begin processing. You may have to experiment with different settings to get the results you want. Answer the following questions to get settings recommendations for your documents.
What type of document are you processing?

Magazine or newspaper article
Recommendations: Select the appropriate page size and orientation in the Scanner section of the Settings Panel if you are scanning. Choose the Zone setting that is appropriate for your source material. For magazine or newspaper articles, this may be either the Multicolumn or Mixed setting. Modify zones manually if auto zoning does not successfully create zones around all page areas you want to process.

Memo or letter
Recommendations: Select the appropriate page size and orientation in the Scanner section of the Settings Panel if you are scanning. Choose the Zone setting that is appropriate for your source material. For a memo or a letter, this is generally the One Column setting. Draw zones manually around any graphics you want to retain. Identify them as Graphic zone types. See “Specifying Zone Types” on page 31.

Spreadsheet or table
Recommendations: Select the appropriate page size and orientation in the Scanner section of the Settings Panel if you are scanning. Choose the Zone setting that is appropriate for your source material. For a spreadsheet or table, this is usually the Tables setting. Modify zones manually if auto zoning does not successfully create zones around all page areas you want to process.

Legal document
Recommendations: Select the appropriate page size and orientation in the Scanner section of the Settings Panel if you are scanning. Draw zones manually around the page areas you want to retain. See “Drawing Zones Manually” on page 33. Omit unnecessary parts of the page. For example, do not include line numbers in a zone if you plan to renumber lines in your word processor.

Mixed formats or not sure
Recommendations: Select the appropriate page size and orientation in the Scanner section of the Settings Panel if you are scanning. Choose the Zone setting that is appropriate for your source material. With mixed formats of material, this is probably the Mixed setting. Modify zones manually if auto zoning does not successfully create zones around all page areas you want to process.
What is the quality of the original document?

Poor or not sure
Degraded copies, colored or shaded backgrounds, run-together or broken text characters
Recommendations for scanning: Try to scan original documents rather than copies. Select Manual Brightness and Manual Contrast in the Scanner section of the Settings Panel if you have a color or grayscale scanner and the page has crisp text on colored or shaded backgrounds.

Good
Clear, well-formed text characters on a clean, white background
Recommendations: Select Manual Brightness in the Scanner section of the Settings Panel for the fastest processing if you are scanning. Use a setting near the middle of the scrollbar. Deselect Use Language Analyst in the Spelling section of the Settings Panel for faster processing.
How much formatting do you want to keep?

None
Keep plain text only
Recommendations: Select Plain Format as the style set for the page. See “Applying Styles to Zones” on page 93. Save the recognized document as ASCII Text. Or, copy the text to the Clipboard and paste it into your target application. See “Exporting Documents” on page 57. Use the Direct Input feature to paste small amounts of text directly into another open application.

As much as possible
Keep font characteristics, paragraph formatting, side-by-side columns, and graphic positioning
Recommendations: Make sure all parts of the page are included within zones and identified as the correct zone type. See “Specifying Zone Types” on page 31. Select True Page as the style set for the page. See “Applying Styles to Zones” on page 93. Select the fonts you want mapped to various font types.
Do you want to retain graphics in your document?

Yes
Keep graphics such as logos and photos during OCR processing
Recommendations: Select 3D OCR and Manual Brightness and/or Manual Contrast in the Scanner section of the Settings Panel if you are scanning with a color or grayscale scanner and you want color or grayscale graphics. If you have HP AccuPage selected as your scanner setting in the Select Scanner dialog, you cannot retain grayscale graphics.

No
Ignore graphics such as logos and photos during OCR processing
Recommendations: Do not draw any zones around graphic areas if you are drawing zones manually. Deselect Retain Graphics in the OCR section of the Settings Panel. Double-check that there are no zones around graphics before performing OCR. Designate graphic zones to be ignored.
How many languages are in your document?

More than one language
Recommendations for more accurate processing: Use this method if you have installed all the languages needed for your document.
1 Select a main and other languages in the Settings Panel under Spelling.
2 Draw a zone. See “Creating Zones on a Page” on page 29.
3 Perform OCR on the document and save the text in the desired file format.

Are you processing a large document?

Yes
Recommendations if you have an automatic document feeder (ADF): Select Scan Until Empty in the Scanner section of the Settings Panel to scan a stack of pages at once. Otherwise, you must click the Image button to scan each subsequent page. Select Double-Sided Pages in the Scanner section of the Settings Panel to scan pages with print on both sides.
Chapter 5 Customizing OCR OmniPage Pro has many features that allow you to customize the way your documents are handled during OCR. This chapter describes how to create and use these tools.
Applying Styles to Zones
Style sets and zone styles can be selected in the Zone Info palette that is displayed when the Image View is active. Choose Show Zone Info Palette in the Window menu to display the palette.
[Figure: Zone Info palette. Callouts: selected zone style for the current zone; selected style set for the current page]
Built-In Style Sets
OmniPage Pro is shipped with the following built-in style sets.
True Page
This style set retains as much text, paragraph, and page formatting as possible. It contains one style called Auto Detect which tries to discern all formatting automatically. True Page uses frames (formatting boxes) to precisely lock the positions of the text and graphics within their zones. Or, you can remove the frames and maintain the overall layout, yet gain easier editing.
The Zone Info palette appears automatically if it is open. If it is closed, choose Show Zone Info Palette in the Window menu.
[Figure callout: current style set]
2 Select the desired style set in the Style Set for Page pop-up menu. In addition to the built-in and sample style sets, any style sets you create appear in the pop-up menu. See “Creating Style Sets” on page 97 for more information.
To apply styles to zones:
1 Make the Image View of the page active.
Available styles depend on the style set selected for the current page. (Built-in style sets only have one style each.) Styles are applied to zones during recognition.
Shortcut for applying zone styles
Hold the mouse button down while the mouse pointer is over a zone. A menu of all the zone styles in the current style set is displayed. Select the style you want to use for that zone. If a style set only contains one style, no menu will appear.
For example, you could enter Memos as the name if you are creating a style set for memo-type documents.
4 Click New. The Edit Style Set dialog box appears. Auto Detect is the default style for every new style set.
5 Click New to add a new style to the style set. The New Zone Style Name dialog box appears.
6 Enter a name for the style you want to add and click OK. For example, you could enter Heading as the name if you are creating a style for heading-type paragraphs.
To edit styles in a style set:
1 Choose Edit Style Set... in the Settings menu if you do not already have your style set open.
2 Double-click the style set you want to edit. The Edit Style Set dialog box lists the styles in the style set.
[Figure: Edit Style Set dialog box. Callouts: click to make font-mapping selections for the entire style set; the currently selected style; settings for the currently selected style; example of the currently selected style]
3 Click the name of the style you want to edit.
When you add a new style to a style set, its default formatting is based on the formatting of the last-selected style. Therefore, to base a new style on an existing style, select the existing style before creating the new one.
To add new styles to the current style set:
1 Click New in the Edit Style Set dialog box. The New Style Name dialog box appears.
2 Enter a name for the style you want to add and click OK.
Specifying Zone Contents
The font-mapping selections for a single style set apply to any style that has Auto selected as the font setting. Different style sets can have different font-mapping selections.
To change font mapping for a style set:
1 Choose Edit Style Set... in the Settings menu.
2 Double-click the style set for which you want to change font mapping selections.
3 Click Font Mapping... in the Edit Style Set dialog box. The Automatic Font Mapping dialog box appears.
Creating Zone Templates
located in the Zone Info palette, which appears automatically when the Image View is active.
[Figure callout: zone contents setting for the currently selected zone]
For example, if a particular zone only contains numbers and mathematical signs, you can specify the contents of that zone to be Numeric. OmniPage Pro will only look for numeric characters in that zone during recognition.
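The effect of a zone contents setting can be illustrated with a short sketch. It assumes, purely for illustration, that a recognizer produces several candidate characters with confidence scores and that the zone setting filters those candidates. The manual does not describe how OmniPage Pro does this internally. The allowed character set, the "Unrestricted" label, and the function pick_character are all hypothetical.

```python
import string

# Hypothetical character sets per zone contents setting; None means no restriction.
ZONE_ALPHABETS = {
    "Numeric": set(string.digits + "+-*/=.,%$()"),
    "Unrestricted": None,
}


def pick_character(candidates, zone_contents):
    """Pick the highest-scoring candidate that the zone setting allows.

    candidates maps each candidate character to a confidence score.
    Returns None when no candidate is allowed in the zone.
    """
    allowed = ZONE_ALPHABETS[zone_contents]
    if allowed is not None:
        candidates = {ch: score for ch, score in candidates.items() if ch in allowed}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    scores = {"O": 0.55, "0": 0.45}  # letter O narrowly beats digit zero
    print(pick_character(scores, "Unrestricted"))
    print(pick_character(scores, "Numeric"))
```

Restricting the candidates is what lets an ambiguous shape, such as a letter O versus a zero, resolve to the digit in a numeric zone.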
Training OCR for Special Characters
2 Choose Save Zone Template... in the File menu. The Save Zone Template dialog box appears.
3 Type a name for your file.
4 Click Save. The zone template file is saved in the Zone Templates folder within your installation folder.
To apply a zone template to a page:
1 Open the page image and make sure the Image View is active.
2 Select the zone template you want to use in the Zone button pop-up menu in the AutoOCR Toolbar.
3 Set Train OCR as the command in the OCR button’s pop-up menu.
4 Click the OCR button or choose Train OCR in the Process menu. OmniPage Pro analyzes the document and then opens the Training File dialog box.
Training files are designed for special characters that may appear in your documents. They are not designed to assume the task of recognizing ordinary characters. Thus, you should not create a training file to accommodate a specific font or type style.
The Specify Character dialog box displays the selected character as it appears in the original page image.
[Figure: Specify Character dialog box. Callouts: original image of the selected character; click any character you want to associate with the selected character]
6 Specify how you want OmniPage Pro to interpret the character during OCR. You can type the desired character(s) in the Character Code edit box. Or, click a character in the scrolling list to add it to the edit box.
A dialog box appears listing all training files in the Training Files folder.
2 Double-click the training file you want to edit. Or, select it and click Open. The Training File dialog box displays characters in the training file.
3 Double-click a character you want to edit. The Specify Character dialog box appears.
5 Click OK to accept the character specification. The Training File dialog box reappears.
6 Repeat steps 3–5 to continue editing specified characters. Click Delete to discard a selected character from the training file.
7 Click Save to save the edited training file. Or, click Append to add the trained characters to another training file.
Creating User Dictionaries
Dictionaries are used for recognition and error checking.
Creating Custom Settings Files
• Type a word in the New Word edit box and click Add to add it.
• Select a word in the list box and click Delete to delete it.
• Click Delete All to remove all words from the dictionary.
• Click Import... to add words from a text file. OmniPage Pro goes through the selected text file and adds the words to your dictionary.
4 Click Done to save edits to your dictionary and exit. Or, click Export... to save your user dictionary as a text file.
To load settings:
1 Choose Load Settings... in the File menu.
2 Double-click the settings file you want to load. Settings Panel and language selections are changed according to the selected settings file.
Chapter 6 Technical Information This chapter provides troubleshooting tips and other technical information about using OmniPage Pro. Please also read the Release Notes and Scanner Setup Notes that came in your OmniPage Pro package. These contain the latest information on OmniPage Pro and its supported scanners.
General Troubleshooting Solutions
Although OmniPage Pro is designed to be easy to use, problems sometimes occur. Many of the onscreen error messages contain self-explanatory descriptions of what to do (check connections, quit other applications to free up memory, and so on). Sometimes that is all the troubleshooting help you need. Please see your Macintosh user’s manual for information on optimizing your system and application performance.
Low Memory Problems
OCR is a CPU-intensive operation. The more memory you have, the better things will run. OmniPage Pro may run poorly under low memory conditions. You may be experiencing low-memory problems if you get out-of-memory messages, if OmniPage Pro works slowly, or if it accesses the hard disk frequently. Try these solutions for low memory conditions:
• Close other open applications and restart OmniPage Pro.
• Restart your Macintosh.
messages about memory while using OmniPage Pro, try increasing the size of its memory partition to remedy the problem. If you increase any application’s memory partition size, the amount of memory available for other applications is decreased when that application is running.
To increase OmniPage Pro’s memory partition:
1 Quit OmniPage Pro if the program is open.
2 Open the OmniPage Pro folder.
3 Select the OmniPage Pro application icon.
with OmniPage Pro. More disk space is recommended if you work with many complex documents or with color images.
To find out the amount of free hard disk space on your system:
1 Double-click your hard disk icon to open it.
2 Choose by Small Icon or by Icon in the Finder’s View menu.
3 Check the number in the upper-right corner of the window for the amount of available disk space.
Scanning Issues
Topics in this section include:
• Problems Connecting OmniPage Pro to Your Scanner
• Scanning Problems
• Scanning Tips
You can also visit Caere’s World Wide Web site at www.caere.com for updated scanner information and driver files, which you can download. Click the Support button on the home page and look for the Product Support Software Library.
• Make sure your scanner and any other device connected to the SCSI port of your Macintosh have unique SCSI ID numbers. If you have more than one device daisy-chained to the Macintosh SCSI port, the last SCSI device in the chain must be terminated properly.
• Make sure the scanner is not in use by another application.
• Reinstall OmniPage Pro.
Scanner Drivers Supplied by the Manufacturer
Many scanners require a proprietary driver that is supplied by the scanner manufacturer.
• Turn your scanner off and on again to return the scanner to its default state. Then restart your computer.
• Check with the scanner manufacturer to make sure you have the latest driver for your scanner.
• Resolve low memory problems. See “Low Memory Problems” on page 112 for more information.
• Resolve low disk space problems. See “Low Disk Space Problems” on page 113 for more information.
Scanning Tips
OCR results will be poor if an image is not scanned properly.
OCR Problems
This section contains information and solutions for possible OCR problems. Topics include:
• Crash During OCR
• Text Does Not Get Recognized Properly
• Problems With Fax Recognition
Crash During OCR
Try these solutions if a crash occurs during OCR or if processing takes a very long time:
• Check the quality of the image you are recognizing. See “What is the quality of the original document?” on page 85 for more information.
• OmniPage Pro cannot recognize white text on a black background. If your page image has this type of text, you can reverse the black and white elements so that the text is black and the background is white. See “Inverting an Image” on page 54.
OmniPage Pro recognizes printed text characters only. However, it can retain handwritten text, such as a signature, as a graphic element. See page 89 for guidelines on retaining graphics.
Supported File Formats
This section lists the supported import and export file formats along with some information on exporting.
Import File Formats
OmniPage Pro can open the following file formats.
• OmniPage Document
• PICT (type 2)
• TIFF Uncompressed
• TIFF Compressed (RLE/Huffman, ITU Fax Group 3 and 4, and PackBits)
All image files must have a resolution of at least 100x200 dpi. TIFF files can be binary or grayscale.
How Exported Text Appears
A recognized document in OmniPage Pro’s Text View might look different once it is saved and then opened in your target application (the application where a recognized document eventually ends up). The way text appears in your target application depends on the capabilities and limitations of the application and the file format that was selected.
Apple Event Support
OmniPage Pro supports the four required Apple Events and a small set of custom Apple Events that allow you to automate recognition tasks. This section briefly describes all the Apple Events that OmniPage Pro supports. You can use Script Editor (a scripting editor that is part of Apple’s AppleScript package) to write scripts that control OmniPage Pro with Apple Events.
Recognition of scheduled jobs cannot begin if you have a document open in OmniPage Pro. Use the get status call to check if there is a document currently open. See “get status” on page 125 for more information.
set output format to format
Event Class: ‘ocr ’
Event ID: ‘out ’
Parameter:
  keyword: ‘data’
  descriptor type: TEXT
  data: format name
Returns:
  descriptor type: long
  data: kAESuccess if the output type is valid
        kAEInvalidOutputType if the output type is not valid
Set output fo
Schedule OCR dialog box. The default output folder is called Output Files. If this call is never made, the output file name will be the same as the input file name, with the word Output added.
  data: a list of one or more image file names
Returns:
  descriptor type: long
  data: kAESuccess if OCR was successfully started
        kAEInvalidFileName if one of the input file names is invalid
        kAEJobAddedToQueue if this job was only added to the job queue because another document is already open
        kAEJobQueueIsFull if the job could not be added to the queue because the queue is full
This function creates a job and adds it to the processing queue in the Schedule OCR dialog box.
        kAESuccess if there is no job or document open
Get status returns the current status of OmniPage Pro. If you just added a job to the queue and the return value indicated that OmniPage Pro immediately started handling it, you can use this call to check on the status of your job. The status will be kAEJobInProgress until the job is finished; then it will be kAESuccess.
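The check-until-finished pattern described for get status can be sketched as a simple polling loop. This sketch is written in Python only to show the control flow; it does not send Apple Events. The constants stand for the two status values named above (job in progress, then success), and their numeric values here are placeholders, not Caere's. The get_status argument is a hypothetical stand-in for whatever your script uses to issue the get status call.

```python
import time

# Placeholder values for the two status results described above.
K_AE_SUCCESS = 0
K_AE_JOB_IN_PROGRESS = 1


def wait_for_job(get_status, poll_seconds=1.0, max_polls=600):
    """Call get_status() repeatedly until it stops reporting a job in progress.

    Returns the first status that is not "in progress". Raises TimeoutError
    if the job is still running after max_polls checks.
    """
    for _ in range(max_polls):
        status = get_status()
        if status != K_AE_JOB_IN_PROGRESS:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("job still in progress after max_polls checks")


if __name__ == "__main__":
    # Simulated application: busy for two checks, then finished.
    fake_statuses = iter([K_AE_JOB_IN_PROGRESS, K_AE_JOB_IN_PROGRESS, K_AE_SUCCESS])
    result = wait_for_job(lambda: next(fake_statuses), poll_seconds=0)
    print("finished" if result == K_AE_SUCCESS else f"stopped with status {result}")
```

A limit on the number of checks keeps a script from waiting forever if a job stalls.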
A Sample Script
You can use Apple’s Script Editor to control OmniPage Pro via Apple Events. This is an example script to get you started. This script assumes you have a TIFF file called Test TIFF on your hard disk called HD.
tell application "OmniPage Pro 8.
Glossary Terms
3D OCR® A technology developed by Caere that uses grayscale information to increase accuracy when recognizing scanned text characters.
active window The window on the computer desktop where the next action will take place.
ADF See automatic document feeder.
ASCII An acronym for American Standard Code for Information Interchange. This is a code used for representing text inside a computer and for transmitting text between computers or between a computer and a peripheral device.
driver A program that manages the transfer of information between a computer and a peripheral device such as a scanner.
error message An onscreen message that reports an error or problem in the execution of a program or in your communication with the system.
fax Short for facsimile machine. Fax machines scan a page, convert the image into digital data, and send the data over a phone line to another fax or computer. The receiving machine recreates the image on paper or stores the data on disk as a fax file.
mapping See font mapping.
monospaced font Any font in which all characters have the same width. For example, in Courier (a monospaced font), the letter M is the same width as a narrow letter such as i. Thus, MMMMM is the same width as iiiii.
mouse pointer A small shape on the screen that follows the movement of the mouse. The pointer can take the shape of an arrow, an I-beam, or other graphical character depending on the current operation.
OCR See optical character recognition.
Thumbnail window The window in OmniPage Pro that displays miniature representations of pages in an open document.
TIFF An abbreviation for Tagged Image File Format. This is a standard graphic format for grayscale and high-resolution bitmapped images.
training file A set of pre-recognized text characters that OmniPage Pro compares with characters in a page image during OCR. This is useful for recognizing special characters that might normally be difficult to recognize such as % and @.