ABBYY® FineReader 15 User’s Guide
Small Cyrillic letter    [а-я]
Digit                    [0-9]
@                        Reserved.
Note:
1. To use a regular expression symbol as a normal character, precede it with a backslash. For
example, [t-v]x+ stands for tx, txx, etc., ux, uxx, etc., and vx, vxx, etc., but \[t-v\]x+ stands for
[t-v]x, [t-v]xx, [t-v]xxx, etc.
2. To group regular expression elements, use parentheses. For example, (a|b)+|c stands for c or any
combination like abbbaaabbb, ababab, etc. (a word of any non-zero length in which there
may be any number of a's and b's in any order), while a|b+|c stands for a, c, b, bb, bbb, etc.
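The two notes above can be tried out in any regular expression engine. The following is a minimal sketch in Python, assuming that Python's re module treats escaping and grouping the same way for these particular patterns. FineReader's own dialect differs from Python's in other respects, and the variable names are illustrative only.

```python
import re

# Note 1: a backslash turns a special symbol into a normal character.
char_group = re.compile(r"[t-v]x+")      # t, u, or v followed by one or more x
literal_text = re.compile(r"\[t-v\]x+")  # the literal text "[t-v]" followed by x's

# Note 2: grouping changes what the + repetition applies to.
grouped = re.compile(r"(a|b)+|c")    # any non-empty word made of a's and b's, or c
ungrouped = re.compile(r"a|b+|c")    # a, or one or more b's, or c

def matches(pattern, text):
    """Return True if the whole text matches the pattern."""
    return pattern.fullmatch(text) is not None

print(matches(char_group, "uxx"))        # True
print(matches(literal_text, "[t-v]xx"))  # True
print(matches(grouped, "ababab"))        # True
print(matches(ungrouped, "ababab"))      # False
```

Whole-string matching (fullmatch) is used here so that a pattern is accepted only when it describes the entire word, which is closer to how a recognized word is checked against a pattern.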
Examples
Suppose you are recognizing a table with three columns: birth dates, names, and e-mail addresses. In
this case, you can create two new languages, Data and Address, and specify the following regular
expressions for them.
Regular expression for dates:
The number denoting a day may consist of one digit (1, 2, etc.) or two digits (02, 12), but it cannot be
zero (00 or 0). The regular expression for the day should then look like this:
((|0)[1-9])|([12][0-9])|(30)|(31).
The regular expression for the month should look like this: ((|0)[1-9])|(10)|(11)|(12).
The regular expression for the year should look like this: ((19)[0-9][0-9])|([0-9][0-9]).
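Now all we need to do is combine all this together and separate the numbers with periods. The period is a
regular expression symbol, so you must put a backslash (\) before it. Each of the three parts contains
alternatives separated by |, so each part must also be enclosed in its own pair of parentheses; otherwise
the alternatives of one part would run into the next part.
The regular expression for the full date should then look like this:
(((|0)[1-9])|([12][0-9])|(30)|(31))\.(((|0)[1-9])|(10)|(11)|(12))\.(((19)[0-9][0-9])|([0-9][0-9]))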
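The way the date expression is assembled can be checked outside the program. The following is a minimal sketch in Python, assuming that Python's re module is an acceptable stand-in for FineReader's dialect for this pattern. The names DAY, MONTH, YEAR, and is_date are hypothetical. In Python a character group is written [12], because [1|2] would also accept a literal | character. Each part is wrapped in its own parentheses before joining, because | has the lowest precedence.

```python
import re

# Building blocks for day, month, and year (Python syntax, see assumptions above).
DAY = r"((|0)[1-9])|([12][0-9])|(30)|(31)"
MONTH = r"((|0)[1-9])|(10)|(11)|(12)"
YEAR = r"((19)[0-9][0-9])|([0-9][0-9])"

# Wrap each part in parentheses and join the parts with escaped periods.
FULL_DATE = re.compile(rf"({DAY})\.({MONTH})\.({YEAR})")

# The same parts joined without the extra parentheses, for comparison only.
UNGROUPED = re.compile(rf"{DAY}\.{MONTH}\.{YEAR}")

def is_date(text):
    """Return True if the whole text has the form day.month.year."""
    return FULL_DATE.fullmatch(text) is not None

print(is_date("31.12.1985"))  # True
print(is_date("1.1.99"))      # True
print(is_date("00.01.1990"))  # False: the day cannot be zero
print(is_date("15.13.1990"))  # False: there is no month 13
print(UNGROUPED.fullmatch("5") is not None)  # True: a lone day is accepted
```

The pattern only constrains the shape of the date. It does not check the calendar, so a string such as 31.02.1990 is still accepted.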
Regular expression for e-mail addresses:
[a-zA-Z0-9_\-\.]+\@[a-z0-9\.\-]+
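The e-mail expression can be tried out in the same way. The following is a minimal sketch in Python, assuming that Python's re module accepts this pattern unchanged. The name is_email_like is hypothetical. The escaped \@ is kept from the expression above, where @ is a reserved symbol, and in Python it simply matches a literal @.

```python
import re

# The e-mail pattern from the text, used as-is.
EMAIL = re.compile(r"[a-zA-Z0-9_\-\.]+\@[a-z0-9\.\-]+")

def is_email_like(text):
    """Return True if the whole text fits the e-mail pattern."""
    return EMAIL.fullmatch(text) is not None

print(is_email_like("john.smith@example.com"))  # True
print(is_email_like("john@Example.com"))        # False: capitals after the @
print(is_email_like("john.example.com"))        # False: no @ sign
```

The pattern is deliberately permissive: it restricts the characters that may appear on each side of the @ sign but does not verify that the address is well-formed, so a string such as a@b is also accepted.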










