12.0

ABBYY FineReader 12 User’s Guide
Matches
forbids 1
Letter or Digit            [0-9a-zA-Zа-яА-Я]    [0-9a-zA-Zа-яА-Я] allows any single character;
                                                [0-9a-zA-Zа-яА-Я]+ allows any word
Capital Latin Letter       [A-Z]
Small Latin Letter         [a-z]
Capital Cyrillic letter    [А-Я]
Small Cyrillic letter      [а-я]
Digit                      [0-9]
                           @                    Reserved.
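The character groups in the table can be tried out in any general-purpose regular expression engine. The following is a minimal sketch that uses Python's re module as a stand-in; FineReader's own engine may differ in details, and the helper names is_single and is_word are hypothetical.

```python
import re

# Assumption: Python's re module is used only for illustration;
# FineReader's regular expression engine may behave differently.
LETTER_OR_DIGIT = "[0-9a-zA-Zа-яА-Я]"

single = re.compile(LETTER_OR_DIGIT)        # exactly one letter or digit
word = re.compile(LETTER_OR_DIGIT + "+")    # one or more letters or digits


def is_single(text):
    """Return True if text is exactly one Latin/Cyrillic letter or digit."""
    return single.fullmatch(text) is not None


def is_word(text):
    """Return True if text consists only of letters and digits (at least one)."""
    return word.fullmatch(text) is not None


print(is_single("Я"))          # True
print(is_single("ab"))         # False: two characters
print(is_word("Весна2014"))    # True
print(is_word("two words"))    # False: the space is not in the group
```

fullmatch is used so that the whole string must fit the expression, which is closer to how a language restricts an entire recognized word.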
Note:
1. To use a regular expression symbol as a normal character, precede it with a backslash. For
example, [t-v]x+ stands for tx, txx, etc., ux, uxx, etc., and vx, vxx, etc., but \[t-v\]x+
stands for [t-v]x, [t-v]xx, [t-v]xxx, etc.
2. To group regular expression elements, use round brackets. For example, (a|b)+|c stands for c or
any combination such as abbbaaabbb, ababab, etc. (a word of any nonzero length in which
there may be any number of a's and b's in any order), while a|b+|c stands for a, c, b, bb,
bbb, etc.
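Both notes can be checked with a short experiment. The sketch below again assumes Python's re module as a stand-in for FineReader's engine, and the helper matches is a hypothetical name introduced for the example.

```python
import re


def matches(pattern, text):
    """Return True if the whole text fits the pattern."""
    return re.fullmatch(pattern, text) is not None


# Note 1: unescaped, [t-v] is a group covering the letters t, u and v.
print(matches(r"[t-v]x+", "uxx"))          # True
print(matches(r"[t-v]x+", "[t-v]x"))       # False

# Escaped, the square brackets are ordinary characters.
print(matches(r"\[t-v\]x+", "[t-v]xx"))    # True
print(matches(r"\[t-v\]x+", "tx"))         # False

# Note 2: round brackets decide what the + applies to.
print(matches(r"(a|b)+|c", "ababab"))      # True: + repeats the whole group
print(matches(r"a|b+|c", "ababab"))        # False: + repeats only b
print(matches(r"a|b+|c", "bbb"))           # True
```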
Examples
Suppose you are recognizing a table with three columns: birth dates, names, and email addresses.
In this case, you can create two new languages, Data and Address, and specify the following
regular expressions for them.
Regular expression for dates:
The number denoting a day may consist of one digit (1, 2, etc.) or two digits (02, 12), but it cannot
be zero (00 or 0). The regular expression for the day should then look like this:
((|0)[1-9])|([1|2][0-9])|(30)|(31).
The regular expression for the month should look like this: ((|0)[1-9])|(10)|(11)|(12).
The regular expression for the year should look like this:
((19)[0-9][0-9])|([0-9][0-9])|((20)[0-9][0-9]).
Now all we need to do is combine these parts and separate the numbers with periods (e.g.
1.03.1999). The period is a regular expression symbol, so you must put a backslash (\) before it.
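Each part is enclosed in its own pair of round brackets so that the periods separate whole
parts rather than single alternatives. The regular expression for the full date should then
look like this:
(((|0)[1-9])|([1|2][0-9])|(30)|(31))\.(((|0)[1-9])|(10)|(11)|(12))\.(((19)[0-9][0-9])|([0-9][0-9])|((20)[0-9][0-9]))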
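The day, month, and year parts described above can be combined and tested outside FineReader. The following is a minimal sketch under stated assumptions: it uses Python's re module instead of FineReader's engine, writes the optional leading zero as 0? (equivalent to (|0)), writes the tens digit of the day as [12], and wraps each part in its own group so that the escaped period joins whole parts. The names DAY, MONTH, YEAR, DATE, and is_date are hypothetical.

```python
import re

# Assumption: Python's re syntax; FineReader's dialect may differ.
DAY = r"(0?[1-9]|[12][0-9]|30|31)"               # 1-31, optional leading zero, never 0 or 00
MONTH = r"(0?[1-9]|10|11|12)"                    # 1-12, optional leading zero
YEAR = r"(19[0-9][0-9]|20[0-9][0-9]|[0-9][0-9])" # 19xx, 20xx, or a two-digit year

# Each part is already a group, so \. separates whole parts.
DATE = re.compile(DAY + r"\." + MONTH + r"\." + YEAR)


def is_date(text):
    """Return True if the whole text has the form day.month.year."""
    return DATE.fullmatch(text) is not None


print(is_date("1.03.1999"))     # True
print(is_date("31.12.99"))      # True
print(is_date("00.03.1999"))    # False: the day cannot be zero
print(is_date("12.13.2001"))    # False: there is no month 13
print(is_date("32.01.2000"))    # False: there is no day 32
```

The expression checks only the shape of the date. It does not know how many days each month has, so a string such as 31.02.1999 is still accepted.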
Regular expression for email addresses: