Tools.h++ Manual

104011 Tandem Computers Incorporated 5-9
Program output:
5
This program counts the number of tokens in the string. The function call
operator for class RWCTokenizer has been overloaded to mean "advance to the
next token and return it as an RWCSubString", much like any other iterator.
When there are no more tokens, it returns the null substring. Class
RWCSubString has a member function isNull() which returns TRUE if the
substring is the null substring. At that point the loop terminates.
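The calling pattern described above can be sketched in standard C++ for readers without the library at hand. This is only an illustration under stated assumptions: the names Token, Tokenizer, and countTokens are hypothetical, whitespace is assumed to be the delimiter set, and nothing here reflects how RWCTokenizer or RWCSubString are actually implemented.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a substring result; NOT the Tools.h++ class.
struct Token {
    std::string text;
    bool null;                          // true when no token was found
    bool isNull() const { return null; }
};

// Hypothetical stand-in for the tokenizer; NOT the Tools.h++ class.
class Tokenizer {
public:
    explicit Tokenizer(const std::string& s) : str_(s), pos_(0) {}

    // Advance to the next whitespace-delimited token and return it.
    // Once the string is exhausted, return a null token.
    Token operator()() {
        static const char* const delims = " \t\n";
        std::size_t start = str_.find_first_not_of(delims, pos_);
        if (start == std::string::npos) {
            pos_ = str_.size();
            return Token{std::string(), true};
        }
        std::size_t end = str_.find_first_of(delims, start);
        if (end == std::string::npos) end = str_.size();
        pos_ = end;
        return Token{str_.substr(start, end - start), false};
    }

private:
    std::string str_;
    std::size_t pos_;
};

// Count tokens by calling the tokenizer until it yields a null token.
int countTokens(const std::string& s) {
    Tokenizer next(s);
    int n = 0;
    while (!next().isNull()) ++n;
    return n;
}
```

The loop condition is the point of the sketch: each call both advances and returns, so testing the returned value with isNull() is all that is needed to stop.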
5.7 Multibyte strings
Class RWCString provides limited support for multibyte strings. Because a
multibyte character can consist of two or more bytes, the length of a string
in bytes may be greater than or equal to the number of actual characters in the
string. If the RWCString may contain multibyte characters, then you should
use member function mbLength() to return the number of characters. On the
other hand, if you know that the RWCString does not contain any multibyte
characters, then the results of length() and mbLength() will be the same,
and you may want to use length() because it is much faster. Here's an
example:
RWCString Sun("\306\374\315\313\306\374");
cout << Sun.length(); // Prints "6"
cout << Sun.mbLength(); // Prints "3"
The string in Sun is the day of the week Sunday in Kanji, using the EUC
(Extended Unix Code) multibyte code set. With EUC, a single character may
be one to four bytes long. In this example, the string Sun consists of 6 bytes,
but only 3 characters.
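To make the byte count versus character count distinction concrete, here is a minimal sketch in standard C++ that counts characters in a string encoded in EUC-JP, the Japanese variant of EUC. It is an assumption-laden illustration, not a description of how mbLength() works internally: the function name eucJpCharCount is hypothetical, the input is assumed to be well-formed, and only the EUC-JP lead-byte rules (0x8E introduces a two-byte character, 0x8F a three-byte character, 0xA1 through 0xFE a two-byte character) are handled.

```cpp
#include <cstddef>
#include <string>

// Count characters in an EUC-JP byte string (hypothetical helper;
// assumes well-formed input and does no validation of trailing bytes).
std::size_t eucJpCharCount(const std::string& s) {
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t width;
        if (b == 0x8F) {
            width = 3;                  // single shift 3, then two bytes
        } else if (b == 0x8E) {
            width = 2;                  // single shift 2, then one byte
        } else if (b >= 0xA1 && b <= 0xFE) {
            width = 2;                  // lead byte of a two-byte character
        } else {
            width = 1;                  // ASCII or other single byte
        }
        i += width;                     // skip the whole character
        ++count;
    }
    return count;
}
```

Applied to the six bytes "\306\374\315\313\306\374" from the example, every lead byte falls in the two-byte range, so the count is 3 while the byte length is 6. A plain ASCII string yields the same value for both, which is why length() suffices when no multibyte characters are present.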
In general, the second or later byte of a multibyte character may be null. This
means the length in bytes of a character string may or may not match the
length given by strlen(). Internally, RWCString makes no assumptions
about embedded nulls and hence can be used safely with character sets that
cout << i << endl;
return 0;
}
#include <rw/ctoken.h>