User guide
NumPy User Guide, Release 1.9.0
deletechars
Gives a string combining all the characters that must be deleted from the name. By default, invalid
characters are ~!@#$%^&*()-=+~\|]}[{';: /?.>,<.
excludelist
Gives a list of the names to exclude, such as return, file, print... If one of the input names is
part of this list, an underscore character ('_') will be appended to it.
case_sensitive
Whether the names should be case-sensitive (case_sensitive=True), converted to upper case
(case_sensitive=False or case_sensitive='upper') or to lower case
(case_sensitive='lower').
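As a rough illustration of these three options, the sketch below passes deliberately awkward names to np.genfromtxt and inspects the resulting field names. The data string and the names used here ("return", "my col", "c%", "total", "Alpha"...) are invented for this illustration, and the results assume the default behaviour of replacing spaces with underscores.

```python
import numpy as np
from io import StringIO

# Hypothetical whitespace-delimited data with three columns.
data = "1 2 3\n4 5 6"

# Default settings: "return" is in the default exclude list, the space in
# "my col" is replaced by an underscore, and "%" is a deleted character.
default_names = np.genfromtxt(
    StringIO(data), names=["return", "my col", "c%"]
).dtype.names

# A custom excludelist adds to the default one: "total" gets an underscore.
custom_names = np.genfromtxt(
    StringIO(data), names=["total", "b", "c"], excludelist=["total"]
).dtype.names

# case_sensitive='lower' converts every name to lower case.
lower_names = np.genfromtxt(
    StringIO(data), names=["Alpha", "Beta", "Gamma"], case_sensitive="lower"
).dtype.names

print(default_names)  # ('return_', 'my_col', 'c')
print(custom_names)   # ('total_', 'b', 'c')
print(lower_names)    # ('alpha', 'beta', 'gamma')
```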
Tweaking the conversion
The converters argument
Usually, defining a dtype is sufficient to define how the sequence of strings must be converted. However, some
additional control may sometimes be required. For example, we may want to make sure that a date in the format
YYYY/MM/DD is converted to a datetime object, or that a string like xx% is properly converted to a float between
0 and 1. In such cases, we should define conversion functions with the converters argument.
The value of this argument is typically a dictionary with column indices or column names as keys and conversion
functions as values. These conversion functions can be either regular functions or lambda functions. In either case,
they should accept a single string as input and return a single element of the desired type.
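A minimal sketch of what such conversion functions might look like is given below. The names parse_date and parse_percent are hypothetical and chosen only for this illustration; each function takes one string and returns one value, which is the contract described above. The functions are exercised directly here rather than through np.genfromtxt.

```python
from datetime import datetime


def parse_date(text):
    """Convert a 'YYYY/MM/DD' string into a datetime object."""
    # Surrounding whitespace is removed first, since fields may carry it.
    return datetime.strptime(text.strip(), "%Y/%m/%d")


def parse_percent(text):
    """Convert a string such as '2.3%' into a float between 0 and 1."""
    return float(text.strip().rstrip("%")) / 100.0


# A converters dictionary maps a column index (or name) to a function.
# Column positions 0 and 1 are assumptions made for this example.
converters = {0: parse_date, 1: parse_percent}

print(converters[0]("2014/09/07"))
print(converters[1](" 2.3%"))
```

Writing the converters as named functions rather than lambdas makes them easier to test on their own before handing them to the reader function.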
In the following example, the second column is converted from a string representing a percentage to a float between
0 and 1:
>>> convertfunc = lambda x: float(x.strip("%"))/100.
>>> data = "1, 2.3%, 45.\n6, 78.9%, 0"
>>> names = ("i", "p", "n")
>>> # General case .....
>>> np.genfromtxt(StringIO(data), delimiter=",", names=names)
array([(1.0, nan, 45.0), (6.0, nan, 0.0)],
      dtype=[('i', '<f8'), ('p', '<f8'), ('n', '<f8')])
We need to keep in mind that by default, dtype=float. A float is therefore expected for the second column.
However, the strings ' 2.3%' and ' 78.9%' cannot be converted to float and we end up having np.nan instead.
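The behaviour just described can be checked with a short sketch like the following. It reuses the data and names of the example above; the variable names parse_ok and all_nan are introduced only for this illustration.

```python
import numpy as np
from io import StringIO

data = "1, 2.3%, 45.\n6, 78.9%, 0"

# Python's float() rejects a string that still carries a percent sign.
try:
    float(" 2.3%")
    parse_ok = True
except ValueError:
    parse_ok = False

# With the default dtype=float, the unparsable column is filled with NaN.
arr = np.genfromtxt(StringIO(data), delimiter=",", names=("i", "p", "n"))
all_nan = bool(np.isnan(arr["p"]).all())

print(parse_ok, all_nan)
```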
Let’s now use a converter:
>>> # Converted case ...
>>> np.genfromtxt(StringIO(data), delimiter=",", names=names,
...               converters={1: convertfunc})
array([(1.0, 0.023, 45.0), (6.0, 0.78900000000000003, 0.0)],
      dtype=[('i', '<f8'), ('p', '<f8'), ('n', '<f8')])
The same results can be obtained by using the name of the second column ("p") as key instead of its index (1):
>>> # Using a name for the converter ...
>>> np.genfromtxt(StringIO(data), delimiter=",", names=names,
...               converters={"p": convertfunc})
array([(1.0, 0.023, 45.0), (6.0, 0.78900000000000003, 0.0)],
      dtype=[('i', '<f8'), ('p', '<f8'), ('n', '<f8')])
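For readers who want to run the two converter examples as a standalone script, a possible self-contained version is sketched below. It assumes a more recent NumPy than the release documented here: the encoding argument is not part of release 1.9 and is passed only so that the converter receives str rather than bytes. The function name percent_to_fraction is hypothetical.

```python
import numpy as np
from io import StringIO


def percent_to_fraction(text):
    """Turn a string such as ' 2.3%' into the float 0.023."""
    return float(text.strip().rstrip("%")) / 100.0


data = "1, 2.3%, 45.\n6, 78.9%, 0"
names = ("i", "p", "n")

# Converter keyed by column index.
# Assumption: encoding="utf-8" requires a newer NumPy than release 1.9.
by_index = np.genfromtxt(StringIO(data), delimiter=",", names=names,
                         converters={1: percent_to_fraction},
                         encoding="utf-8")

# Converter keyed by column name; the result should be identical.
by_name = np.genfromtxt(StringIO(data), delimiter=",", names=names,
                        converters={"p": percent_to_fraction},
                        encoding="utf-8")

print(by_index["p"])
print(np.array_equal(by_index["p"], by_name["p"]))
```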
Converters can also be used to provide a default for missing entries. In the following example, the converter convert
transforms a stripped string into the corresponding float, or into -999 if the string is empty. We need to explicitly
strip whitespace from the string, as this is not done by default: