User guide

NumPy User Guide, Release 1.9.0

>>> data = "1, , 3\n 4, 5, 6"
>>> convert = lambda x: float(x.strip() or -999)
>>> np.genfromtxt(StringIO(data), delimiter=",",
...               converters={1: convert})
array([[   1., -999.,    3.],
       [   4.,    5.,    6.]])

Using missing and filling values

Some entries may be missing in the dataset we are trying to import. In a previous example, we used a converter to transform an empty string into a float. However, user-defined converters may rapidly become cumbersome to manage. The genfromtxt function provides two other complementary mechanisms: the missing_values argument is used to recognize missing data, and a second argument, filling_values, is used to process these missing data.
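As a minimal sketch of the second mechanism, the converter from the earlier example can be replaced by a single filling_values argument. The snippet assumes Python 3 (so StringIO is imported from io); the variable names are illustrative only.

```python
from io import StringIO

import numpy as np

data = "1, , 3\n 4, 5, 6"

# The blank second field of the first row is treated as missing,
# so genfromtxt substitutes the value given by filling_values.
result = np.genfromtxt(StringIO(data), delimiter=",", filling_values=-999)
print(result)
```

No per-column converter is needed here: the replacement value is stated once and applies wherever a field is missing.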

missing_values

By default, any empty string is marked as missing. We can also consider more complex strings, such as "N/A" or "???", to represent missing or invalid data. The missing_values argument accepts three kinds of values:

a string or a comma-separated string
    This string will be used as the marker for missing data for all the columns.

a sequence of strings
    In that case, each item is associated with a column, in order.

a dictionary
    Values of the dictionary are strings or sequences of strings. The corresponding keys can be column indices (integers) or column names (strings). In addition, the special key None can be used to define a default applicable to all columns.
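The three forms can be compared with a short sketch. It assumes Python 3 and uses the usemask=True option of genfromtxt, which is not discussed in this section, so that the entries recognized as missing show up in the mask of the returned masked array. The helper missing_mask and the sample data are hypothetical.

```python
from io import StringIO

import numpy as np

data = "N/A,2,???\n4,N/A,6"


def missing_mask(missing_values):
    """Read `data` as a masked array and return its mask as nested lists."""
    arr = np.genfromtxt(StringIO(data), delimiter=",",
                        missing_values=missing_values, usemask=True)
    return np.ma.getmaskarray(arr).tolist()


# 1. A comma-separated string: both markers apply to every column.
mask_from_string = missing_mask("N/A,???")

# 2. A sequence: one marker per column, in order.
#    "N/A" in the second column is therefore not flagged.
mask_from_sequence = missing_mask(["N/A", "???", "???"])

# 3. A dictionary: column 0 gets its own marker, and the special key
#    None adds "???" as a marker for all columns.
mask_from_dict = missing_mask({0: "N/A", None: "???"})

print(mask_from_string)
print(mask_from_sequence)
print(mask_from_dict)
```

Inspecting the mask rather than the data keeps the comparison honest: with a float dtype, a string that cannot be converted may end up as nan in the data whether or not it was declared as a missing marker.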

filling_values

We know how to recognize missing data, but we still need to provide a value for these missing entries. By default, this value is determined from the expected dtype according to this table:

Expected type    Default
-------------    ---------
bool             False
int              -1
float            np.nan
complex          np.nan+0j
string           '???'
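Two rows of this table can be checked with a small sketch, assuming Python 3 and a one-line sample string invented for illustration:

```python
from io import StringIO

import numpy as np

data = "1,,3"

# With an integer dtype, the empty field falls back to -1.
as_int = np.genfromtxt(StringIO(data), delimiter=",", dtype=int)

# With a float dtype, the empty field falls back to np.nan.
as_float = np.genfromtxt(StringIO(data), delimiter=",", dtype=float)

print(as_int)
print(as_float)
```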

We can get finer control over the conversion of missing values with the optional filling_values argument. Like missing_values, this argument accepts different kinds of values:

a single value
    This will be the default for all columns.

a sequence of values
    Each entry will be the default for the corresponding column.

a dictionary
    Each key can be a column index or a column name, and the corresponding value should be a single object. We can use the special key None to define a default for all columns.
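The sketch below tries a single value, a sequence, and a dictionary keyed by column index on the same input. It assumes Python 3; the helper read and the sample data are hypothetical, and dtype=int is chosen so that the built-in default of -1 is visible where the dictionary leaves a column unspecified.

```python
from io import StringIO

import numpy as np

data = "1,,3\n,5,"


def read(filling_values):
    """Parse `data` as integers using the given filling_values."""
    return np.genfromtxt(StringIO(data), delimiter=",", dtype=int,
                         filling_values=filling_values)


# A single value: used for every column.
filled_single = read(-999)

# A sequence: one replacement value per column, in order.
filled_sequence = read([10, 20, 30])

# A dictionary keyed by column index: column 1 is not listed,
# so it keeps the default for the int dtype (-1).
filled_dict = read({0: 0, 2: -999})

print(filled_single)
print(filled_sequence)
print(filled_dict)
```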

In the following example, we suppose that the missing values are flagged with "N/A" in the first column and with "???" in the third column. We wish to transform these missing values to 0 if they occur in the first and second columns, and to -999 if they occur in the last column:
