User guide

NumPy User Guide, Release 1.9.0

>>> data = "1 2 3\n4 5 6"
>>> np.genfromtxt(StringIO(data),
...               names="a, b, c", usecols=("a", "c"))
array([(1.0, 3.0), (4.0, 6.0)],
      dtype=[('a', '<f8'), ('c', '<f8')])
>>> np.genfromtxt(StringIO(data),
...               names="a, b, c", usecols=("a, c"))
array([(1.0, 3.0), (4.0, 6.0)],
      dtype=[('a', '<f8'), ('c', '<f8')])

Choosing the data type

The main way to control how the sequences of strings we have read from the file are converted to other types is to set the dtype argument. Acceptable values for this argument are:

• a single type, such as dtype=float. The output will be 2D with the given dtype, unless a name has been associated with each column with the use of the names argument (see below). Note that dtype=float is the default for genfromtxt.
• a sequence of types, such as dtype=(int, float, float).
• a comma-separated string, such as dtype="i4,f8,|S3".
• a dictionary with two keys 'names' and 'formats'.
• a sequence of tuples (name, type), such as dtype=[('A', int), ('B', float)].
• an existing numpy.dtype object.
• the special value None. In that case, the type of the columns will be determined from the data itself (see below).

In all the cases but the first one, the output will be a 1D array with a structured dtype. This dtype has as many fields as items in the sequence. The field names are defined with the names keyword.
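
As a rough illustration of the dtype forms listed above, the sketch below reads the same two-row input in several ways. The sample string, the variable names, and the field names used in the dictionary and tuple forms are made up for this example, and StringIO is imported from io on the assumption that Python 3 is in use.

```python
from io import StringIO

import numpy as np

# Hypothetical sample input: two rows, three whitespace-separated columns.
data = "1 2 3\n4 5 6"

# A single type: the result is a plain 2D array.
plain = np.genfromtxt(StringIO(data), dtype=float)

# A sequence of types: a 1D structured array with default field names.
from_sequence = np.genfromtxt(StringIO(data), dtype=(int, float, float))

# A comma-separated string of type codes.
from_string = np.genfromtxt(StringIO(data), dtype="i4,f8,f8")

# A dictionary with the two keys 'names' and 'formats'.
from_dict = np.genfromtxt(
    StringIO(data),
    dtype={"names": ["a", "b", "c"], "formats": [int, float, float]},
)

# A sequence of (name, type) tuples.
from_tuples = np.genfromtxt(
    StringIO(data),
    dtype=[("A", int), ("B", float), ("C", float)],
)

print(plain.shape)                # (2, 3)
print(from_sequence.shape)        # (2,)
print(from_sequence.dtype.names)  # ('f0', 'f1', 'f2')
print(from_tuples.dtype.names)    # ('A', 'B', 'C')
```

Only the single-type call returns a 2D array; the other calls return one record per input row, with one field per column.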

When dtype=None, the type of each column is determined iteratively from its data. We start by checking whether a string can be converted to a boolean (that is, if the string matches true or false in lowercase); then whether it can be converted to an integer, then to a float, then to a complex and finally to a string. This behavior may be changed by modifying the default mapper of the StringConverter class.

The option dtype=None is provided for convenience. However, it is significantly slower than setting the dtype explicitly.
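
The following sketch shows what type inference with dtype=None might look like on a small mixed input. The sample data and variable names are invented for illustration. The encoding="utf-8" argument is an assumption about the installed version: it is accepted by newer NumPy releases than the one this guide describes and can be dropped on older ones.

```python
from io import StringIO

import numpy as np

# Hypothetical input: an integer column, a float column, a text column.
data = "1 2.5 abc\n4 5.0 xyz"

# dtype=None asks genfromtxt to infer each column's type from its values.
# encoding="utf-8" (newer NumPy only) makes the text column a str field.
inferred = np.genfromtxt(StringIO(data), dtype=None, encoding="utf-8")

# Each column becomes a field with a default name and its own inferred type.
for name in inferred.dtype.names:
    print(name, inferred.dtype[name])
```

Because the three columns end up with different types, the result is a 1D structured array rather than a 2D array.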

Setting the names

The names argument

A natural approach when dealing with tabular data is to allocate a name to each column. A first possibility is to use an explicit structured dtype, as mentioned previously:

>>> data = StringIO("1 2 3\n 4 5 6")
>>> np.genfromtxt(data, dtype=[(_, int) for _ in "abc"])
array([(1, 2, 3), (4, 5, 6)],
      dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])

Another, simpler possibility is to use the names keyword with a sequence of strings or a comma-separated string:

>>> data = StringIO("1 2 3\n 4 5 6")
>>> np.genfromtxt(data, names="A, B, C")
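
As a small sketch of the two forms the names keyword accepts, the snippet below passes the same column names once as a comma-separated string and once as a list. The sample text and variable names are illustrative only, and StringIO is imported from io on the assumption that Python 3 is in use.

```python
from io import StringIO

import numpy as np

# Hypothetical sample input: two rows, three columns.
text = "1 2 3\n4 5 6"

# Column names given as one comma-separated string.
by_string = np.genfromtxt(StringIO(text), names="A, B, C")

# The same names given as a sequence of strings.
by_list = np.genfromtxt(StringIO(text), names=["A", "B", "C"])

print(by_string.dtype.names)  # ('A', 'B', 'C')
print(by_list.dtype.names)    # ('A', 'B', 'C')
print(by_string["B"])         # values of the column named B
```

Since no dtype is given, every field uses the default float type, and a column can be retrieved by indexing the result with its name.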
