User guide

NumPy User Guide, Release 1.9.0
>>> data = "1 2 3\n4 5 6"
>>> np.genfromtxt(StringIO(data),
... names="a, b, c", usecols=("a", "c"))
array([(1.0, 3.0), (4.0, 6.0)],
      dtype=[('a', '<f8'), ('c', '<f8')])
>>> np.genfromtxt(StringIO(data),
... names="a, b, c", usecols=("a, c"))
array([(1.0, 3.0), (4.0, 6.0)],
      dtype=[('a', '<f8'), ('c', '<f8')])
Choosing the data type
The main way to control how the sequences of strings we have read from the file are converted to other types is to set
the dtype argument. Acceptable values for this argument are:
• a single type, such as dtype=float. The output will be 2D with the given dtype, unless a name has been
  associated with each column with the use of the names argument (see below). Note that dtype=float is the
  default for genfromtxt.
• a sequence of types, such as dtype=(int, float, float).
• a comma-separated string, such as dtype="i4,f8,|S3".
• a dictionary with two keys 'names' and 'formats'.
• a sequence of tuples (name, type), such as dtype=[('A', int), ('B', float)].
• an existing numpy.dtype object.
• the special value None. In that case, the type of the columns will be determined from the data itself (see below).
In all the cases but the first one, the output will be a 1D array with a structured dtype. This dtype has as many fields as
items in the sequence. The field names are defined with the names keyword.
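The forms listed above can be compared side by side. The following is a minimal sketch, assuming Python 3 and a recent NumPy; the sample text and the field names are made up for illustration and do not come from a real file.

```python
from io import StringIO

import numpy as np

text = "1 2 3\n4 5 6"  # illustrative data only

# A single type: the result is a plain 2D array.
plain = np.genfromtxt(StringIO(text), dtype=float)
print(plain.shape)  # (2, 3)

# Each of these gives one type per column, so the result is a
# 1D structured array with one field per column.
specs = {
    "sequence of types": (int, float, float),
    "comma-separated string": "i4,f8,f8",
    "names/formats dictionary": {
        "names": ("a", "b", "c"),
        "formats": (int, float, float),
    },
    "sequence of (name, type) tuples": [("A", int), ("B", float), ("C", float)],
    "numpy.dtype object": np.dtype([("x", int), ("y", float), ("z", float)]),
}

results = {
    label: np.genfromtxt(StringIO(text), dtype=spec)
    for label, spec in specs.items()
}

for label, arr in results.items():
    # Show the shape and the field names that each form produces.
    print(label, arr.shape, arr.dtype.names)
```

When no names are supplied, as in the first two structured forms, the fields should receive default names of the form f0, f1, and so on.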
When dtype=None, the type of each column is determined iteratively from its data. We start by checking whether a
string can be converted to a boolean (that is, whether the string matches true or false, ignoring case); then whether it
can be converted to an integer, then to a float, then to a complex number, and finally to a string. This behavior may be
changed by modifying the default mapper of the StringConverter class.
The option dtype=None is provided for convenience. However, it is significantly slower than setting the dtype
explicitly.
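As a small illustration of the type detection described above, the sketch below reads three columns with dtype=None and reports the kind of type chosen for each. It assumes Python 3 and a recent NumPy; the sample text is invented, and string columns are left out on purpose because their handling depends on the NumPy version.

```python
from io import StringIO

import numpy as np

# Illustrative data: an integer column, a float column, a boolean column.
text = "1 2.5 true\n4 5.5 false"

guessed = np.genfromtxt(StringIO(text), dtype=None)

# Each field gets its own type, detected from the column contents.
for name in guessed.dtype.names:
    print(name, guessed.dtype[name].kind)
```

The kind codes are 'b' for boolean, 'i' for integer and 'f' for float, so this is a quick way to check what was detected before writing the dtype out explicitly.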
Setting the names
The names argument
A natural approach when dealing with tabular data is to allocate a name to each column. A first possibility is to use an
explicit structured dtype, as mentioned previously:
>>> data = StringIO("1 2 3\n 4 5 6")
>>> np.genfromtxt(data, dtype=[(_, int) for _ in "abc"])
array([(1, 2, 3), (4, 5, 6)],
      dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])
Another, simpler possibility is to use the names keyword with a sequence of strings or a comma-separated string:
>>> data = StringIO("1 2 3\n 4 5 6")
>>> np.genfromtxt(data, names="A, B, C")