User guide

NumPy User Guide, Release 1.9.0
Examples of formats that cannot be read directly but are not hard to convert are those supported by
libraries like PIL (able to read and write many image formats such as jpg, png, etc.).
Common ASCII Formats
Comma Separated Value (CSV) files are widely used (and are an export and import option for programs like Excel). There
are a number of ways of reading these files in Python: the standard library provides CSV functions (the csv module),
and pylab (part of matplotlib) provides functions as well.
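As a minimal sketch of the standard-library route, the rows can be parsed with the csv module and then converted with np.array. The sample data and column names are invented for illustration, and io.StringIO stands in for an open file.

```python
import csv
import io

import numpy as np

# Hypothetical CSV content; in practice this would be an open file.
text = "x,y\n1.0,2.0\n3.0,4.5\n"

reader = csv.reader(io.StringIO(text))
header = next(reader)  # the first row holds the column names
rows = [[float(value) for value in row] for row in reader]

data = np.array(rows)  # 2-D float array, one row per record
```

This approach assumes every field after the header is numeric; mixed or missing fields need extra handling.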
More generic ASCII files can be read using the io package in scipy.
Custom Binary Formats
There are a variety of approaches one can use. If the file has a relatively simple format, one can write a simple
I/O library and use the numpy fromfile() function and .tofile() method to read and write numpy arrays directly (mind
your byte order, though!). If a good C or C++ library exists that reads the data, one can wrap that library with a variety of
techniques, though that is certainly much more work and requires significantly more advanced knowledge to interface
with C or C++.
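The byte order caveat can be illustrated with a short sketch. The file name and the choice of little-endian 32-bit floats are assumptions made for the example, not part of any particular format.

```python
import os
import tempfile

import numpy as np

# Explicit little-endian 32-bit floats, so the bytes on disk do not depend
# on the machine that wrote them.
dtype = np.dtype("<f4")
original = np.arange(6, dtype=dtype)

path = os.path.join(tempfile.mkdtemp(), "example.bin")  # hypothetical file name
original.tofile(path)  # raw bytes only: no shape or dtype is stored

# The reader must supply the same dtype, including the byte order.
restored = np.fromfile(path, dtype=dtype)

# Reading with the wrong byte order raises no error but yields other values.
swapped = np.fromfile(path, dtype=">f4")
```

Because tofile() stores no metadata, a custom format built this way needs its own convention for recording the shape and dtype.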
Use of Special Libraries
There are libraries that can be used to generate arrays for special purposes, and it isn't possible to enumerate all of
them. The most common are the many array generation functions in random, which generate arrays of
random values, and some utility functions that generate special matrices (e.g. diagonal).
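A few of these functions are sketched below. The seed, shapes and values are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.RandomState(0)      # seeded generator, for repeatability
uniform = rng.uniform(size=(2, 3))  # values drawn from [0, 1)
normal = rng.normal(size=4)         # standard normal samples

d = np.diag([1, 2, 3])              # 3x3 matrix with 1, 2, 3 on the diagonal
identity = np.eye(3)                # 3x3 identity matrix
```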
2.3 I/O with Numpy
2.3.1 Importing data with genfromtxt
Numpy provides several functions to create arrays from tabular data. We focus here on the genfromtxt function.
In a nutshell, genfromtxt runs two main loops. The first loop converts each line of the file into a sequence of strings.
The second loop converts each string to the appropriate data type. This mechanism is slower than a single loop, but
gives more flexibility. In particular, genfromtxt is able to take missing data into account, which other, faster and
simpler functions like loadtxt cannot.
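The handling of missing data can be seen in a minimal sketch. It assumes Python 3 with a recent NumPy, where io.StringIO plays the role of the StringIO object used elsewhere in this guide; the input text is invented.

```python
import io

import numpy as np

# The second row has an empty middle field.
text = "1,2,3\n4,,6\n"

data = np.genfromtxt(io.StringIO(text), delimiter=",")
# With the default float dtype, the missing entry is filled with nan.
```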
Note: When giving examples, we will use the following conventions:
>>> import numpy as np
>>> from StringIO import StringIO
Defining the input
The only mandatory argument of genfromtxt is the source of the data. It can be a string corresponding to the name
of a local or remote file, or a file-like object with a read method (such as an actual file or a StringIO.StringIO
object). If the argument is the URL of a remote file, the file is automatically downloaded to the current directory.
The input file can be a text file or an archive. Currently, the function recognizes gzip and bz2 (bzip2) archives. The
type of the archive is determined by examining the extension of the file: if the filename ends with '.gz', a gzip
archive is expected; if it ends with 'bz2', a bzip2 archive is assumed.