User guide
NumPy User Guide, Release 1.9.0
Splitting the lines into columns
The delimiter argument
Once the file is defined and open for reading, genfromtxt splits each non-empty line into a sequence of strings.
Empty or commented lines are just skipped. The delimiter keyword is used to define how the splitting should take
place.
Quite often, a single character marks the separation between columns. For example, comma-separated files (CSV) use
a comma (,) or a semicolon (;) as delimiter:
>>> data = "1, 2, 3\n4, 5, 6"
>>> np.genfromtxt(StringIO(data), delimiter=",")
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])
Another common separator is "\t", the tab character. We are not limited to a single character, however: any
string will do. By default, genfromtxt assumes delimiter=None, meaning that each line is split on whitespace
(including tabs) and that consecutive whitespace characters are treated as a single separator.
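The splitting modes described above can be compared side by side. The following is a minimal sketch, assuming Python 3 with a recent NumPy (where StringIO is imported from io); the sample strings and the "::" separator are made up for illustration:

```python
from io import StringIO

import numpy as np

# Tab-separated values: a single-character delimiter.
tabbed = "1\t2\t3\n4\t5\t6"
arr_tab = np.genfromtxt(StringIO(tabbed), delimiter="\t")

# A multi-character delimiter: any string is accepted.
doubled = "1::2::3\n4::5::6"
arr_multi = np.genfromtxt(StringIO(doubled), delimiter="::")

# delimiter=None (the default): runs of spaces and tabs act as one separator.
ragged = "1   2\t3\n4 5      6"
arr_ws = np.genfromtxt(StringIO(ragged))

# All three inputs describe the same 2x3 table of floats.
print(np.array_equal(arr_tab, arr_multi) and np.array_equal(arr_tab, arr_ws))
```

Leaving delimiter unset is usually the most forgiving choice for hand-aligned text, while an explicit string is needed as soon as a field may itself be empty or contain spaces.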
Alternatively, we may be dealing with a fixed-width file, where columns are defined as a given number of characters.
In that case, we need to set delimiter to a single integer (if all the columns have the same size) or to a sequence of
integers (if columns can have different sizes):
>>> data = "  1  2  3\n  4  5 67\n890123  4"
>>> np.genfromtxt(StringIO(data), delimiter=3)
array([[   1.,    2.,    3.],
       [   4.,    5.,   67.],
       [ 890.,  123.,    4.]])
>>> data = "123456789\n   4  7 9\n   4567 9"
>>> np.genfromtxt(StringIO(data), delimiter=(4, 3, 2))
array([[ 1234.,   567.,    89.],
       [    4.,     7.,     9.],
       [    4.,   567.,     9.]])
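To make the fixed-width rule concrete, the sketch below slices one line by hand and compares the result with what genfromtxt returns for the same widths. It assumes Python 3 with a recent NumPy; split_fixed_width is a hypothetical helper written for this illustration and is not part of NumPy:

```python
from io import StringIO

import numpy as np


def split_fixed_width(line, widths):
    """Hypothetical helper (not part of NumPy): slice `line` into
    consecutive fields whose lengths are given by `widths`."""
    fields = []
    start = 0
    for width in widths:
        fields.append(line[start:start + width])
        start += width
    return fields


line = "   4  7 9"
widths = (4, 3, 2)

# Manual slicing keeps the padding of each field: ['   4', '  7', ' 9']
fields = split_fixed_width(line, widths)

# genfromtxt is expected to cut the line at the same positions and then
# convert each field to a float.
row = np.genfromtxt(StringIO(line), delimiter=widths)
print(fields, row)
```

Because the cut positions depend only on character counts, the padding inside each field matters: dropping or adding a single space shifts every later column.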
The autostrip argument
By default, when a line is decomposed into a series of strings, the individual entries are not stripped of leading or
trailing whitespace. This behavior can be overridden by setting the optional argument autostrip to
True:
>>> data = "1, abc , 2\n 3, xxx, 4"
>>> # Without autostrip
>>> np.genfromtxt(StringIO(data), delimiter=",", dtype="|S5")
array([['1', ' abc ', ' 2'],
       ['3', ' xxx', ' 4']],
      dtype='|S5')
>>> # With autostrip
>>> np.genfromtxt(StringIO(data), delimiter=",", dtype="|S5", autostrip=True)
array([['1', 'abc', '2'],
       ['3', 'xxx', '4']],
      dtype='|S5')
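The effect of autostrip can also be checked against a manual strip of each field. This is a minimal sketch, assuming Python 3 with a recent NumPy; it uses the Unicode dtype "U5" rather than "|S5" (which would yield byte strings on Python 3), and the sample data is invented for the example:

```python
from io import StringIO

import numpy as np

data = "x ,  10\ny  , 20"

# Padding around each field is kept by default ...
raw = np.genfromtxt(StringIO(data), delimiter=",", dtype="U5")

# ... and removed when autostrip=True.
stripped = np.genfromtxt(StringIO(data), delimiter=",", dtype="U5",
                         autostrip=True)

# Stripping the raw fields by hand gives the same values.
manual = np.char.strip(raw)
print(np.array_equal(manual, stripped))
```

Setting autostrip=True at load time avoids a second pass over the array and keeps padded values from being truncated by a narrow string dtype.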
The comments argument
The optional argument comments is used to define a character string that marks the beginning of a comment. By
default, genfromtxt assumes comments='#'. The comment marker may occur anywhere on the line. Any
character present after the comment marker(s) is simply ignored:
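A minimal sketch of a non-default marker, assuming Python 3 with a recent NumPy; the "%" marker and the sample text are chosen only for illustration:

```python
from io import StringIO

import numpy as np

data = "% units: metres\n1, 2 % first point\n3, 4\n"

# The first line is a pure comment and is skipped entirely; on the second
# line everything from "%" onwards is dropped before the fields are split.
arr = np.genfromtxt(StringIO(data), delimiter=",", comments="%")
print(arr)
```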