NumPy supports structured arrays, which are the nearest thing to R's data.frame class. Data are organized into fields and records: each field (column) has a name and a data type, and each record (row) holds a value for every field. Columns are indexed by name, and rows are indexed by integers. Record arrays (recarrays) can be built from nested Python iterables with numpy.rec.fromrecords:
>>> D = [('fair',6.0,1), ('good',12,2)]
>>> D = numpy.rec.fromrecords(D, names='quality,price,size')
>>> D
rec.array([('fair', 6.0, 1), ('good', 12.0, 2)],
dtype=[('quality', '|S4'), ('price', '<f8'), ('size', '<i4')])
>>> D['quality']
rec.array(['fair', 'good'],
dtype='|S4')
>>> D[0]
('fair', 6.0, 1)
Note that the 'price' field has a float data type: one of the records has a float value, so the whole field is promoted to the most general data type. For more precise control over field data types, fromrecords() takes a formats argument, which is a comma-delimited string of format codes. For instance, to force 'price' to be an integer, call:
>>> D = numpy.rec.fromrecords(D, names='quality,price,size', formats='S4,i4,i4')
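The effect of the formats argument can be checked by comparing the field dtypes with and without it. The following is a minimal sketch, assuming a recent NumPy on Python 3; the variable names (rows, inferred, forced) are illustrative and not part of the original example.

```python
import numpy as np

# Same two records as above; one price is a float, the other an int.
rows = [('fair', 6.0, 1), ('good', 12, 2)]

# Without formats, each field's type is inferred, so 'price' becomes a float.
inferred = np.rec.fromrecords(rows, names='quality,price,size')

# With formats, each field gets exactly the requested type.
forced = np.rec.fromrecords(rows, names='quality,price,size',
                            formats='S4,i4,i4')

# dtype.kind is 'f' for floating point and 'i' for signed integer.
print(inferred.dtype['price'].kind)
print(forced.dtype['price'].kind)
print(forced['price'])
```

Inspecting dtype.kind rather than the full dtype string keeps the check independent of platform byte order and default integer size.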
For reading and writing recarrays, use matplotlib.mlab.rec2csv() and matplotlib.mlab.csv2rec(). The format of each field can be specified using a dictionary (the formatd argument). Both functions take a number of arguments that control how the data are read or written (e.g. the delimiter, or whether the first row holds the field names), most of which are documented. The rec2csv() function always writes the field names as a header row. To avoid this behavior, or to avoid a dependency on matplotlib, use numpy.savetxt() and numpy.loadtxt():
>>> from matplotlib import mlab
>>> formatd = {'quality' : mlab.FormatString(), 'price' : mlab.FormatFloat(2),}
>>> mlab.rec2csv(D, 'test.csv', formatd=formatd)
>>> mlab.csv2rec('test.csv')
rec.array([('fair', 6.0, 1), ('good', 12.0, 2)],
dtype=[('quality', '|S4'), ('price', '<f8'), ('size', '<i4')])
>>> numpy.savetxt('test.csv', D, delimiter=',', fmt=('%s','%3.2f','%d'))
>>> numpy.loadtxt('test.csv', delimiter=',', dtype={'names': ('quality','price','size'), 'formats' : ('S4', 'f8', 'i4')})
array([('fair', 6.0, 1), ('good', 12.0, 2)],
dtype=[('quality', '|S4'), ('price', '<f8'), ('size', '<i4')])
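When the file does have a header row, one matplotlib-free way to read it back is numpy.genfromtxt, which is not used in the text above and is offered here only as a possible alternative. The sketch below assumes a recent NumPy on Python 3 and reads from an in-memory string (csv_text, a made-up stand-in for a file on disk) so that it is self-contained.

```python
import io
import numpy as np

# Illustrative CSV content with a header row, standing in for a file on disk.
csv_text = "quality,price,size\nfair,6.00,1\ngood,12.00,2\n"

# names=True takes the field names from the first row;
# dtype=None asks genfromtxt to infer a type for each column.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                     names=True, dtype=None, encoding='utf-8')

print(data.dtype.names)
print(data['price'])
print(data[0])
```

The result is a plain structured array, so columns are still indexed by name and rows by integer. On Python 3 with an encoding given, the string column comes back as a unicode type rather than the byte-string '|S4' shown above.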
ep wrote:
is mlab the fastest read-in fcn around?