I've been using functions in the scipy.io module to read and write binary and ASCII data, specifically the fread(), fwrite(), and write_array() functions. These have been deprecated in favor of new functionality in numpy. I'm not sure how bleeding-edge this is, since I've been pretty erratic about installing numpy/scipy from svn versus official releases, but since there's no easily locatable documentation on the web at present, here are my notes. In each example, fp represents a file object (opened in the correct mode, of course); the first line is the old scipy call, and the second line is the new numpy call that replaces it.
Writing binary data:
scipy.io.fwrite(fp, data.size, data.astype('int16').squeeze())
data.astype('int16').squeeze().tofile(fp, sep="")
Writing ASCII (i.e., tabular) data. I do this all the time when exporting large data sets to R. Write a header line to the file first, and adjust the format string for your data.
scipy.io.write_array(fp, data.astype('i'))
numpy.savetxt(fp, data, fmt="%i")
Reading binary data (PCM sound data in this case):
data = scipy.io.fread(fp, frames, 'h', 'h')
data = numpy.fromfile(fp, dtype='h', count=frames)
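In context, the read looks something like the following sketch; the filename and frame count are placeholders:

import numpy

frames = 1024                                         # hypothetical number of frames to read
fp = open("data.pcm", "rb")
# old: data = scipy.io.fread(fp, frames, 'h', 'h')
data = numpy.fromfile(fp, dtype='h', count=frames)    # 'h' is a 16-bit signed integer
fp.close()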
An alternative is to use numpy's built-in memory-mapped file access via numpy.memmap. This method supports complex and structured data types. For instance, I could replace my function for reading PCM data with the following line:
data = numpy.memmap("data.pcm", dtype='h', mode='r')
Python should handle the garbage collection on the object (data) when its reference count drops to zero, but I haven't tested this extensively. And of course you have to be careful about writing to the array, because changes will affect what's on the disk. Opening the memmap in read-only mode (as above) prevents accidental writes; a RuntimeError will be thrown if you try to modify the array.
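To illustrate the caveat about writes hitting the disk, here's a sketch using a writeable memmap (the mode and filename are just for illustration):

import numpy

# open the same file writeable; assignments propagate back to the file
data = numpy.memmap("data.pcm", dtype='h', mode='r+')
data[0] = 0            # this changes the file on disk
data.flush()           # force the change to be written out
del data               # drop the reference so the mapping can be cleaned up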