h5py to replace npz?!

Apparently .npz files have a limit of ~4GB, which can be a problem when dealing with large datasets ($$600^3$$), so how do I write binary files of this size?! Looks like I need to migrate to HDF5…lol; here’s an example for read/write similar to npz:

import h5py
a = np.random.rand(1000,1000)
f = h5py.File('/tmp/myfile.hdf5')
f['a'] = a # <-- Save
f.close()

now that will write the file, lets open it

import h5py
f = h5py.File('/tmp/myfile.hdf5')
b = f['a']
#b = b[()]
f.keys() #shows all values in the dataset
f.close()

but the odd thing is, you can’t directly access the data here by just ‘b’. Instead there is a ton of information attached like b.shape, b.dtype, etc etc, all found here. Instead, you simply have to use the command

b.value

and that will output the data of the file.

Robert Thompson

h5py to replace npz?!

Comments