Reading from a large file with h5py without loading the whole thing into memory

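Does the following, using h5py in Python, read from a dataset without loading the whole thing into memory at once (the whole thing will not fit into memory), and get the size of the dataset without loading the data? If not, how can I do that?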

import h5py

h5 = h5py.File('myfile.h5', 'r')
mydata = h5.get('matirx') # are all data loaded into memory by using h5.get?
part_of_mydata = mydata[1000:11000,:]
size_data = mydata.shape

Thanks.

Source

2017-01-31 superMind

get (or indexing the file by name) returns a reference to the dataset in the file, but does not load any data.

In [789]: list(f.keys()) 
Out[789]: ['dset', 'dset1', 'vset'] 
In [790]: d=f['dset1'] 
In [791]: d 
Out[791]: <HDF5 dataset "dset1": shape (2, 3, 10), type "<f8"> 
In [792]: d.shape   # shape of dataset 
Out[792]: (2, 3, 10) 
In [793]: arr=d[:,:,:5] # indexing the set fetches part of the data 
In [794]: arr.shape 
Out[794]: (2, 3, 5) 
In [795]: type(d) 
Out[795]: h5py._hl.dataset.Dataset 
In [796]: type(arr) 
Out[796]: numpy.ndarray

The dataset d is array-like, but it is not actually a numpy array.

Fetch the whole dataset with:

In [798]: arr = d[:] 
In [799]: type(arr) 
Out[799]: numpy.ndarray
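
When the whole dataset will not fit into memory, d[:] is not an option, but the same slicing can be applied one block of rows at a time. The following is a minimal sketch of that idea, not a definitive implementation. The helper name column_sums_in_blocks, the dataset name 'matrix', the file name demo.h5, and the block size are all assumptions made for illustration, and the tiny demo file only stands in for a large one.

```python
import os
import tempfile

import h5py
import numpy as np


def column_sums_in_blocks(path, name, block_rows=1000):
    """Sum each column of a 2-D dataset, reading block_rows rows at a time.

    Hypothetical helper for illustration only.
    """
    with h5py.File(path, "r") as f:
        dset = f[name]                # reference only, no data read yet
        n_rows, n_cols = dset.shape   # shape comes from metadata
        total = np.zeros(n_cols, dtype=np.float64)
        for start in range(0, n_rows, block_rows):
            stop = min(start + block_rows, n_rows)
            block = dset[start:stop, :]   # only these rows become an ndarray
            total += block.sum(axis=0)
    return total


# Build a small demo file (assumed name) so the sketch can be run as-is.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.h5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("matrix", data=np.arange(12, dtype=np.float64).reshape(4, 3))

sums = column_sums_in_blocks(demo_path, "matrix", block_rows=2)
print(sums)
```

Only one block is held in memory at a time, so peak memory use is set by block_rows rather than by the size of the dataset.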

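Exactly how the file has to be read to fetch your slice depends on the slicing, the data layout, the chunking of the file, and other things that generally are not under your control and that you should not need to worry about.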
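
If you are curious about the layout anyway, a dataset's chunks attribute reports its chunk shape, or None when the data is stored contiguously. Here is a small sketch under assumed names (chunks_demo.h5, 'contiguous', 'chunked'); reading it, like reading shape, does not load the data.

```python
import os
import tempfile

import h5py
import numpy as np

# Assumed demo file with one contiguous and one chunked dataset.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "chunks_demo.h5")

with h5py.File(demo_path, "w") as f:
    f.create_dataset("contiguous", data=np.zeros((4, 6)))
    f.create_dataset("chunked", data=np.zeros((4, 6)), chunks=(2, 3))

with h5py.File(demo_path, "r") as f:
    # .chunks is layout metadata stored with the dataset.
    contiguous_layout = f["contiguous"].chunks   # None: stored contiguously
    chunked_layout = f["chunked"].chunks         # chunk shape as a tuple

print(contiguous_layout)
print(chunked_layout)
```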

Note also that reading one dataset does not load the others. The same applies to groups.
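
As a rough illustration of the point about groups, the sketch below opens a group and one of its datasets and checks what kind of object each one is. The file name groups_demo.h5 and the names 'grp', 'a', and 'b' are made up for the example.

```python
import os
import tempfile

import h5py
import numpy as np

# Assumed demo file containing one group with two datasets.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "groups_demo.h5")

with h5py.File(demo_path, "w") as f:
    grp = f.create_group("grp")
    grp.create_dataset("a", data=np.arange(5))
    grp.create_dataset("b", data=np.arange(10))

with h5py.File(demo_path, "r") as f:
    grp = f["grp"]                  # a Group object, a reference into the file
    names = sorted(grp.keys())      # member names only
    a = grp["a"]                    # a Dataset object, still not an ndarray
    is_group = isinstance(grp, h5py.Group)
    is_dataset = isinstance(a, h5py.Dataset)
    is_array = isinstance(a, np.ndarray)
    a_values = a[:]                 # this is where 'a' is actually read; 'b' is never read

print(names, is_group, is_dataset, is_array)
```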

http://docs.h5py.org/en/latest/high/dataset.html#reading-writing-data

Source

2017-01-31 04:42:42 hpaulj
