2013-10-11

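Appending to an HDF5 matrix in Python

For example, say we have a matrix (a NumPy array we want to store) and we store it in an HDF5 file. Later we want to extend the matrix by appending some rows to its end. Keep in mind that the original matrix may be very large (~tens of GB) and cannot be loaded into RAM.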

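We also want to be able to read a few rows of the matrix starting at an arbitrary point (perhaps this is called a slice?) without loading the whole matrix into RAM.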

Can anyone provide an example of how to do this in Python?
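For reference, the two requirements above (appending rows to an on-disk matrix and reading a slice of rows) can be sketched with h5py. h5py is an assumption here, since the question names no library. The file path, the dataset name `matrix`, the column count `ncols` and the helpers `append_rows`/`read_rows` are all hypothetical. The key idea is a dataset whose first dimension is declared unlimited with `maxshape=(None, ncols)`, so it can be grown with `resize()`.

```python
import os
import tempfile

import h5py
import numpy as np

ncols = 4  # hypothetical fixed number of columns
path = os.path.join(tempfile.mkdtemp(), "matrix.h5")  # hypothetical file

# Create an empty dataset whose first axis is unlimited (maxshape None).
# h5py enables chunked storage automatically, which resizing requires.
with h5py.File(path, "w") as f:
    f.create_dataset("matrix", shape=(0, ncols),
                     maxshape=(None, ncols), dtype="float64")


def append_rows(path, new_rows):
    """Grow the dataset along axis 0 and write new_rows at the end."""
    new_rows = np.asarray(new_rows, dtype="float64")
    with h5py.File(path, "a") as f:
        dset = f["matrix"]
        old = dset.shape[0]
        dset.resize(old + new_rows.shape[0], axis=0)
        dset[old:] = new_rows


def read_rows(path, start, stop):
    """Read rows [start, stop) into memory; the rest stays on disk."""
    with h5py.File(path, "r") as f:
        return f["matrix"][start:stop]


append_rows(path, np.arange(12).reshape(3, ncols))
append_rows(path, np.ones((2, ncols)))
print(read_rows(path, 2, 4))
```

Each append writes only the new rows, and slicing reads only the chunks that cover the requested range, so the full matrix never has to fit in RAM.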

UPDATE:

I think another option is numpy.memmap, but it does not seem to support appending.

This also seems to be an option, but it works with raw binary data, whereas I want to access a matrix. Also, I don't know how to append with it.
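Regarding `numpy.memmap` and raw binary data: a raw file can be appended to with ordinary file writes and then viewed as a matrix again, provided the dtype and column count are fixed and known in advance (a raw file stores no shape metadata). A minimal sketch, assuming float64 rows of a hypothetical width `ncols = 4`, a hypothetical file path, and hypothetical helpers `append_rows`/`open_matrix`:

```python
import os
import tempfile

import numpy as np

ncols = 4                        # hypothetical fixed row width
dtype = np.dtype("float64")      # must be the same for every append
path = os.path.join(tempfile.mkdtemp(), "matrix.dat")  # hypothetical file


def append_rows(path, new_rows):
    """Append rows by writing their raw C-order bytes to the end of the file."""
    new_rows = np.ascontiguousarray(new_rows, dtype=dtype)
    if new_rows.ndim != 2 or new_rows.shape[1] != ncols:
        raise ValueError("expected a 2-D block with %d columns" % ncols)
    with open(path, "ab") as fh:
        fh.write(new_rows.tobytes())


def open_matrix(path):
    """Map the (non-empty) file read-only as an (nrows, ncols) matrix.

    The row count is derived from the file size, so call this again
    after appending to see the new rows.
    """
    nrows = os.path.getsize(path) // (ncols * dtype.itemsize)
    return np.memmap(path, dtype=dtype, mode="r", shape=(nrows, ncols))


append_rows(path, np.arange(12).reshape(3, ncols))
append_rows(path, np.ones((2, ncols)))
m = open_matrix(path)
print(m.shape)   # (5, 4)
print(m[2:4])    # only the pages holding these rows are read from disk
```

The trade-off compared with HDF5 is that the file is self-describing only by convention: the dtype and `ncols` live in your code, not in the file.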

Answer


If you are going to work with HDF5 files, I suggest using an existing library such as PyTables. Below is a simplified version of their tutorial: http://pytables.github.io/usersguide/tutorials.html

import tables

# Define a user record to characterize some kind of particles
class Particle(tables.IsDescription):
    name = tables.StringCol(16)      # 16-character string
    idnumber = tables.Int64Col()     # signed 64-bit integer
    ADCcount = tables.UInt16Col()    # unsigned short integer
    TDCcount = tables.UInt8Col()     # unsigned byte
    grid_i = tables.Int32Col()       # 32-bit integer
    grid_j = tables.Int32Col()       # 32-bit integer
    pressure = tables.Float32Col()   # float (single precision)
    energy = tables.Float64Col()     # double (double precision)

filename = "test.h5"
# Open a file in "w"rite mode; the with-block flushes and closes it
with tables.open_file(filename, mode="w", title="Test file") as h5file:
    # Create a new group under "/" (root)
    group = h5file.create_group("/", 'detector', 'Detector information')
    # Create one table in it
    table = h5file.create_table(group, 'readout', Particle, "Readout example")
    # Fill the table with 10 particles
    particle = table.row
    for i in range(10):
        particle['name'] = 'Particle: %6d' % i
        particle['TDCcount'] = i % 256
        particle['ADCcount'] = (i * 256) % (1 << 16)
        particle['grid_i'] = i
        particle['grid_j'] = 10 - i
        particle['pressure'] = float(i * i)
        particle['energy'] = float(particle['pressure'] ** 4)
        particle['idnumber'] = i * (2 ** 34)
        # Insert a new particle record
        particle.append()
    table.flush()

# Now reopen the file in "a"ppend mode, take some slices, then append more rows
with tables.open_file(filename, mode="a") as f:
    ro = f.root.detector.readout
    print(ro.attrs.TITLE)   # the table title, "Readout example"
    # Slicing returns only the selected rows (every 3rd row from row 1)
    # as a NumPy structured array, without loading the whole table
    print(ro[1::3])
    # where() iterates lazily over the rows matching the condition
    print([row['energy'] for row in ro.where('pressure > 10')])

    # Append rows 10..14 after the existing 10 rows
    particle = ro.row
    for i in range(10, 15):
        particle['name'] = 'Particle: %6d' % i
        particle['TDCcount'] = i % 256
        particle['ADCcount'] = (i * 256) % (1 << 16)
        particle['grid_i'] = i
        particle['grid_j'] = 10 - i
        particle['pressure'] = float(i * i)
        particle['energy'] = float(particle['pressure'] ** 4)
        particle['idnumber'] = i * (2 ** 34)
        particle.append()
    ro.flush()
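The tutorial above stores a table of records. For a plain numeric matrix like the one in the question, PyTables also provides extendable arrays (`EArray`), where a `0` in the `shape` marks the dimension that can grow. A minimal sketch, assuming float64 data with a hypothetical width of 4 columns and hypothetical file and helper names (`append_rows`, `read_rows`):

```python
import os
import tempfile

import numpy as np
import tables

ncols = 4  # hypothetical fixed number of columns
path = os.path.join(tempfile.mkdtemp(), "earray.h5")  # hypothetical file

# The 0 in shape marks axis 0 as the extendable dimension.
with tables.open_file(path, mode="w") as h5:
    h5.create_earray(h5.root, "matrix", atom=tables.Float64Atom(),
                     shape=(0, ncols), title="Extendable matrix")


def append_rows(path, new_rows):
    """Append a 2-D block of rows along the extendable axis."""
    with tables.open_file(path, mode="a") as h5:
        h5.root.matrix.append(np.asarray(new_rows, dtype="float64"))


def read_rows(path, start, stop):
    """Read rows [start, stop) as a NumPy array; the rest stays on disk."""
    with tables.open_file(path, mode="r") as h5:
        return h5.root.matrix[start:stop]


append_rows(path, np.arange(12).reshape(3, ncols))
append_rows(path, np.ones((2, ncols)))
print(read_rows(path, 2, 4))
```

For matrices in the tens-of-GB range, passing `expectedrows=` to `create_earray` lets PyTables choose a chunk size better suited to the final size.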