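h5py, sporadic writing errors

I have some floats stored in a big (500K x 500K) matrix. I store them in chunks, using arrays of variable size (depending on some specific conditions).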
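For scale, here is the size of the full matrix, assuming dense, uncompressed storage. The element type isn't stated, so both 4-byte float32 and 8-byte float64 are shown:

$$500{,}000 \times 500{,}000 = 2.5 \times 10^{11}\ \text{elements}$$
$$2.5 \times 10^{11} \times 4\ \text{B} = 10^{12}\ \text{B} \approx 1\ \text{TB (float32)}, \qquad 2.5 \times 10^{11} \times 8\ \text{B} = 2 \times 10^{12}\ \text{B} \approx 2\ \text{TB (float64)}$$

This is an upper bound. With a chunked dataset, HDF5 by default allocates a chunk only when it is first written, so the file grows toward this figure as the run progresses.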
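I have parallel code (Python 3.3 and h5py) that produces arrays and puts them into a shared queue. A dedicated process pops them from the queue and writes them one by one into the HDF5 matrix. It works as expected about 90% of the time.
Occasionally, writing a particular array fails. If I run it several times, the failing arrays are different every time.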
Here is the code:
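```python
import queue  # for queue.Empty, raised by get(timeout=...)
from multiprocessing import JoinableQueue, Process

import h5py
import numpy


def writer(in_q):
    # Open the HDF5 archive explicitly in read/write mode ("a")
    with h5py.File("./google_matrix_test.hdf5", "a") as hdf5_file:
        hdf5_scores = hdf5_file['scores']
        while True:
            # Get some data; stop once the queue has been empty for 5 s
            try:
                data = in_q.get(timeout=5)
            except queue.Empty:
                hdf5_file.flush()
                print('HDF5 archive updated.')
                break
            # data = (row, first_col, last_col, value, value, ...)
            try:
                hdf5_scores[data[0], data[1]:data[2]+1] = numpy.matrix(data[3:])
            except OSError:
                # Print the faulty chunk's info
                print('E: ' + str(data[0:3]))
                in_q.put(data)  # <- doesn't solve it
            in_q.task_done()


def compute():
    # producer, consumer and data are defined elsewhere in the script (not shown)
    jobs_queue = JoinableQueue()
    scores_queue = JoinableQueue()
    processes = []
    processes.append(Process(target=producer, args=(jobs_queue, data)))
    processes.append(Process(target=writer, args=(scores_queue,)))
    for i in range(10):
        processes.append(Process(target=consumer, args=(jobs_queue, scores_queue)))
    for p in processes:
        p.start()
    processes[1].join()  # wait for the writer process to exit
    scores_queue.join()
```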
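The writer above treats 5 seconds of empty queue as "finished." That may also be why re-queuing failed chunks can keep it alive indefinitely. A common alternative is an explicit end-of-stream sentinel, which the main process enqueues only after every consumer has finished.

Some notes on this minimal sketch:

- It uses threads and `queue.Queue` so it runs standalone.
- `SENTINEL`, `writer`, `run_pipeline` and the squaring consumer are hypothetical.
- `write_chunk` stands in for the HDF5 slice assignment.
- The same put/get/`task_done` ordering is assumed to carry over to `multiprocessing.JoinableQueue`: join the consumer processes, then put the sentinel.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker; any unique object works


def writer(in_q, write_chunk):
    """Drain in_q until the sentinel arrives; return how many chunks were written.

    write_chunk is a hypothetical stand-in for the HDF5 slice assignment.
    """
    written = 0
    while True:
        item = in_q.get()  # blocks; no timeout needed to detect the end
        try:
            if item is SENTINEL:
                return written
            write_chunk(item)
            written += 1
        finally:
            in_q.task_done()  # called exactly once per get()


def run_pipeline(jobs, n_consumers=4):
    """Toy pipeline: consumers square numbers, a single writer stores them."""
    scores_q = queue.Queue()
    stored = []  # stands in for the HDF5 dataset
    counts = []
    w = threading.Thread(target=lambda: counts.append(writer(scores_q, stored.append)))
    w.start()

    job_q = queue.Queue()
    for j in jobs:
        job_q.put(j)

    def consumer():
        while True:
            try:
                j = job_q.get_nowait()
            except queue.Empty:
                return  # job queue was filled up front, so empty means done
            scores_q.put(j * j)

    workers = [threading.Thread(target=consumer) for _ in range(n_consumers)]
    for c in workers:
        c.start()
    for c in workers:
        c.join()  # every result is now in scores_q ...
    scores_q.put(SENTINEL)  # ... so the sentinel is queued after all of them
    w.join()
    return counts[0], sorted(stored)
```

Because the writer exits on an explicit signal rather than on a timeout, a slow consumer can no longer make it quit early, and it never waits an extra 5 seconds at the end.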
Here is the error:
```
Process Process-2:
Traceback (most recent call last):
  File "/local/software/python3.3/lib/python3.3/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/local/software/python3.3/lib/python3.3/multiprocessing/process.py", line 95, in run
    self._target(*self._args, **self._kwargs)
  File "./compute_scores_multiprocess.py", line 104, in writer
    hdf5_scores[data[0], data[1]:data[2]+1] = numpy.matrix(data[3:])
  File "/local/software/python3.3/lib/python3.3/site-packages/h5py/_hl/dataset.py", line 551, in __setitem__
    self.id.write(mspace, fspace, val, mtype)
  File "h5d.pyx", line 217, in h5py.h5d.DatasetID.write (h5py/h5d.c:2925)
  File "_proxy.pyx", line 120, in h5py._proxy.dset_rw (h5py/_proxy.c:1491)
  File "_proxy.pyx", line 93, in h5py._proxy.H5PY_H5Dwrite (h5py/_proxy.c:1301)
OSError: can't write data (Dataset: Write failed)
```
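The failing line reads each queue item as a flat tuple:

- `data[0]` is the row.
- `data[1]` and `data[2]` are the first and last column (inclusive, hence the `+1`).
- `data[3:]` holds the values.

A cheap way to rule out malformed items as the cause is to check, before writing, that the value count matches the column range. If every failing item passes the check, the problem is more likely in the I/O layer than in the data. `split_chunk` is a hypothetical helper, and the tuple layout is inferred from the indexing in the question's code.

```python
def split_chunk(item):
    """Unpack a queue item laid out as (row, first_col, last_col, v0, v1, ...).

    Returns (row, column_slice, values). Raises ValueError when the number of
    values does not match the inclusive column range first_col..last_col.
    """
    row, first, last = item[0], item[1], item[2]
    values = list(item[3:])
    expected = last - first + 1  # inclusive range, matching data[1]:data[2]+1
    if expected != len(values):
        raise ValueError(
            "row %d, cols %d..%d: expected %d values, got %d"
            % (row, first, last, expected, len(values))
        )
    return row, slice(first, last + 1), values
```

In the writer, `row, cols, values = split_chunk(data)` followed by `hdf5_scores[row, cols] = numpy.asarray(values)` would replace the single assignment. A `ValueError` then points at the producer/consumer side, while an `OSError` points at the write itself.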
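If I insert a two-second pause (time.sleep(2)) into the writing task, the problem seems to go away. But I can't afford to waste 2 seconds per write, since I need to write more than 250,000 times. If I catch the write exception and put the faulty array back into the queue, the script (presumably) never stops.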
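A fixed 2 s pause on each of 250,000 writes would add about 500,000 s, close to six days. One way to avoid that cost is to pause only when a write actually fails, using a short, growing delay and a cap on the number of attempts.

This is a generic sketch, not a fix for the root cause; it only papers over transient failures. `write_with_retry` and its parameters are hypothetical names, and `write` stands in for the h5py slice assignment. The `sleep` parameter is injectable so the backoff can be checked without actually waiting.

```python
import time


def write_with_retry(write, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call write() until it succeeds, retrying only on OSError.

    The delay doubles after each failure (base_delay, 2*base_delay, ...).
    After max_attempts failures the last OSError is re-raised, so a chunk
    that can never be written fails loudly instead of circulating forever.
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            write()
            return attempt
        except OSError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

In the writer loop, the existing slice assignment could be moved into a small `write_chunk()` function and called as `write_with_retry(write_chunk)` instead of re-queuing `data`. Because a permanently failing chunk then raises rather than going back into the queue, the queue can drain and the loop can still exit.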
I'm using CentOS (kernel 2.6.32-279.11.1.el6.x86_64). Any insights?
Thanks a lot.
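Thanks for your suggestions. I've updated my question with the Python code; could you take a look? – filannim
Hmm... I assume the producer and consumer processes don't touch the HDF5 file in any way? –
Yes, that's correct. I've noticed that if I restore a previous version of the HDF5 archive (one that wasn't used in earlier experiments), everything works fine. It seems to be OS-related. Could that be it? – filannim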