Continuously saving 2D arrays in Python

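I am writing a program to acquire data from an sCMOS (scientific CMOS) camera. Since the plan is to acquire at high frame rates, I would like to save to disk while acquiring, which increases the total time I can record before running out of memory.

Is there a way to save continuously to the same file in binary format? Ideally, I would like to avoid creating one file per frame.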

2017-02-09 Aquiles Carattino

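After tinkering for a while, I found that the multiprocessing module solves the problem. The idea is to have two processes running: the main one acquires the data and a worker continuously saves it to disk. To implement this you need to define a Queue, which shares the data between the processes in a safe way. Once a frame has been saved, its memory is released.

It is important to use multiprocessing and not threading. Multiprocessing really does split the work across separate Python interpreters, whereas threads all share the same interpreter. So with threads, if one of them takes 100% of the core on which the script is running, everything else stalls. In my application this was critical, since it changed the frame rate significantly.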

Beware: I am using h5py to save the file in HDF5 format, but you can easily adapt the code to save to a plain text file, to use numpy, etc.
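As a rough illustration of the plain-file alternative mentioned above, the sketch below appends the raw bytes of each frame to a single open binary file and reads the stack back with NumPy. The helper names `append_frame` and `read_frames` are hypothetical, and the layout (no header, so the reader must already know the frame shape and dtype) is an assumption for the example rather than something the answer prescribes.

```python
import os
import tempfile

import numpy as np


def append_frame(fh, frame):
    """Append one 2D frame to an open binary file as raw C-ordered bytes."""
    fh.write(np.ascontiguousarray(frame).tobytes())


def read_frames(path, frame_shape, dtype):
    """Read all frames back; shape and dtype must match what was written."""
    data = np.fromfile(path, dtype=dtype)
    return data.reshape((-1,) + tuple(frame_shape))


if __name__ == "__main__":
    # Hypothetical demo: three small 16-bit frames in a temporary file.
    path = os.path.join(tempfile.mkdtemp(), "frames.bin")
    with open(path, "ab") as fh:
        for k in range(3):
            append_frame(fh, np.full((4, 5), k, dtype=np.uint16))
    stack = read_frames(path, (4, 5), np.uint16)
    print(stack.shape)
```

A headerless file like this is simple and fast to append to, but it carries no metadata, which is one reason a self-describing format such as HDF5 may still be preferable.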

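First I define the worker function, which will later be sent to a different process. Its inputs are the path of the file in which to save the data and the queue that holds the data. The loop is infinite because I do not want the function to exit before I decide so, even if the queue is empty. The exit flag is just a string passed through the queue.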

import h5py
from multiprocessing import Process, Queue

def workerSaver(fileData, q):
    """Function that can be run in a separate process to continuously save data to disk.
    fileData -- STRING with the path to the file to use.
    q -- Queue that will store all the images to be saved to disk.
    """
    f = h5py.File(fileData, "w")  # This will overwrite the file. Be sure to supply a new file path.

    allocate = 100  # Number of frames to allocate along the z-axis.
    keep_saving = True  # Flag that will stop the worker function if running in a separate process.
                        # To stop it, submit a string (e.g. 'exit') via the queue.
    dset = None
    i = 0
    while keep_saving:
        img = q.get()  # Blocks until an item is available, so the loop does not spin on an empty queue.
        if isinstance(img, str):  # Any string is the exit flag; check it before touching img.shape.
            keep_saving = False
            continue
        if dset is None:  # First image: create the dataset.
            x, y = img.shape
            # The images are stacked along the z-axis, which grows as more images arrive.
            dset = f.create_dataset('image', (x, y, allocate), maxshape=(x, y, None))
        if i == dset.shape[2]:
            dset.resize(i + allocate, axis=2)
        dset[:, :, i] = img
        i += 1
    if dset is not None:
        dset.resize(i, axis=2)  # Drop the frames that were allocated but never written.
    f.close()
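The exit-flag handling in the worker above can be tried out on its own, without h5py or a second process. The following is only a minimal sketch: `drain_until_exit` is a hypothetical helper, and it uses a plain `queue.Queue` filled in advance purely so the loop can be run in a single process. The real worker still receives a `multiprocessing.Queue`.

```python
import queue

import numpy as np


def drain_until_exit(q):
    """Consume items from q until a string arrives; return the frames seen."""
    frames = []
    while True:
        item = q.get()  # Blocks until something is available.
        if isinstance(item, str):  # Any string acts as the exit flag.
            break
        frames.append(item)
    return frames


if __name__ == "__main__":
    q = queue.Queue()
    for k in range(3):
        q.put(np.full((2, 2), k))
    q.put("exit")
    print(len(drain_until_exit(q)))
```

Checking for the string before using the item as an image matters: a string has no `shape`, so the flag has to be recognised first.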

And now the important part of the code, where we define how the worker is started and used.

import numpy as np
import time

if __name__ == '__main__':  # Required where child processes are spawned (e.g. Windows).
    fileData = 'path-to-file.dat'
    # Queue of images. multiprocessing takes care of handling the data in and out
    # and the sharing between parent and child processes.
    q = Queue(0)
    # Child process to save the data. It runs continuously until an exit flag
    # is passed through the Queue. (q.put('exit'))
    p = Process(target=workerSaver, args=(fileData, q))
    p.start()
    example_image = np.ones((50, 50))
    for i in range(10000):
        q.put(example_image)
        print(q.qsize())  # Approximate backlog; qsize() is not implemented on every platform (e.g. macOS).
        time.sleep(0.01)  # Sleep 10ms

    q.put('exit')  # Any string would work
    p.join()

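Note that the process p is started, and is already running, before we begin filling the queue q. Of course there are smarter ways to store the data (for example in chunks instead of image by image), but I have checked and the disk is already writing at full speed, so I am not sure there is anything to gain on that side. Knowing exactly what type of data we are going to save also helps to speed things up, especially with HDF5 (storing 8-bit integers is not the same as storing 32-bit values).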
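To put rough numbers on the data-type remark, the sketch below computes how many bytes one frame occupies for a few dtypes. `frame_bytes` is a hypothetical helper and the 2048x2048 sensor size is an assumed example, not a figure from the answer.

```python
import numpy as np


def frame_bytes(shape, dtype):
    """Bytes needed to store one uncompressed frame of the given shape and dtype."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize


if __name__ == "__main__":
    shape = (2048, 2048)  # Hypothetical sensor size.
    for dtype in (np.uint8, np.uint16, np.float32):
        mb = frame_bytes(shape, dtype) / 1e6
        print(np.dtype(dtype).name, round(mb, 1), "MB per frame")
```

With h5py, the dtype can be passed explicitly to `create_dataset` (for instance `dtype=img.dtype`); as far as I know, a dataset created without one defaults to 32-bit floats, so 8-bit camera data would take four times the space it needs.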

2017-02-16 14:46:32
