Resizing numpy.memmap arrays

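I'm working with a bunch of large numpy arrays, and since these recently started to chew up too much memory, I wanted to replace them with numpy.memmap instances. The problem is that every now and then I have to resize the arrays, and I'd preferably do that in place. This works quite well with ordinary arrays, but trying it on memmaps raises an error complaining that the data might be shared, and even disabling refcheck doesn't help.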

>>> a = np.arange(10)
>>> a.resize(20)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

>>> a = np.memmap('bla.bin', dtype=int)
>>> a
memmap([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

a.resize(20, refcheck=False) 
--------------------------------------------------------------------------- 
ValueError        Traceback (most recent call last) 
<ipython-input-41-f1546111a7a1> in <module>() 
----> 1 a.resize(20, refcheck=False) 

ValueError: cannot resize this array: it does not own its data
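A self-contained reproduction of the failure above, as a sketch: it creates its own temporary file instead of relying on an existing bla.bin, and it uses an explicit int64 dtype (both assumptions). The memmap's OWNDATA flag is False because the mmap object owns the memory, and refcheck=False only skips the reference-count check, not the ownership check.

```python
import os
import tempfile

import numpy as np


def try_resize_memmap(new_len=20):
    """Create a small memmap and attempt an in-place resize.

    Returns the OWNDATA flag and the error message (None if no error).
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bla.bin")
        # mode="w+" creates a zero-filled file of 10 * 8 bytes
        a = np.memmap(path, dtype=np.int64, mode="w+", shape=(10,))
        owns = bool(a.flags.owndata)  # False: the mmap owns the memory
        try:
            a.resize(new_len, refcheck=False)
            err = None
        except ValueError as exc:
            err = str(exc)
        del a  # drop the mapping before the directory is removed
    return owns, err


owns, err = try_resize_memmap()
```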

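Resizing the underlying mmap buffer works perfectly fine. The problem is how to reflect those changes in the array object. I've seen this workaround, but unfortunately it doesn't resize the array in place. There is also some numpy documentation about resizing mmaps, but it clearly doesn't work, at least with version 1.8.0. Any other ideas on how to override the built-in resizing checks?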


2014-01-05 Michael

I feel like I must be missing something... this code runs fine for me. Does it run for you? Isn't this what you're trying to do? http://codepad.org/eEWmYBHZ

@three_pineapples He wants to change the total size of the array; your code only changes its shape.

@ali_m Ah, I see. I didn't get that from the question, but as I said, I assumed I was missing something! Thanks for the clarification.

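The problem is that the OWNDATA flag is False when you create the array. You can change that by requiring the flag to be True when you create the array: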

>>> a = np.require(np.memmap('bla.bin', dtype=int), requirements=['O']) 
>>> a.shape 
(10,) 
>>> a.flags 
    C_CONTIGUOUS : True 
    F_CONTIGUOUS : True 
    OWNDATA : True 
    WRITEABLE : True 
    ALIGNED : True 
    UPDATEIFCOPY : False 
>>> a.resize(20, refcheck=False) 
>>> a.shape 
(20,)

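The only caveat is that np.require may create the array and then make a copy of it to satisfy the requirements (for a memmap, whose OWNDATA flag is False, it always has to), and that copy lives in ordinary memory rather than in the mapped file.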
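A sketch of that caveat, assuming a temporary file and an int64 dtype: np.require hands back an in-memory copy, so the resize succeeds, but a write to the copy never reaches the file on disk.

```python
import os
import tempfile

import numpy as np


def require_then_resize():
    """Resize an OWNDATA copy of a memmap and check what reached the file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bla.bin")
        # Create a zero-filled file holding ten int64 values.
        np.memmap(path, dtype=np.int64, mode="w+", shape=(10,)).flush()

        mm = np.memmap(path, dtype=np.int64, mode="r+")
        a = np.require(mm, requirements=["O"])  # copies: mm does not own its data
        shares = np.shares_memory(a, mm)        # False: a lives in RAM
        a.resize(20, refcheck=False)            # allowed, a owns its buffer
        a[3] = 7                                # only changes the RAM copy
        del a, mm

        on_disk = np.fromfile(path, dtype=np.int64)  # read the raw file back
    return shares, on_disk


shares, on_disk = require_then_resize()
```

The file still holds ten zeros afterwards, which is why this route does not help when the goal is a larger disk-backed array.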

Edit to address saving:

If you want to save the resized array to disk, you can save it as a .npy file and, when you need to reopen it as a memmap, open it with numpy.lib.format.open_memmap:

>>> a[1] = 9 
>>> np.save('bla.npy',a) 
>>> b = np.lib.format.open_memmap('bla.npy', dtype=int, mode='r+') 
>>> b 
memmap([0, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
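A minimal round trip of that edit, assuming a temporary directory and an int64 array that stands in for the resized in-memory copy: np.save writes a .npy header followed by the data, and np.lib.format.open_memmap reads dtype and shape back from that header in mode "r+", so writes through the reopened memmap persist. Note that np.save still needs the whole array in memory first.

```python
import os
import tempfile

import numpy as np


def save_and_reopen():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bla.npy")

        a = np.zeros(20, dtype=np.int64)  # stands in for the resized array
        a[1] = 9
        np.save(path, a)  # header (dtype, shape, order) + raw data

        # dtype and shape come from the .npy header in mode "r+"
        b = np.lib.format.open_memmap(path, mode="r+")
        b[2] = 5          # this write goes to the file
        b.flush()
        shape, dtype = b.shape, b.dtype
        del b

        reloaded = np.load(path)  # fresh read from disk
    return shape, dtype, reloaded
```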

Edit providing another approach:

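You can resize the underlying mmap (a.base or a._mmap, whose size is counted in bytes, i.e. as uint8; 20*8 below assumes int is 8 bytes) and then "reload" the memmap: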

>>> a = np.memmap('bla.bin', dtype=int) 
>>> a 
memmap([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) 
>>> a[3] = 7 
>>> a 
memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0]) 
>>> a.flush() 
>>> a = np.memmap('bla.bin', dtype=int) 
>>> a 
memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0]) 
>>> a.base.resize(20*8) 
>>> a.flush() 
>>> a = np.memmap('bla.bin', dtype=int) 
>>> a 
memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])


2014-01-09 05:19:09 wwwslinger

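Interesting. Unfortunately, for me it always creates a copy in memory. If I write to the array, flush, delete and reopen it, it is empty again, just as before. So I guess the data never gets written to disk. – Michael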

I added an example of how to save it and reopen it later as a memmap – wwwslinger

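@wwwslinger The problem with your answer is that if 'a' is too big to fit in core memory (why else would you use a memory-mapped array?), then creating another copy of it in core will obviously cause problems. You'd be better off creating a new memory-mapped array of the correct size from scratch and then filling it with the contents of 'a'.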
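A sketch of the suggestion in that comment. The helper name copy_into_new_memmap and its chunk parameter are hypothetical, not from the thread: it allocates a new disk-backed .npy of the target length with np.lib.format.open_memmap(mode="w+"), then copies the old contents over chunk by chunk instead of materialising the whole array in memory. Elements past the old length stay zero, since the newly extended file reads back as zeros.

```python
import os
import tempfile

import numpy as np


def copy_into_new_memmap(src, path, new_len, chunk=1 << 20):
    """Create a disk-backed .npy of length new_len and fill it from src.

    src: any 1-D array supporting slicing (e.g. an np.memmap).
    Elements beyond len(src) stay zero; if new_len is smaller, src is cut off.
    """
    dst = np.lib.format.open_memmap(path, mode="w+", dtype=src.dtype,
                                    shape=(new_len,))
    n = min(len(src), new_len)
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        dst[start:stop] = src[start:stop]  # copy one chunk per iteration
    dst.flush()
    return dst
```

The result is a regular .npy file, so it can later be reopened with open_memmap(path, mode="r+") without remembering its dtype or shape.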

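If I'm not mistaken, this achieves essentially what @wwwslinger's second solution does, but without having to specify the size of the new memmap manually in bytes: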

In [1]: a = np.memmap('bla.bin', mode='w+', dtype=int, shape=(10,)) 

In [2]: a[3] = 7 

In [3]: a 
Out[3]: memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0]) 

In [4]: a.flush() 

# this will append to the original file as much as is necessary to satisfy 
# the new shape requirement, given the specified dtype 
In [5]: new_a = np.memmap('bla.bin', mode='r+', dtype=int, shape=(20,)) 

In [6]: new_a 
Out[6]: memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) 

In [7]: a[-1] = 10 

In [8]: a 
Out[8]: memmap([ 0, 0, 0, 7, 0, 0, 0, 0, 0, 10]) 

In [9]: a.flush() 

In [11]: new_a 
Out[11]: 
memmap([ 0, 0, 0, 7, 0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, 0, 0, 
     0, 0, 0])

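This works well when the new array needs to be larger than the old one, but I don't think this type of approach will automatically truncate the memory-mapped file if the new array is smaller.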
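A quick check of that suspicion, as a sketch with a temporary file and an int64 dtype: reopening an existing 80-byte file in mode "r+" with a smaller shape gives a shorter array, but np.memmap only ever extends the file, so its size on disk stays at 80 bytes.

```python
import os
import tempfile

import numpy as np


def reopen_smaller():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bla.bin")
        a = np.memmap(path, dtype=np.int64, mode="w+", shape=(10,))  # 80 bytes
        a[:] = np.arange(1, 11)
        a.flush()
        del a

        b = np.memmap(path, dtype=np.int64, mode="r+", shape=(5,))
        values = np.array(b)  # copy out before closing the mapping
        del b
        size_on_disk = os.path.getsize(path)
    return values, size_on_disk
```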

Manually resizing the base, as in @wwwslinger's answer, does seem to truncate the file, but it doesn't reduce the size of the array.

For example:

# this creates a memory mapped file of 10 * 8 = 80 bytes 
In [1]: a = np.memmap('bla.bin', mode='w+', dtype=int, shape=(10,)) 

In [2]: a[:] = range(1, 11) 

In [3]: a.flush() 

In [4]: a 
Out[4]: memmap([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) 

# now truncate the file to 40 bytes 
In [5]: a.base.resize(5*8) 

In [6]: a.flush() 

# the array still has the same shape, but the truncated part is all zeros 
In [7]: a 
Out[7]: memmap([1, 2, 3, 4, 5, 0, 0, 0, 0, 0]) 

In [8]: b = np.memmap('bla.bin', mode='r+', dtype=int, shape=(5,)) 

# you still need to create a new np.memmap to change the size of the array 
In [9]: b 
Out[9]: memmap([1, 2, 3, 4, 5])
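A sketch of a variant that is not from the thread: instead of resizing the mmap object, flush and drop the old memmap, change the file's length with os.truncate (which can grow or shrink a regular file; on POSIX the added bytes read as zeros), and open a fresh np.memmap over it. The helper name resize_memmap_file is hypothetical, and it assumes a raw file with no header and offset 0 (unlike a .npy file). It still returns a new array object, so it does not solve the in-place problem.

```python
import os
import tempfile

import numpy as np


def resize_memmap_file(path, dtype, new_len):
    """Hypothetical helper: grow or shrink a headerless memmap file.

    Flush and delete every existing np.memmap on `path` before calling;
    their mappings no longer match the file afterwards.
    """
    if new_len <= 0:
        raise ValueError("new_len must be positive")
    dtype = np.dtype(dtype)
    os.truncate(path, new_len * dtype.itemsize)  # grows (zero-fill) or cuts off
    return np.memmap(path, dtype=dtype, mode="r+", shape=(new_len,))
```

Unlike reopening with a larger or smaller shape alone, this keeps the file size and the array length in step in both directions.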


2014-01-14 10:08:52

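This is similar to the approach in the workaround I posted. I'd prefer an in-place solution, since otherwise I'll have to wrap the object further. In any case, it's probably what I'll have to live with in the end. – Michael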

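@Michael If you haven't already, you should report this issue to the numpy maintainers. At the very least, the docstring of the np.memmap class should be updated to reflect that it is currently not possible to resize memory-mapped arrays.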

I haven't yet, but since there doesn't seem to be a simple solution, I will. – Michael
