numpy memmap: modifying files

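I am having trouble understanding how numpy.memmap works. The background is that I need to shrink a large numpy array saved on disk by deleting entries. Reading the array and building a new one by copying the desired parts does not work; it simply does not fit into memory. So the idea is to use numpy.memmap, i.e. to work on disk. Here is my code (with a tiny file):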

import numpy 

in_file = './in.npy' 
in_len = 10 
out_file = './out.npy' 
out_len = 5 

# Set up input dummy-file 
dummy_in = numpy.zeros(shape=(in_len,1),dtype=numpy.dtype('uint32')) 
for i in range(in_len): 
    dummy_in[i] = i + i 
numpy.save(in_file, dummy_in) 

# get dtype and shape from the in_file 
in_npy = numpy.load(in_file) 

in_dtype = in_npy.dtype 
in_shape = (in_npy.shape[0],1) 
del(in_npy) 

# generate an 'empty' out_file with the desired dtype and shape 
out_shape = (out_len,1) 
out_npy = numpy.zeros(shape=out_shape, dtype=in_dtype) 
numpy.save(out_file, out_npy) 
del(out_npy) 

# memmap both files 
in_memmap = numpy.memmap(in_file, mode='r', shape=in_shape, dtype=in_dtype) 
out_memmap = numpy.memmap(out_file, mode='r+', shape=out_shape, dtype=in_dtype) 
print("in_memmap")
print(in_memmap, "\n")
print("out_memmap before in_memmap copy")
print(out_memmap, "\n")

# copy some parts
for i in range(out_len):
    out_memmap[i] = in_memmap[i]

print("out_memmap after in_memmap copy")
print(out_memmap, "\n")
out_memmap.flush()

# test
in_data = numpy.load(in_file)
print("in.npy")
print(in_data)
print(in_data.dtype, "\n")

out_data = numpy.load(out_file)
print("out.npy")
print(out_data)
print(out_data.dtype, "\n")

Running this code, I get:

in_memmap 
[[1297436307] 
[  88400] 
[ 662372422] 
[1668506980] 
[ 540682098] 
[ 880098343] 
[ 656419879] 
[1953656678] 
[1601069426] 
[1701081711]] 

out_memmap before in_memmap copy 
[[1297436307] 
[  88400] 
[ 662372422] 
[1668506980] 
[ 540682098]] 

out_memmap after in_memmap copy 
[[1297436307] 
[  88400] 
[ 662372422] 
[1668506980] 
[ 540682098]] 

in.npy 
[[ 0] 
[ 2] 
[ 4] 
[ 6] 
[ 8] 
[10] 
[12] 
[14] 
[16] 
[18]] 
uint32 

out.npy 
[[0] 
[0] 
[0] 
[0] 
[0]] 
uint32

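From the output it is obvious that I am doing something wrong:

1) The memmaps do not contain the values set in the arrays, and in_memmap and out_memmap contain the same values.

2) It is unclear whether the copy command copies anything from in_memmap to out_memmap (because the values are identical). Checking the values of in_memmap[i] and out_memmap[i] in debug mode, I get the same for both: memmap([1297436307], dtype=uint32). So can I assign them as in the code, or do I have to use: out_memmap[i][0] = in_memmap[i][0]?

3) out.npy is not updated to the out_memmap values by the flush() operation.

Can anyone please help me understand what I am doing wrong here?

Thanks a lot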

Source

2017-08-08 fdiehl

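Your problem seems to be that 'np.save' and 'np.memmap' use slightly different formats. Check [this](https://stackoverflow.com/questions/23062674/numpy-memmap-map-to-save-file) answer.

Also, if you frequently work with arrays larger than RAM, check out [DASK](https://dask.pydata.org/en/latest/).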
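To make the format difference concrete, here is a small sketch (not part of the original comments) that reads the first 8 bytes of a file written by np.save and reinterprets them as little-endian uint32, the way a plain np.memmap with dtype uint32 would. The scratch path is a hypothetical name used only for illustration, and the sketch assumes np.save writes a version 1.0 header, which it does for small arrays like this one.

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "in.npy")
np.save(path, np.arange(0, 20, 2, dtype=np.uint32).reshape(10, 1))

# The file starts with the magic string b"\x93NUMPY" followed by two version bytes.
with open(path, "rb") as fp:
    head = fp.read(8)

# Reinterpreting those header bytes as little-endian uint32 gives the
# "garbage" values seen at the top of in_memmap in the question.
as_uint32 = np.frombuffer(head, dtype="<u4")
print(head[:6])
print(as_uint32)
```

The two values this produces, 1297436307 and 88400, are the first two entries of in_memmap and out_memmap in the question's output: they are header bytes, not array data.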

Replace every instance of np.memmap with np.lib.format.open_memmap and you get:

in_memmap 
[[ 0] 
[ 2] 
[ 4] 
[ 6] 
[ 8] 
[10] 
[12] 
[14] 
[16] 
[18]] 

out_memmap before in_memmap copy 
[[0] 
[0] 
[0] 
[0] 
[0]] 

out_memmap after in_memmap copy 
[[0] 
[2] 
[4] 
[6] 
[8]] 

in.npy 
[[ 0] 
[ 2] 
[ 4] 
[ 6] 
[ 8] 
[10] 
[12] 
[14] 
[16] 
[18]] 
uint32 

out.npy 
[[0] 
[2] 
[4] 
[6] 
[8]] 
uint32
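The substitution that produces the output above can be sketched roughly as follows. This is a condensed variation rather than the exact edited script: it assumes temporary file paths (hypothetical names), and it creates the output file directly with mode "w+" instead of first saving an array of zeros, so the output array never has to be built in memory.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

# Hypothetical scratch locations, used only for this illustration.
workdir = tempfile.mkdtemp()
in_file = os.path.join(workdir, "in.npy")
out_file = os.path.join(workdir, "out.npy")
out_len = 5

# Dummy input file with the same contents as in the question.
np.save(in_file, np.arange(0, 20, 2, dtype=np.uint32).reshape(10, 1))

# Existing file: shape and dtype are taken from the .npy header.
in_memmap = open_memmap(in_file, mode="r")

# New file: mode "w+" writes a valid .npy header and maps the data region.
out_memmap = open_memmap(
    out_file, mode="w+", dtype=in_memmap.dtype, shape=(out_len, 1)
)

# Slice assignment copies the rows without a Python-level loop.
out_memmap[:] = in_memmap[:out_len]
out_memmap.flush()
del in_memmap, out_memmap

# The result is a regular .npy file that np.load can read.
out_data = np.load(out_file)
print(out_data.ravel())
print(out_data.dtype)
```

Row-wise assignment as in the question (out_memmap[i] = in_memmap[i]) also works on these maps; the slice form is simply shorter.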

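np.save adds a header, and np.memmap reads that header as if it were data. That is why the data in both maps looks the same (it is the same header). It is also why copying from one map to the other has no visible effect: it only copies header bytes, not the data. np.lib.format.open_memmap automatically skips the header so that you can work on the data.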
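If a plain np.memmap is still wanted, the header can be skipped by hand. The sketch below is one possible way to do that, not something from the original answer: it parses the header with the helpers in numpy.lib.format and passes the resulting file position as the offset argument of np.memmap. It assumes a version 1.0 header and a hypothetical scratch path.

```python
import os
import tempfile

import numpy as np
from numpy.lib import format as npy_format

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "in.npy")
np.save(path, np.arange(0, 20, 2, dtype=np.uint32).reshape(10, 1))

with open(path, "rb") as fp:
    version = npy_format.read_magic(fp)
    # Assumption: version 1.0 header, which np.save writes for small arrays.
    if version != (1, 0):
        raise ValueError("this sketch only handles version 1.0 headers")
    shape, fortran_order, dtype = npy_format.read_array_header_1_0(fp)
    # After the header has been parsed, the file position is where the data starts.
    data_offset = fp.tell()

# Map only the data region by telling np.memmap how many bytes to skip.
data = np.memmap(path, mode="r", dtype=dtype, shape=shape, offset=data_offset)
print(data_offset)
print(data.ravel())
```

In practice open_memmap does this bookkeeping for you, so the manual offset is mainly useful for seeing what "skipping the header" means.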

Source

2017-08-08 13:02:29
