Equivalent Python for the following R index-averaging code

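The commands below find the average of the values at specific x, y, z indices of an array. In the larger problem I am trying to solve, a 2 GB file is read into a 4D array in which the first three dimensions are spatial (x, y, z) and the fourth is time. Since writing this R script I have switched to Python for reading the 2 GB data file, and I would like to convert the R lines below to Python so that I can do everything in one language. Does anyone know the equivalent Python?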

# create a small example dataset for testing out script 
test_dat <- array(rnorm(10*10*4*50), dim=c(10,10,4,50)) 

# create a list of specific indices I want the average of (arbitrary 
# in this case, but not in the larger problem at hand) 
xyz_index <- list(c(2,10,1), c(4,5,1), c(6,7,1), c(9,3,1)) 

# bind the index data into a matrix for the next step 
m <- do.call(rbind,xyz_index) ## 4x3 matrix 

# will return the average of the values in test_dat that are 
# in the positions listed in xyz_index for each time index 
# (50 values in this small problem) 
sapply(seq(dim(test_dat)[4]), function(i) mean(test_dat[cbind(m,i)]))

Asked 2014-09-22 by user2256085

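If the only reason for migrating away from R is memory problems, then just solve those: you can do this iteratively, and there is no need to keep the entire image in memory. Or, if you want to do it in Python, use NumPy arrays and/or PIL. You are trying to solve a non-problem. – smci 2014-09-22 23:42:02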

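smci, I am not keeping the entire image in memory; that is the beauty of the Python script I use to process the 2 GB of 4D array data. So that I can stay in one environment, I would still like to know the Python equivalent of the following bit of R: `sapply(seq(dim(test_dat)[4]), function(i) mean(test_dat[cbind(m,i)]))` – user2256085 2014-09-23 15:54:55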

**You don't need to keep it in memory in R either.** – smci 2014-09-23 19:53:51

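I think this is what you want; please confirm. It turns out that concatenating multiple slices is quite painful in numpy, and apparently in pandas too. You really don't want to write obscure code like these stride tricks.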

import numpy as np

test_dat = np.random.randn(10, 10, 4, 50)

# R: xyz_index <- list(c(2,10,1), c(4,5,1), c(6,7,1), c(9,3,1))
# R: m <- do.call(rbind,xyz_index) ## 4x3 matrix
# R indices are 1-based and NumPy indices are 0-based, so subtract 1 from each coordinate.
# I'm not sure about this, but it seems to get the 4x50 submatrix of concatenated slices
# see https://stackoverflow.com/questions/21349133/numpy-array-integer-indexing-in-more-than-one-dimension
m = np.r_['0,2',
          test_dat[1, 9, 0, :], test_dat[3, 4, 0, :],
          test_dat[5, 6, 0, :], test_dat[8, 2, 0, :]]

# Compute .mean() over the four positions (axis 0), giving one value per time index (50 values)
# R: sapply(seq(dim(test_dat)[4]), function(i) mean(test_dat[cbind(m,i)]))
m.mean(axis=0)
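For what it's worth, the R matrix-indexing step `test_dat[cbind(m,i)]` can probably be written more directly with NumPy integer (advanced) indexing instead of concatenating slices by hand. The following is only a sketch: the names `xyz_index_r`, `ix`, `iy`, `iz`, `selected` and `means` are made up for illustration, and it assumes the positions are given 1-based as in the R list, so they are shifted by one before indexing.

```python
import numpy as np

# Small random test array: (x, y, z, time), same shape as the R example
rng = np.random.default_rng(0)
test_dat = rng.standard_normal((10, 10, 4, 50))

# 1-based (x, y, z) positions, copied from the R list xyz_index (illustrative name)
xyz_index_r = np.array([[2, 10, 1], [4, 5, 1], [6, 7, 1], [9, 3, 1]])

# NumPy is 0-based, so shift every coordinate by one
xyz = xyz_index_r - 1
ix, iy, iz = xyz.T  # one index array per spatial axis

# Advanced indexing picks the four positions and keeps the full time axis
selected = test_dat[ix, iy, iz, :]  # shape (4, 50)

# Average over the positions, leaving one mean per time index
means = selected.mean(axis=0)  # shape (50,)

# Cross-check against an explicit loop over time, mirroring the R sapply call
expected = np.array([
    np.mean([test_dat[x, y, z, t] for x, y, z in xyz])
    for t in range(test_dat.shape[3])
])
assert np.allclose(means, expected)
```

Because the list of positions lives in a single array, it can be as long as needed without changing the indexing line.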

By the way, I also looked at numpy masked arrays.
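A rough sketch of how the masked-array idea might look, in case it is useful. This is an assumption about one possible approach, not something tested on the real 2 GB data: every element is masked except the chosen (x, y, z) positions, and the mean is then taken per time index. The names `mask`, `masked` and `masked_means` are illustrative, and the positions are again converted from the 1-based R values.

```python
import numpy as np

rng = np.random.default_rng(1)
test_dat = rng.standard_normal((10, 10, 4, 50))
n_t = test_dat.shape[3]

# 0-based versions of the R positions (2,10,1), (4,5,1), (6,7,1), (9,3,1)
ix = np.array([1, 3, 5, 8])
iy = np.array([9, 4, 6, 2])
iz = np.array([0, 0, 0, 0])

# In numpy.ma, True means "masked out"; start fully masked,
# then unmask the chosen positions for every time index
mask = np.ones(test_dat.shape, dtype=bool)
mask[ix, iy, iz, :] = False

masked = np.ma.masked_array(test_dat, mask=mask)

# Flatten the three spatial axes and average the unmasked values per time index
masked_means = masked.reshape(-1, n_t).mean(axis=0).filled(np.nan)

# Cross-check against an explicit loop over time
direct = np.array([
    np.mean([test_dat[x, y, z, t] for x, y, z in zip(ix, iy, iz)])
    for t in range(n_t)
])
assert np.allclose(masked_means, direct)
```

The mask here has the same shape as the full array, so for a large dataset the plain integer-indexing route is likely lighter on memory.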

Answered 2014-09-23 20:53:50 by smci
