Python and NumPy - create dynamic, arbitrary subsets of ndarray

I'm looking for a general way to do this:

import itertools
import numpy as np

raw_data = np.array(somedata)
filterColumn1 = raw_data[:,1]
filterColumn2 = raw_data[:,3]
cartesian_product = itertools.product(np.unique(filterColumn1), np.unique(filterColumn2))
for val1, val2 in cartesian_product:
    fixed_mask = (filterColumn1 == val1) & (filterColumn2 == val2)
    subset = raw_data[fixed_mask]

I would like to be able to use any number of filterColumns. So what I want is:

filterColumns = [filterColumn1, filterColumn2, ...] 
uniqueValues = [np.unique(fc) for fc in filterColumns]  # a list, so it can be reused (map() is a one-shot iterator in Python 3)
cartesian_product = itertools.product(*uniqueValues) 
for combination in cartesian_product: 
    variable_mask = ???? 
    subset = raw_data[variable_mask]

Is there a simple syntax to do what I want? Otherwise, should I try a different approach?

Edit: this seems to work:

cartesian_product = itertools.product(*uniqueValues)
for combination in cartesian_product:
    variable_mask = True
    for idx, fc in enumerate(filterColumns):
        variable_mask &= (fc == combination[idx])
    subset = raw_data[variable_mask]
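
For reference, here is a self-contained sketch of that loop on a small made-up array. The helper name `iter_subsets`, the sample data, and the choice of columns 1 and 3 are assumptions for illustration only; `np.logical_and.reduce` stands in for the explicit inner loop.

```python
import itertools

import numpy as np


def iter_subsets(raw_data, filter_indexes):
    """Yield (combination, subset) for every combination of unique filter values.

    Hypothetical helper, not part of NumPy.
    """
    filter_columns = [raw_data[:, i] for i in filter_indexes]
    unique_values = [np.unique(col) for col in filter_columns]
    for combination in itertools.product(*unique_values):
        # One boolean mask per filter column, AND-ed together
        masks = [col == val for col, val in zip(filter_columns, combination)]
        mask = np.logical_and.reduce(masks)
        yield combination, raw_data[mask]


# Small made-up data set: columns 1 and 3 act as the filter columns
raw_data = np.array([
    [10, 0, 7, 1, 5],
    [11, 0, 8, 1, 6],
    [12, 1, 9, 0, 7],
    [13, 0, 7, 0, 8],
])
result = {combo: subset for combo, subset in iter_subsets(raw_data, [1, 3])}
```

Note that the cartesian product also visits combinations that never occur in the data; those simply yield empty subsets.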

Source

2014-10-03 Joe Bashe

You can use numpy.all and index broadcasting for this:

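filter_matrix = np.array(filterColumns)  # shape (n_filters, n_rows)
combination_array = np.array(combination)  # shape (n_filters,)
bool_matrix = filter_matrix == combination_array[:, np.newaxis]  # compare each filter column with its own value
subset = raw_data[np.all(bool_matrix, axis=0)]  # keep the rows that match on every filter column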
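
A minimal sketch of the same broadcasting idea on made-up data. It assumes the filter columns are taken as a (rows, filters) slice of `raw_data`, so the combination broadcasts along the last axis and no `np.newaxis` is needed; the sample array and the column choice are illustrative only.

```python
import numpy as np

raw_data = np.array([
    [10, 0, 7, 1, 5],
    [11, 0, 8, 1, 6],
    [12, 1, 9, 0, 7],
    [13, 0, 7, 0, 8],
])
filter_indexes = [1, 3]          # assumed filter columns
combination = np.array([0, 1])   # one wanted value per filter column

# Shape (n_rows, n_filters) compared with shape (n_filters,) broadcasts row by row
bool_matrix = raw_data[:, filter_indexes] == combination

# A row is kept only if it matches on every filter column
row_mask = np.all(bool_matrix, axis=1)
subset = raw_data[row_mask]
```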

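There are, however, simpler ways of doing the same thing if your filters are inside the matrix, notably with numpy argsort and numpy roll over an axis. First, roll the axis until your filters are the first columns, then sort on them and slice the array vertically to get the rest of the matrix.

In general, if a for loop can be avoided in Python, it is better to avoid it.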

Update:

Here is the complete code without a for loop:

import numpy as np 

# select filtering indexes 
filter_indexes = [1, 3] 
# generate the test data 
raw_data = np.random.randint(0, 4, size=(50,5)) 


# create a column that we would use for indexing 
index_columns = raw_data[:, filter_indexes] 

# sort the index columns by lexicographic order over all the indexing columns
argsorts = np.lexsort(index_columns.T) 

# sort both the index and the data column 
sorted_index = index_columns[argsorts, :] 
sorted_data = raw_data[argsorts, :] 

# in each indexing column, find if number in row and row-1 are identical 
# then group to check if all numbers in corresponding positions in row and row-1 are identical 
autocorrelation = np.all(sorted_index[1:, :] == sorted_index[:-1, :], axis=1) 

# find out the breakpoints: these are the positions where row and row-1 are not identical 
breakpoints = np.nonzero(np.logical_not(autocorrelation))[0]+1 

# finally find the desired subsets 
subsets = np.split(sorted_data, breakpoints)
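
As a sanity check, the steps above can be wrapped in a function and the result verified: every subset should contain exactly one combination of key values, and no rows should be lost. The function name `split_by_columns` and the fixed seed are assumptions made for this sketch.

```python
import numpy as np


def split_by_columns(raw_data, filter_indexes):
    """Hypothetical wrapper around the lexsort/split steps."""
    index_columns = raw_data[:, filter_indexes]
    order = np.lexsort(index_columns.T)
    sorted_index = index_columns[order]
    sorted_data = raw_data[order]
    # True wherever a row differs from the previous row in any key column
    changed = np.any(sorted_index[1:] != sorted_index[:-1], axis=1)
    breakpoints = np.nonzero(changed)[0] + 1
    return np.split(sorted_data, breakpoints)


rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
raw_data = rng.integers(0, 4, size=(50, 5))
subsets = split_by_columns(raw_data, [1, 3])

# The key combination of each subset, read from its first row
keys = [tuple(s[0, [1, 3]]) for s in subsets]
```

Unlike the cartesian-product loop, this only produces subsets for combinations that actually occur in the data.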

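An alternative implementation would be to convert the index matrix into a string matrix, join it row-wise, sort on the resulting single index column, and split as above.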
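
A rough sketch of how that string-key variant might look, on made-up data. The `"_"` separator and the list comprehension used to build the keys are assumptions; the answer does not spell out how the strings are combined.

```python
import numpy as np

raw_data = np.array([
    [10, 0, 7, 1, 5],
    [11, 0, 8, 1, 6],
    [12, 1, 9, 0, 7],
    [13, 0, 7, 0, 8],
])
filter_indexes = [1, 3]  # assumed filter columns

# Build one string key per row, e.g. "0_1" (the separator is an assumption)
keys = np.array(["_".join(map(str, row)) for row in raw_data[:, filter_indexes]])

# Stable sort so rows keep their original order within each group
order = np.argsort(keys, kind="stable")
sorted_keys = keys[order]
sorted_data = raw_data[order]

# Split wherever the key changes from one row to the next
breakpoints = np.nonzero(sorted_keys[1:] != sorted_keys[:-1])[0] + 1
subsets = np.split(sorted_data, breakpoints)
```

The keys sort as text rather than as numbers ("10" comes before "2"), which changes the order of the groups but not their contents. Building the keys here still loops over rows in Python, so this is mainly of interest for readability.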

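As a convention, it may be more interesting to first roll the index columns until they are all at the beginning of the matrix, so that the sorting done above is clearer.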

Source

2014-10-03 13:28:47 chiffa

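I'd love to accept your answer, but not everyone can rotate n-dimensional matrices in their head. ;) In other words, I'm not sure how to implement this solution for my problem. I dug into the argsort and rollaxis documentation, but how to apply them to get the subsets is beyond me. Fortunately, my data is not too large, so looping is fine, although I fully agree with you that loops should be avoided whenever possible. – 2014-10-03 20:35:25

Please see the update. Actually, it was lexsort I had in mind, not argsort; both return an array of sorting indices, the difference being that one sorts on a single key along an axis and the other on several keys :D – chiffa 2014-10-03 22:00:24

Thank you very much for the detailed update! I now follow your logic and have learned a better way to think about data manipulation in numpy. Is the approach you used to get the autocorrelation and breakpoints fairly standard? It seems it would be hard for a newcomer to understand what you are doing in the code without the comments. – 2014-10-05 16:06:56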

Something like this?

variable_mask = np.ones_like(filterColumns[0], dtype=bool)  # select all rows initially; dtype=bool keeps this a boolean mask
for column, val in zip(filterColumns, combination):
    variable_mask &= (column == val)
subset = raw_data[variable_mask]

Source

2014-10-03 13:23:48 r3m0t
