2017-09-13 24 views
1

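Shuffle the columns of a multidimensional array and update a list of indexes accordingly

Given an array of N rows by M columns, I need to shuffle it by columns and, at the same time, update a separate list of (unique) column indexes so that it points to the new positions of the shuffled elements.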

For example, take the following (3, 5) array:

a = [[ 0.15337424 0.21176979 0.19846229 0.5245618 0.24452392] 
    [ 0.17460481 0.45727362 0.26914808 0.81620202 0.8898504 ] 
    [ 0.50104826 0.22457154 0.24044079 0.09524352 0.95904348]] 

and a list of column indexes:

idxs = [0 3 4] 

If I shuffle the array by columns so that it looks like this:

a = [[ 0.24452392 0.19846229 0.5245618 0.21176979 0.15337424] 
    [ 0.8898504 0.26914808 0.81620202 0.45727362 0.17460481] 
    [ 0.95904348 0.24044079 0.09524352 0.22457154 0.50104826]] 

the index array should be modified to look like this:

idxs = [4 2 0] 

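I can shuffle the array by columns by transposing it before and after shuffling (see the code below), but I don't know how to update the list of indexes. The whole process needs to be as fast as possible, since it will be executed millions of times with new arrays.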

import numpy as np 

def getData(): 
    # Array of (N, M) dimensions 
    N, M = 10, 500 
    a = np.random.random((N, M)) 

    # List of unique column indexes in a. 
    # This list could be empty, or it could have a length of 'M' 
    # (ie: contain all the indexes in the range of 'a'). 
    P = int(M * np.random.uniform()) 
    idxs = np.arange(0, M) 
    np.random.shuffle(idxs) 
    idxs = idxs[:P] 

    return a, idxs 


a, idxs = getData() 

# Shuffle a by columns 
b = a.T 
np.random.shuffle(b) 
a = b.T 

# Update the 'idxs' list? 

Answers

1

Get a random permutation of the column indexes with np.random.permutation -

col_idx = np.random.permutation(a.shape[1]) 

Get the shuffled input array -

shuffled_a = a[:,col_idx] 

Then simply index into the argsort of col_idx with idxs to get the traced-back (updated) indexes -

shuffled_idxs = col_idx.argsort()[idxs] 
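The three steps above can be wrapped into one function, as a minimal sketch. The function name `shuffle_columns` and the optional `rng` argument are illustrative choices, not part of the original answer; `idxs` is assumed to be an integer array. The final assertion checks the property the question asks for: the tracked columns hold the same data before and after the shuffle.

```python
import numpy as np

def shuffle_columns(a, idxs, rng=None):
    """Shuffle the columns of `a`; return the shuffled array and the new positions of `idxs`."""
    rng = np.random.default_rng() if rng is None else rng
    col_idx = rng.permutation(a.shape[1])  # col_idx[new_position] = old_column
    inverse = col_idx.argsort()            # inverse[old_column] = new_position
    return a[:, col_idx], inverse[idxs]

a = np.random.random((3, 5))
idxs = np.array([0, 3, 4])
shuffled_a, shuffled_idxs = shuffle_columns(a, idxs)

# The tracked columns contain the same values before and after the shuffle.
assert np.array_equal(a[:, idxs], shuffled_a[:, shuffled_idxs])
```

The argsort works here because col_idx is a permutation, so sorting it yields its inverse mapping from old column to new position.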

Sample run -

In [236]: a # input array 
Out[236]: 
array([[ 0.1534, 0.2118, 0.1985, 0.5246, 0.2445], 
     [ 0.1746, 0.4573, 0.2691, 0.8162, 0.8899], 
     [ 0.501 , 0.2246, 0.2404, 0.0952, 0.959 ]]) 

In [237]: col_idx = np.random.permutation(a.shape[1]) 

# Let's use the sample permuted column indices to verify desired o/p 
In [238]: col_idx = np.array([4,2,3,1,0]) 

In [239]: shuffled_a = a[:,col_idx] 

In [240]: shuffled_a 
Out[240]: 
array([[ 0.2445, 0.1985, 0.5246, 0.2118, 0.1534], 
     [ 0.8899, 0.2691, 0.8162, 0.4573, 0.1746], 
     [ 0.959 , 0.2404, 0.0952, 0.2246, 0.501 ]]) 

In [241]: col_idx.argsort()[idxs] 
Out[241]: array([4, 2, 0]) 
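Since speed matters in the question, here is a possible alternative to argsort, offered as a sketch rather than as part of the original answer: the inverse of a permutation can also be built with a single scatter assignment, which avoids the sort. Whether it is actually faster for a given M is an assumption worth timing. The values below reuse col_idx and idxs from the sample run.

```python
import numpy as np

col_idx = np.array([4, 2, 3, 1, 0])  # permutation from the sample run
idxs = np.array([0, 3, 4])           # tracked columns from the question

# inverse[old_column] = new_position, filled in one pass without sorting
inverse = np.empty_like(col_idx)
inverse[col_idx] = np.arange(col_idx.size)

shuffled_idxs = inverse[idxs]
print(shuffled_idxs)  # [4 2 0]

# Same result as the argsort-based version
assert np.array_equal(inverse, col_idx.argsort())
```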
+0

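Thanks Divakar, this will be useful! I'm trying to improve the performance of a function (as you may have guessed from my previous questions), and the answer you gave in https://stackoverflow.com/a/46079837/1391441 still produces the fastest result. – Gabriel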

0
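import numpy as np
import pandas as pd

# Assumes `a` and `idxs` are defined as in the question.
original_index = np.arange(a.shape[1])
permutation_series = pd.Series(original_index)
# Label each new position with the original column that lands there.
permutation_series.index = np.random.permutation(original_index)
new_idx = permutation_series.loc[idxs].to_numpy()  # label lookup: old column -> new position
a = a[:, permutation_series.index]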
+0

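Please explain how your code differs from the OP's and how it solves the problem / answers their question. I suggest this guide to writing a useful answer: stackoverflow.com/help/how-to-answer –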

0

The data array has to be shuffled using the index array, so shuffle the index array first and then use it to shuffle the data array.

import numpy as np 

def getData(): 
    # Array of (N, M) dimensions 
    a = np.arange(15).reshape(3, 5) 
    # [[ 0 1 2 3 4] 
    # [ 5 6 7 8 9] 
    # [10 11 12 13 14]] 
    idxs = np.arange(a.shape[0]) # [0 1 2] 
    return a, idxs 

a, idxs = getData() 

# Shuffle a by columns 
b = a.T 
# [[ 0 5 10] 
# [ 1 6 11] 
# [ 2 7 12] 
# [ 3 8 13] 
# [ 4 9 14]] 

np.random.shuffle(idxs) # [2 0 1] 
a = b[:, idxs] 

# [[10 0 5] 
# [11 1 6] 
# [12 2 7] 
# [13 3 8] 
# [14 4 9]] 

So if you want to shuffle any other array, say X, to match the shuffling of array a
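The idea of reusing one shuffled index array for several arrays could look like the sketch below. The array `X` and the name `perm` are hypothetical, introduced only for illustration; `X` is assumed to have the same number of columns as `a`.

```python
import numpy as np

a = np.arange(15).reshape(3, 5)
X = a * 10  # hypothetical second array with the same number of columns

perm = np.random.permutation(a.shape[1])  # one permutation, reused for both arrays
a_shuffled = a[:, perm]
X_shuffled = X[:, perm]

# Columns of the two arrays stay aligned after the shuffle.
assert np.array_equal(X_shuffled, a_shuffled * 10)
```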