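How to speed up transition matrix creation in Numpy?

Here is the most basic way I know to count the transitions in a Markov chain and use them to populate a transition matrix: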

def increment_counts_in_matrix_from_chain(markov_chain, transition_counts_matrix):
    # Walk the chain once and count each (old_state, new_state) pair.
    for i in range(1, len(markov_chain)):
        old_state = markov_chain[i - 1]
        new_state = markov_chain[i]
        transition_counts_matrix[old_state, new_state] += 1

I tried speeding it up in 3 different ways:

1) Using a sparse matrix one-liner based on this Matlab code:

transition_matrix = full(sparse(markov_chain(1:end-1), markov_chain(2:end), 1))

which in Numpy/SciPy looks like this:

from scipy.sparse import coo_matrix

def get_sparse_counts_matrix(markov_chain, number_of_states):
    return coo_matrix(([1]*(len(markov_chain) - 1), (markov_chain[0:-1], markov_chain[1:])), shape=(number_of_states, number_of_states))

2) I also tried a couple of more Pythonic tweaks, such as using zip():

for old_state, new_state in zip(markov_chain[0:-1], markov_chain[1:]): 
    transition_counts_matrix[old_state, new_state] += 1

3) and a queue:

from queue import Queue  # Python 3; in Python 2 the module is named Queue

old_and_new_states_holder = Queue(maxsize=2)
old_and_new_states_holder.put(markov_chain[0])
for new_state in markov_chain[1:]:
    old_and_new_states_holder.put(new_state)
    old_state = old_and_new_states_holder.get()
    transition_counts_matrix[old_state, new_state] += 1

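But none of these 3 methods sped things up. In fact, everything except the zip() solution was at least 10x slower than my original solution.

Are there any other solutions worth looking into?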

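Modified solution for building a transition matrix from lots of chains

The best answer to the question as asked was DSM's. However, for anyone who wants to populate a transition matrix from a list of millions of Markov chains, the quickest way I found is this: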

import itertools

import numpy

def fast_increment_transition_counts_from_chain(markov_chain, transition_counts_matrix):
    # Map every (old, new) pair to its flat index, then tally the pairs with bincount.
    flat_coords = numpy.ravel_multi_index((markov_chain[:-1], markov_chain[1:]), transition_counts_matrix.shape)
    transition_counts_matrix.flat += numpy.bincount(flat_coords, minlength=transition_counts_matrix.size)

def get_fake_transitions(markov_chains):
    # Joining the chains end to end creates one spurious transition per boundary.
    fake_transitions = []
    for i in range(1, len(markov_chains)):
        old_chain = markov_chains[i - 1]
        new_chain = markov_chains[i]
        end_of_old = old_chain[-1]
        beginning_of_new = new_chain[0]
        fake_transitions.append((end_of_old, beginning_of_new))
    return fake_transitions

def decrement_fake_transitions(fake_transitions, counts_matrix):
    for old_state, new_state in fake_transitions:
        counts_matrix[old_state, new_state] -= 1

def fast_get_transition_counts_matrix(markov_chains, number_of_states):
    """50% faster than original, but must store 2 additional slice copies of all markov chains in memory at once.
    You might need to break up the chains into manageable chunks that don't exceed your memory.
    """
    transition_counts_matrix = numpy.zeros([number_of_states, number_of_states])
    fake_transitions = get_fake_transitions(markov_chains)
    markov_chains = list(itertools.chain(*markov_chains))
    fast_increment_transition_counts_from_chain(markov_chains, transition_counts_matrix)
    decrement_fake_transitions(fake_transitions, transition_counts_matrix)
    return transition_counts_matrix
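
As a possible variation on the approach above, the sketch below builds the "from" and "to" state arrays per chain before concatenating them, so no pair ever spans two chains and there is nothing to decrement afterwards. The function name `count_transitions_many_chains` and the example data are assumptions made for illustration; they are not from the original post, and this version has not been timed against the one above.

```python
import numpy as np

def count_transitions_many_chains(markov_chains, number_of_states):
    """Hypothetical helper: count transitions over many chains at once."""
    # Chains with fewer than two states contain no transitions, so skip them.
    chains = [np.asarray(c, dtype=np.intp) for c in markov_chains if len(c) > 1]
    shape = (number_of_states, number_of_states)
    if not chains:
        return np.zeros(shape, dtype=np.intp)
    # Slice each chain separately, then concatenate, so that no
    # (old, new) pair crosses the boundary between two chains.
    old_states = np.concatenate([c[:-1] for c in chains])
    new_states = np.concatenate([c[1:] for c in chains])
    flat_coords = np.ravel_multi_index((old_states, new_states), shape)
    counts = np.bincount(flat_coords, minlength=number_of_states ** 2)
    return counts.reshape(shape)

# Illustrative data: three chains over the states 0, 1 and 2.
example_chains = [[0, 1, 1, 2], [2, 0], [1]]
example_counts = count_transitions_many_chains(example_chains, 3)
```

Like the version above, this holds extra copies of the chain data in memory, so very large inputs may still need to be processed in chunks.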

Source

2012-11-04 some-guy

How about something like this, taking advantage of np.bincount? Not super robust, but functional. [Thanks to @Warren Weckesser for the setup.]

import numpy as np
from collections import Counter

def increment_counts_in_matrix_from_chain(markov_chain, transition_counts_matrix):
    for i in range(1, len(markov_chain)):
        old_state = markov_chain[i - 1]
        new_state = markov_chain[i]
        transition_counts_matrix[old_state, new_state] += 1

def using_counter(chain, counts_matrix):
    counts = Counter(zip(chain[:-1], chain[1:]))
    from_, to = zip(*counts.keys())
    # list() is needed on Python 3, where values() returns a view.
    counts_matrix[from_, to] = list(counts.values())

def using_bincount(chain, counts_matrix):
    flat_coords = np.ravel_multi_index((chain[:-1], chain[1:]), counts_matrix.shape)
    counts_matrix.flat = np.bincount(flat_coords, minlength=counts_matrix.size)

def using_bincount_reshape(chain, counts_matrix):
    # Returns a new matrix instead of writing into counts_matrix.
    flat_coords = np.ravel_multi_index((chain[:-1], chain[1:]), counts_matrix.shape)
    return np.bincount(flat_coords, minlength=counts_matrix.size).reshape(counts_matrix.shape)

This gives:

In [373]: t = np.random.randint(0,50, 500) 
In [374]: m1 = np.zeros((50,50)) 
In [375]: m2 = m1.copy() 
In [376]: m3 = m1.copy() 

In [377]: timeit increment_counts_in_matrix_from_chain(t, m1) 
100 loops, best of 3: 2.79 ms per loop 

In [378]: timeit using_counter(t, m2) 
1000 loops, best of 3: 924 us per loop 

In [379]: timeit using_bincount(t, m3) 
10000 loops, best of 3: 57.1 us per loop

[Edit]

Avoiding flat (at the cost of not working in place) can save some time for small matrices:

In [80]: timeit using_bincount_reshape(t, m3) 
10000 loops, best of 3: 22.3 us per loop
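
To make the bincount trick concrete, here is a small worked sketch on a six-element chain. The variable names and the example chain are chosen for illustration only. It shows how each (old, new) pair maps to one flat index and compares the result with the plain loop from the question.

```python
import numpy as np

chain = np.array([0, 1, 1, 2, 0, 1])
n_states = 3
shape = (n_states, n_states)

# Each (old, new) pair maps to the flat index old * n_states + new.
flat_coords = np.ravel_multi_index((chain[:-1], chain[1:]), shape)

# bincount tallies how often each flat index occurs; minlength pads the
# result so that pairs that never occur still get a zero entry.
counts = np.bincount(flat_coords, minlength=n_states * n_states).reshape(shape)

# Reference result from the plain Python loop, for comparison.
expected = np.zeros(shape, dtype=int)
for old_state, new_state in zip(chain[:-1], chain[1:]):
    expected[old_state, new_state] += 1
```

Here the transition 0 -> 1 occurs twice, so both occurrences land on flat index 1 and `counts[0, 1]` ends up as 2.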

Source

2012-11-04 15:55:24 DSM

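I was going to accept this answer, but I want to follow up with an additional question. When I use bincount repeatedly to populate a transition counts matrix from thousands of markov chains, my original code is still faster. I think this is because counts_matrix.flat += numpy.bincount(flat_coords, minlength=counts_matrix.size) is slower at updating counts_matrix than my original code. Any thoughts on this? –

Update: the fastest solution I have found for populating a transition matrix from tons of markov chains is to merge the chains together end to end, use bincount, then collect the fake transitions (from the end of one chain to the beginning of the next), and then decrement the count for each fake transition. This solution is about 25% faster than my original version. –

@some-guy: feel free to find the best solution for your use case, post it as an answer, and accept it. – DSM

Here's a faster method. The idea is to count the number of occurrences of each transition and use the counts in a vectorized update of the matrix. (I am assuming that the same transition can occur multiple times in markov_chain.) The Counter class from the collections library is used to count the occurrences of each transition.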

from collections import Counter

def update_matrix(chain, counts_matrix):
    counts = Counter(zip(chain[:-1], chain[1:]))
    from_, to = zip(*counts.keys())
    # list() is needed on Python 3, where values() returns a view.
    counts_matrix[from_, to] += list(counts.values())

Timing example, in IPython:

In [64]: t = np.random.randint(0,50, 500) 

In [65]: m1 = zeros((50,50)) 

In [66]: m2 = zeros((50,50)) 

In [67]: %timeit increment_counts_in_matrix_from_chain(t, m1) 
1000 loops, best of 3: 895 us per loop 

In [68]: %timeit update_matrix(t, m2) 
1000 loops, best of 3: 504 us per loop

It's faster, but not orders of magnitude faster. For a real speedup, you might consider implementing this in Cython.
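
The assumption that the same transition can occur more than once is the reason the counts are gathered first. The sketch below, with made-up data, shows that a plain fancy-indexed `+= 1` increments each distinct cell only once, while `np.add.at` (an alternative not used in this answer) accumulates every occurrence. It has not been timed here, so treat it as an illustration of correctness rather than a speed recommendation.

```python
import numpy as np

chain = np.array([0, 1, 0, 1, 0, 1])
old_states, new_states = chain[:-1], chain[1:]

# Buffered fancy-index update: each distinct (old, new) cell is incremented
# only once, no matter how often the pair repeats.
buffered = np.zeros((2, 2), dtype=int)
buffered[old_states, new_states] += 1

# Unbuffered update: np.add.at accumulates every occurrence of a pair.
accumulated = np.zeros((2, 2), dtype=int)
np.add.at(accumulated, (old_states, new_states), 1)
```

In this chain the pair (0, 1) occurs three times and (1, 0) twice; only `accumulated` reflects that.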

Source

2012-11-04 14:51:06

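OK, a few ideas to tinker with, giving some slight improvement (at the cost of human understanding).

Let's start with a random vector of length 3000 holding integers between 0 and 9: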

L = 3000
N = 10
# size belongs to randint, not to array
states = np.random.randint(N, size=L)
transitions = np.zeros((N,N))

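Your method takes 11.4 ms on my machine.

The first small improvement is to avoid reading the data twice by keeping the previous state in a temporary variable: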

old = states[0]
for i in range(1,len(states)):
    new = states[i]
    # Note: indexed [new, old], the transpose of the question's [old, new] layout.
    transitions[new,old]+=1
    old=new

This gives a modest improvement and brings the time down to 10.9 ms (about 4% faster than 11.4 ms).

A more involved approach uses strides:

def rolling(a, window): 
    shape = (a.size - window + 1, window) 
    strides = (a.itemsize, a.itemsize) 
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides) 

state_2 = rolling(states, 2) 
for i in range(len(state_2)): 
    l,m = state_2[i,0],state_2[i,1] 
    transitions[m,l]+=1

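Strides let you read consecutive elements of the array by tricking it into thinking that its rows start at different offsets (OK, that is not a great description, but if you take some time to read about strides you will get it). This approach loses performance, going up to 12.2 ms, but it opens the door to tricking the system further. By flattening both the transition matrix and the strided array into one-dimensional arrays, you can speed things up a little more: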

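transitions = np.zeros(N*N)
state_2 = rolling(states, 2)
# Flat index old + N*new addresses cell [new, old] of the N x N matrix.
state_flat = np.sum(state_2 * np.array([1,N]),axis=1)
for i in state_flat:
    transitions[i]+=1
# reshape returns a new view, so keep the result.
transitions = transitions.reshape((N,N))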

This brings it down to 7.75 ms. It's not an order of magnitude, but it is still about 30% better :)
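
For readers who want to see what the strided view and the flattened index contain, here is a small sketch on a five-element array. The data and the value of `N` are illustrative. As a deliberate swap, the final Python loop is replaced with `np.bincount`, borrowing the idea from DSM's answer rather than from this one.

```python
import numpy as np

def rolling(a, window):
    # Same helper as above: a view whose rows are overlapping windows of `a`
    # (assumes `a` is a contiguous 1-D array).
    shape = (a.size - window + 1, window)
    strides = (a.itemsize, a.itemsize)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

N = 4
states = np.array([0, 3, 1, 1, 2])

# Each row of the view is one (old, new) pair of consecutive states.
pairs = rolling(states, 2)

# Weighting the columns by [1, N] gives old + N * new, the flat index of
# cell [new, old] in an N x N matrix (the layout used in this answer).
state_flat = pairs.dot(np.array([1, N]))

# Tally the flat indices in one call instead of looping in Python.
transitions = np.bincount(state_flat, minlength=N * N).reshape((N, N))
```

Because the view shares memory with `states`, it should be treated as read-only; writing through it would modify overlapping elements.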

Source

2012-11-04 14:59:11 EnricoGiampieri

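Just for kicks, and because I've been wanting to try it out, I applied Numba to your problem. In code, that only involves adding a decorator (although I made a direct call here so that I could test the JIT variants that numba provides):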

import numpy as np
import numba

def increment_counts_in_matrix_from_chain(markov_chain, transition_counts_matrix):
    for i in range(1, len(markov_chain)):
        old_state = markov_chain[i - 1]
        new_state = markov_chain[i]
        transition_counts_matrix[old_state, new_state] += 1

# numba.autojit and the argtypes keyword are the Numba API as used in this
# 2012 post; newer Numba releases changed it.
autojit_func = numba.autojit()(increment_counts_in_matrix_from_chain)
jit_func = numba.jit(argtypes=[numba.int64[:,::1],numba.double[:,::1]])(increment_counts_in_matrix_from_chain)

t = np.random.randint(0,50, 500)
m1 = np.zeros((50,50))
m2 = np.zeros((50,50))
m3 = np.zeros((50,50))

And then the timings:

In [10]: %timeit increment_counts_in_matrix_from_chain(t,m1) 
100 loops, best of 3: 2.38 ms per loop 

In [11]: %timeit autojit_func(t,m2)
10000 loops, best of 3: 67.5 us per loop

In [12]: %timeit jit_func(t,m3) 
100000 loops, best of 3: 4.93 us per loop

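The autojit method makes some guesses based on the runtime inputs, while the jit function has its types dictated. You have to be a little careful, because at this early stage numba does not report an error from jit if you pass in the wrong type of input. It will just spit out an incorrect answer.

That said, getting a 35x and a 485x speedup without any code changes, just by adding a call to numba (which can also be applied as a decorator), is pretty impressive in my book. You could probably get similar results with cython, but it would require a bit more boilerplate and writing a setup.py file.

I also like this solution because the code stays readable and you can write it the way you originally thought about implementing the algorithm.

Source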

2012-11-04 18:41:51 JoshAdel

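Neat! What is the startup cost? – DSM

@DSM Not sure if this is the best way to time it, but '%timeit autojit_func = numba.autojit()(increment_counts_in_matrix_from_chain); autojit_func(t, m2)' gives 81 us. When I do something similar for the plain 'jit', I get a bunch of garbage collection warnings, which I think are erroneous. – JoshAdel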
