以下是最基本的方法,我知道算在馬爾科夫鏈的過渡,並用它來填充轉換矩陣:如何加快Numpy中的轉換矩陣創建?
def increment_counts_in_matrix_from_chain(markov_chain, transition_counts_matrix):
for i in xrange(1, len(markov_chain)):
old_state = markov_chain[i - 1]
new_state = markov_chain[i]
transition_counts_matrix[old_state, new_state] += 1
我試着加快它在3種不同的方式:
1)使用基於該Matlab代碼的稀疏矩陣的一行:
transition_matrix = full(sparse(markov_chain(1:end-1), markov_chain(2:end), 1))
其中在numpy的/ SciPy的,看起來像這樣:
def get_sparse_counts_matrix(markov_chain, number_of_states):
return coo_matrix(([1]*(len(markov_chain) - 1), (markov_chain[0:-1], markov_chain[1:])), shape=(number_of_states, number_of_states))
而且我試過一對夫婦更Python的調整,像使用ZIP():
for old_state, new_state in zip(markov_chain[0:-1], markov_chain[1:]):
transition_counts_matrix[old_state, new_state] += 1
和隊列:
old_and_new_states_holder = Queue(maxsize=2)
old_and_new_states_holder.put(markov_chain[0])
for new_state in markov_chain[1:]:
old_and_new_states_holder.put(new_state)
old_state = old_and_new_states_holder.get()
transition_counts_matrix[old_state, new_state] += 1
但這些都不3種方法加快東西。實際上,除zip()解決方案外,其他所有解決方案的速度至少比原始解決方案慢10倍。
有沒有其他解決方案值得研究?
用於從大量的鏈
最佳答案對上述問題的構建轉移矩陣改性溶液特別是DSM的。然而,誰想要填充基於數百萬馬爾可夫鏈的列表上的轉換矩陣,最快的方法是這樣的:
def fast_increment_transition_counts_from_chain(markov_chain, transition_counts_matrix):
flat_coords = numpy.ravel_multi_index((markov_chain[:-1], markov_chain[1:]), transition_counts_matrix.shape)
transition_counts_matrix.flat += numpy.bincount(flat_coords, minlength=transition_counts_matrix.size)
def get_fake_transitions(markov_chains):
fake_transitions = []
for i in xrange(1,len(markov_chains)):
old_chain = markov_chains[i - 1]
new_chain = markov_chains[i]
end_of_old = old_chain[-1]
beginning_of_new = new_chain[0]
fake_transitions.append((end_of_old, beginning_of_new))
return fake_transitions
def decrement_fake_transitions(fake_transitions, counts_matrix):
for old_state, new_state in fake_transitions:
counts_matrix[old_state, new_state] -= 1
def fast_get_transition_counts_matrix(markov_chains, number_of_states):
"""50% faster than original, but must store 2 additional slice copies of all markov chains in memory at once.
You might need to break up the chains into manageable chunks that don't exceed your memory.
"""
transition_counts_matrix = numpy.zeros([number_of_states, number_of_states])
fake_transitions = get_fake_transitions(markov_chains)
markov_chains = list(itertools.chain(*markov_chains))
fast_increment_transition_counts_from_chain(markov_chains, transition_counts_matrix)
decrement_fake_transitions(fake_transitions, transition_counts_matrix)
return transition_counts_matrix
我打算接受這個答案,但我想跟進一個額外的問題。當我重複使用bincount來填充基於數千個markov鏈的轉換計數矩陣時,我的原始代碼仍然更快。我認爲這是因爲counts_matrix.flat + = numpy.bincount(flat_coords,minlength = counts_matrix.size)在更新counts_matrix比我原來的代碼更慢。關於這個的想法? –
更新內容:我發現用於填充基於噸馬爾可夫鏈的轉換矩陣的最快解決方案是將這些鏈依次合併到一起,使用二進制數,然後獲取假轉換(從一個鏈的末尾到開始),然後減少每個假轉換的計數。該解決方案比我的原始版本快大約25%。 –
@ some-guy:隨時爲您的用例找到最佳解決方案,並將其作爲答案並接受。 – DSM