如何迭代切片列表？

-1

我找不到性能增強問題的解決方案。如何迭代切片列表？

我有一維數組，我想計算在滑動指數窗總和，這裏有一個例子代碼：

import numpy as np 
input = np.linspace(1, 100, 100) 
list_of_indices = [[0, 10], [5, 15], [45, 50]] #just an example 
output = np.array([input[idx[0]: idx[1]].sum() for idx in list_of_indices])

的output陣列的計算相比，numpy的向量化內置極爲緩慢在功能上。在現實生活中我list_of_indices包含數萬[lower bound, upper bound]雙，該環形絕對是一款高性能的python腳本的瓶頸。

如何解決這個問題，使用numpy的內部功能：像面具，聰明np.einsum，或者跟其他的東西嗎？由於我在HPC領域工作，我也擔心內存消耗。

沒有人有在尊重的性能要求這一問題的答案嗎？

來源

2014-12-06 Louis Gallin

您好，歡迎來到SO！請避免在問題中添加與問題無關的信息。另外請考慮閱讀解釋如何提出好問題的[幫助]。特別是，這個問題似乎更適合於CodeReview站點，而不是SO。 – BartoszKP 2014-12-06 12:59:06

這是很容易誤用numpy和膨脹你的腳本的內存足跡與巨大的數組。至少現在寫的方式，我發現與SO有關的問題。 – 2014-12-06 18:17:03

即使在技術上關於代碼加速，大多數numpy'vectoriztion'問題都是在SO中回答的。 – hpaulj 2014-12-07 08:29:38

這裏有一個簡單的方法來嘗試：保持同樣的解決方案結構，你已經有了，這大概工作。只需使存儲創建和索引更高效。如果要爲大多數指標總結從input許多元素，總和應該採取更多的時間比for循環。例如：

# Put all the indices in a nice efficient structure: 
idxx = np.hstack((np.array(list_of_indices, dtype=np.uint16), 
    np.arange(len(list_of_indices), dtype=np.uint16)[:,None])) 
# Allocate appropriate data type to the precision and range you need, 
# Do it in one go to be time-efficient 
output = np.zeros(len(list_of_indices), dtype=np.float32) 
for idx0, idx1, idxo in idxx: 
    output[idxo] = input[idx0:idx1].sum()

如果len(list_if_indices) > 2**16，使用uint32而非uint16。

來源

2014-12-06 18:05:43

如果：

input是大致相同的長度output或更短
的output值也有類似幅度

...你可以創建你的輸入值的cumsum。然後總和變成減法。

cs = np.cumsum(input, dtype=float32) # or float64 if you need it 
loi = np.array(list_of_indices, dtype=np.uint16) 
output = cs[loi[:,1]] - cs[loi[:,0]]

這裏的數值是危險的精度損失，如果input擁有大型和微小值運行。那麼cumsum可能對您而言不夠準確。

來源

2014-12-06 18:26:12

如何迭代切片列表？

回答

相關問題