I have a list of lists like the one below, and I am looking for the fastest way to process it:
import numpy as np
import random
import time
import itertools
N = 1000
x = np.random.random((N, N))
y = np.zeros((N, N))
z = np.random.random((N, N))
list_of_lists = [[x, y], [y, z], [z, x]]
For each sublist I want to compute the number of non-zero values, the mean, and the standard deviation.
I have done it like this:
distribution = []
alb_mean = []
alb_std = []
start = time.time()
for i in range(len(list_of_lists)):
    one_mean = []
    non_zero_l = []
    one_list = list_of_lists[i]
    for n in one_list:
        # count non-zeros
        non_zero_count = np.count_nonzero(n)
        non_zero_l.append(non_zero_count)
        # replace zeros with NaN so nanmean/nanstd skip them
        n = n.astype(float)
        n[n == 0.0] = np.nan
        # flatten the matrix
        n = np.array(n.flatten())
        one_mean.append(n)
    # append the count, mean and std for this sublist
    distribution.append(sum(non_zero_l))
    alb_mean.append(np.nanmean(one_mean))
    alb_std.append(np.nanstd(one_mean))
end = time.time()
print("Loop took {} seconds".format(end - start))
This takes 0.23 seconds.
I tried to make this faster with a second option:
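For comparison, here is a minimal sketch of the same per-sublist statistics that selects the non-zero values with a boolean mask instead of writing NaN into a copy. The helper name `pair_stats`, the fixed seed, and the smaller `N` are assumptions made for illustration; they are not part of the code above, and no timing is claimed for this variant.

```python
import numpy as np

def pair_stats(arrays):
    """Return (non-zero count, mean, std) over the non-zero entries of several arrays.

    Hypothetical helper for illustration only.
    """
    # Join the flattened arrays into a single 1D float array
    flat = np.concatenate([np.asarray(a, dtype=float).ravel() for a in arrays])
    nonzero = flat[flat != 0]  # boolean mask keeps only the non-zero values
    if nonzero.size == 0:
        # Nothing to average: report NaN instead of dividing by zero
        return 0, np.nan, np.nan
    return nonzero.size, nonzero.mean(), nonzero.std()

rng = np.random.RandomState(0)  # fixed seed, an assumption for reproducibility
N = 200  # smaller than the N = 1000 used above, to keep the demo quick
x = rng.random_sample((N, N))
y = np.zeros((N, N))
z = rng.random_sample((N, N))
list_of_lists = [[x, y], [y, z], [z, x]]

stats = [pair_stats(pair) for pair in list_of_lists]
distribution = [s[0] for s in stats]
alb_mean = [s[1] for s in stats]
alb_std = [s[2] for s in stats]
```

Like `np.nanstd`, `ndarray.std` defaults to `ddof=0`, so the two approaches should agree on the standard deviation.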
distribution = []
alb_mean = []
alb_std = []
start = time.time()
for i in range(len(list_of_lists)):
    for_mean = []
    # get one list
    one_list = list_of_lists[i]
    # flatten the list (chaining 2D arrays yields their rows, not scalars)
    chain = itertools.chain(*one_list)
    flat = list(chain)
    # count non-zeros
    non_zero_count = np.count_nonzero(flat)
    distribution.append(non_zero_count)
    # remove zeros (note: np.setdiff1d returns sorted unique values,
    # so repeated values are dropped as well)
    remove_zero = np.setdiff1d(flat, [0.0])
    alb_mean.append(np.nanmean(remove_zero))
    alb_std.append(np.nanstd(remove_zero))
end = time.time()
print("Loop took {} seconds".format(end - start))
This is actually slower: it takes 0.88 seconds.
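A small sketch of what the second attempt actually operates on may help here. It uses two tiny hand-written arrays (`a` and `b`, assumptions for illustration) to show that `itertools.chain` over 2D arrays yields rows, and that `np.setdiff1d` returns sorted unique values, so it can change the mean and standard deviation whenever values repeat.

```python
import itertools
import numpy as np

# Tiny illustrative inputs (not the data from the question)
a = np.array([[1.0, 0.0], [2.0, 2.0]])
b = np.array([[0.0, 2.0], [5.0, 4.0]])

# Iterating over a 2D array yields its rows, so the chain is a list
# of 1D arrays that NumPy must convert back into an array later
rows = list(itertools.chain(a, b))
print(len(rows), rows[0].shape)

# setdiff1d keeps the sorted unique values of its first argument that
# are absent from the second, so the repeated 2.0 collapses to one entry
kept_setdiff = np.setdiff1d(rows, [0.0])

# A boolean mask drops the zeros and keeps every repeated value
flat = np.concatenate([a.ravel(), b.ravel()])
kept_mask = flat[flat != 0]

print(kept_setdiff, kept_setdiff.mean())
print(kept_mask, kept_mask.mean())
```

With continuous random data, duplicates are rare, so the difference is easy to miss; with integer data the two results would diverge noticeably.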
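The sheer number of loops makes me think there is a better way to do this. I have tried numba, but it did not like appending inside the function.
One loop with 3 iterations, the other with 2 iterations –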
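Why are you using lists of lists with *numpy functions*? Why not use 'numpy' arrays? –
Forgive me, I am new to the numpy world, but I am doing it this way because the data in the list represent numpy 2D matrices –
Make the input arrays 'ints' with zeros. Currently, with 'np.random.random((N,N))', the data is very unlikely to contain any zeros, so computations like 'np.count_nonzero(n)' are redundant. – Divakar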
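The two suggestions in the comments can be sketched together: integer test data that really contains zeros, stored as one array instead of a list of lists. This is only a sketch under stated assumptions: the value range 0..4, the fixed seed, and the smaller `N` are chosen for illustration, all matrices must share one shape, and every pair is assumed to hold at least one non-zero value (otherwise the divisions below produce NaN).

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed, an assumption for reproducibility
N = 100  # smaller than the question's N = 1000, to keep the demo quick

# Integer test data in 0..4, so roughly one value in five is zero
x = rng.randint(0, 5, size=(N, N))
y = np.zeros((N, N), dtype=int)
z = rng.randint(0, 5, size=(N, N))

# Equal shapes let the nested list become one array of shape (3, 2, N, N)
data = np.array([[x, y], [y, z], [z, x]], dtype=float)
axes = (1, 2, 3)  # reduce over everything except the pair index

mask = data != 0
counts = mask.sum(axis=axes)           # non-zero count per pair
means = data.sum(axis=axes) / counts   # zeros add nothing to the sum
# Deviations from each pair's mean, with the zero entries masked out
dev = np.where(mask, data - means[:, None, None, None], 0.0)
stds = np.sqrt((dev ** 2).sum(axis=axes) / counts)

print(counts, means, stds)
```

Here `counts`, `means`, and `stds` play the roles of `distribution`, `alb_mean`, and `alb_std` from the question, with one entry per sublist and no Python-level loop.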