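Python multiprocessing: check whether memory is shared or copied
I have to query a large number of vectors against an sklearn KDTree that is stored in a searcher class. I tried to query them in parallel with Python multiprocessing, but the parallel code takes about the same time as the single-process version, or even longer.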
import time
import numpy as np
from sklearn.neighbors import KDTree
from multiprocessing import Pool

def glob_query(arg, **kwarg):
    # Module-level wrapper so Pool.map can pickle the callable;
    # arg is a (searcher, x) tuple
    return Searcher.query(*arg, **kwarg)

class Searcher:
    def __init__(self, N, D):
        self.kdt = KDTree(np.random.rand(N, D), leaf_size=30, metric="euclidean")

    def query(self, X):
        # KDTree.query expects a 2-D array, so promote a single row to shape (1, D)
        return self.kdt.query(np.atleast_2d(X), k=5, return_distance=False)

    def query_sin(self, X):
        return [self.query(x) for x in X]

    def query_par(self, X):
        # The context manager shuts the pool down once map() has returned
        with Pool(4) as p:
            return p.map(glob_query, zip([self] * len(X), X))

if __name__ == "__main__":
    N = 1000000  # Number of points to be indexed
    D = 50       # Dimensions
    searcher = Searcher(N, D)
    E = 100      # Number of points to be searched
    points = np.random.rand(E, D)

    # Works fine
    start = time.time()
    searcher.query_sin(points)
    print("Time taken - %f" % (time.time() - start))

    # Slower than single core
    start = time.time()
    print(searcher.query_par(points))
    print("Time taken - %f" % (time.time() - start))
Output (single-process run first, then parallel run):
Time taken - 28.591089
Time taken - 36.920716
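If the kd-tree is being copied, the pickling cost alone could explain these timings: 'zip([self]*len(X), X)' makes every task carry the whole Searcher, so each chunk of work that Pool.map sends to a worker includes a pickled copy of the tree (its data array alone is N×D float64 values, about 400 MB for N = 1,000,000 and D = 50). A minimal sketch to check this, assuming a much smaller tree (N = 10,000) so it runs quickly: it compares the pickled size of one (searcher, x) task with that of the query row alone. The helper 'task_payload_bytes' is hypothetical.

```python
import pickle
import numpy as np
from sklearn.neighbors import KDTree

class Searcher:
    def __init__(self, N, D):
        self.kdt = KDTree(np.random.rand(N, D), leaf_size=30, metric="euclidean")

def task_payload_bytes(searcher, x):
    # Hypothetical helper: Pool.map pickles each (self, x) argument before
    # sending it to a worker, so measure the size of that pickle
    return len(pickle.dumps((searcher, x), protocol=pickle.HIGHEST_PROTOCOL))

N, D = 10000, 50  # deliberately small sizes for a quick demo
searcher = Searcher(N, D)
x = np.random.rand(D)

per_task = task_payload_bytes(searcher, x)
row_only = len(pickle.dumps(x, protocol=pickle.HIGHEST_PROTOCOL))
print("bytes per task: %d, bytes for the query row alone: %d" % (per_task, row_only))
```

The task payload is dominated by the tree's arrays, while the query row itself is only a few hundred bytes, and the worker then also has to unpickle that copy before it can run a single query.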
I'd like to know:
- whether my kd-tree is being copied to every worker process
- whether there is another way to parallelise the search (using pathos?)
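One common alternative, sketched here under the assumption that the tree can be shipped to each worker once: the pool's 'initializer' stores the tree in a per-process global, so tasks only carry query rows, and the rows are sent in batches because KDTree.query accepts a 2-D array of points. The names '_init_worker', '_query_chunk' and 'parallel_query' are hypothetical, and the sizes are small so it runs quickly. Whether this actually beats a single vectorised 'kdt.query(points, k=5)' call in one process depends on the machine and the data size.

```python
import numpy as np
from multiprocessing import Pool
from sklearn.neighbors import KDTree

_tree = None  # per-worker global, set once by the initializer

def _init_worker(tree):
    # Runs once in each worker process, so the tree is transferred
    # once per worker instead of once per task
    global _tree
    _tree = tree

def _query_chunk(chunk):
    # KDTree.query accepts a 2-D array, so a whole chunk is queried in one call
    return _tree.query(chunk, k=5, return_distance=False)

def parallel_query(tree, points, processes=4):
    # Never create more chunks than there are rows (empty chunks would fail)
    n_chunks = max(1, min(processes, len(points)))
    chunks = np.array_split(points, n_chunks)
    with Pool(processes, initializer=_init_worker, initargs=(tree,)) as pool:
        results = pool.map(_query_chunk, chunks)
    return np.vstack(results)

if __name__ == "__main__":
    rng = np.random.RandomState(0)
    tree = KDTree(rng.rand(20000, 10), leaf_size=30, metric="euclidean")
    points = rng.rand(100, 10)

    par = parallel_query(tree, points)
    seq = tree.query(points, k=5, return_distance=False)  # single vectorised call
    print(par.shape)
```

On platforms that start workers with fork, the initializer arguments are inherited rather than pickled; with spawn or forkserver the tree is pickled once per worker, which is still far less than once per chunk of tasks.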
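If I create the pool in '__init__', I get an error saying 'pool objects cannot be passed between processes or pickled'. – kampta
@kampta: if you really do end up needing to pass the 'pool', you can do so with 'pathos'... essentially, it lets you make nested 'map' calls (or calls to the 'map' variants). –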
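The pickling error in the comments comes from multiprocessing.Pool itself, which refuses to be pickled; since every task pickles 'self', a pool stored on 'self' gets pickled too. One plain-multiprocessing workaround, offered here as an assumption rather than the fix proposed in this thread, is to leave the pool out of the pickled state with '__getstate__'. The class 'PooledSearcher', its 'payload' attribute (a stand-in for the kd-tree) and '_pool' are hypothetical.

```python
import pickle
from multiprocessing import Pool

class PooledSearcher:
    def __init__(self, processes=2):
        self.payload = list(range(10))  # hypothetical stand-in for the kd-tree
        self._pool = Pool(processes)

    def __getstate__(self):
        # Copy the instance dict and drop the pool, which cannot be pickled;
        # without __setstate__, pickle restores this dict onto the new object
        state = self.__dict__.copy()
        state["_pool"] = None
        return state

    def close(self):
        self._pool.close()
        self._pool.join()

if __name__ == "__main__":
    s = PooledSearcher()

    # Pickling the pool directly reproduces the error from the comment
    try:
        pickle.dumps(s._pool)
        pool_pickled = True
    except NotImplementedError:
        pool_pickled = False

    # Pickling the searcher now works; the copy simply has no pool
    restored = pickle.loads(pickle.dumps(s))
    print(restored.payload[:3], restored._pool)
    s.close()
```

This only removes the error: each task would still carry a pickled copy of the tree, so it does not address the slowdown on its own.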