2017-04-27 131 views
1

我想要做的是檢查哪個多處理最適合我的數據。我試着多進程這個循環:Python多進程/線程循環。

def __pure_calc(args): 

    j = args[0] 
    point_array = args[1] 
    empty = args[2] 
    tree = args[3] 

    for i in j: 
      p = tree.query(i) 

      euc_dist = math.sqrt(np.sum((point_array[p[1]]-i)**2)) 

      ##add one row at a time to empty list 
      empty.append([i[0], i[1], i[2], euc_dist, point_array[p[1]][0], point_array[p[1]][1], point_array[p[1]][2]]) 

    return empty 

只是純粹的功能正在6.52秒。

我的第一種方法是multiprocessing.map:

from multiprocessing import Pool 

def __multiprocess(las_point_array, point_array, empty, tree): 

    pool = Pool(os.cpu_count()) 

    for j in las_point_array: 
     args=[j, point_array, empty, tree] 
     results = pool.map(__pure_calc, args) 

    #close the pool and wait for the work to finish 
    pool.close() 
    pool.join() 

    return results 

當我檢查了其他的答案如何多進程功能應該很容易爲:地圖(通話功能,輸入) - 完成。但由於某種原因,我的多處理器不在我的輸入之外,因爲scipy.spatial.ckdtree.cKDTree對象不是可代碼化的上升錯誤。

所以我試圖用apply_async:

from multiprocessing.pool import ThreadPool 

def __multiprocess(arSegment, wires_point_array, ptList, tree): 

    pool = ThreadPool(os.cpu_count()) 

    args=[arSegment, point_array, empty, tree] 

    result = pool.apply_async(__pure_calc, [args]) 

    results = result.get() 

它與時弊運行。對於我的測試數據,我設法在6.42秒內計算它。

爲什麼apply_async接受ckdtree沒有任何問題而pool.map不是?我需要改變才能使其運行?

回答

1

pool.map(function, iterable),它基本上與itertool的map具有相同的佔位面積。來自迭代器的每個項目將爲您的__pure_calc函數的args

在這種情況下,我猜你可能會改變這個:

def __multiprocess(las_point_array, point_array, empty, tree): 

    pool = Pool(os.cpu_count()) 

    args_list = [ 
     [j, point_array, empty, tree] 
     for j in las_point_array 
    ] 

    results = pool.map(__pure_calc, args_list) 

    #close the pool and wait for the work to finish 
    pool.close() 
    pool.join() 

    return results