Map and reduce with multiprocessing

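import multiprocessing

data = range(10)

def map_func(i):
    # Wrap each item in a one-element list
    return [i]

def reduce_func(a, b):
    # Concatenate two lists
    return a + b

if __name__ == "__main__":
    # The __main__ guard is needed where worker processes are spawned (e.g. Windows)
    p = multiprocessing.Pool(processes=4)
    mapped = p.map(map_func, data)  # [[0], [1], ..., [9]]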

How can I use reduce_func() as the reduce function for the parallelised map_func()?

Here is what I want to do, written as a PySpark example:

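from pyspark import SparkContext

sc = SparkContext("local[4]", "map-reduce")  # run locally with 4 worker threads
rdd = sc.parallelize(data)
result = rdd.map(map_func)
final_result = result.reduce(reduce_func)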
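PySpark's RDD.reduce reduces each partition on the workers and then merges the partial results on the driver. A minimal sketch of the same idea with multiprocessing, assuming reduce_func is associative (list concatenation is); split() and map_reduce_chunk() are hypothetical helpers, not part of the multiprocessing API:

```python
import multiprocessing
from functools import reduce

data = range(10)


def map_func(i):
    return [i]


def reduce_func(a, b):
    return a + b


def split(seq, n):
    # Hypothetical helper: cut seq into at most n contiguous chunks
    seq = list(seq)
    size = max(1, -(-len(seq) // n))  # ceiling division, at least 1
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def map_reduce_chunk(chunk):
    # Runs in a worker: map every item in the chunk, then reduce the chunk locally
    return reduce(reduce_func, map(map_func, chunk))


if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as p:
        # One partial result per chunk; Pool.map keeps the chunks in input order
        partials = p.map(map_reduce_chunk, split(data, 4))
    # Merge the partial results in the parent process, like Spark's driver-side step
    final_result = reduce(reduce_func, partials)
    print(final_result)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Reducing inside the workers means only one partial result per chunk travels back to the parent, instead of one result per item. This sketch assumes data is non-empty, since reduce() without an initial value fails on an empty sequence.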


2016-07-13 Ghilas BELHADJ

'functools.reduce(reduce_func, p.map(map_func, data))' produces the list of numbers 0 to 9, in a random order that depends on the order in which multiprocessing maps the data. – chapelo
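The one-liner can be checked with a small runnable sketch, assuming the question's map_func and reduce_func; map_reduce() is a hypothetical helper that accepts any map-like callable, so the same code runs with Pool.map or the built-in map. Because Pool.map returns results in input order, the printed list should be the same on every run:

```python
import multiprocessing
from functools import reduce

data = range(10)


def map_func(i):
    return [i]


def reduce_func(a, b):
    return a + b


def map_reduce(mapper, items):
    # mapper is any map-like callable, e.g. Pool.map or the built-in map
    return reduce(reduce_func, mapper(map_func, items))


if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as p:
        # Pool.map blocks until all results are ready and keeps input order
        print(map_reduce(p.map, data))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```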

Nice, thanks.

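According to the documentation, multiprocessing.Pool.map() blocks until the result is ready and returns the results in the same order as the input, so no randomness is possible. To process the items in an arbitrary order, use the imap_unordered() method, which yields each result as soon as a worker finishes it: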

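from functools import reduce

# Yields each result as soon as a worker finishes it, in completion order
result = p.imap_unordered(map_func, data)
# Concatenate the one-element lists in the order they arrive
final_result = reduce(reduce_func, result)

# Three different runs:
# [0, 1, 4, 5, 2, 6, 8, 9, 7, 3]
# [0, 1, 4, 5, 2, 3, 8, 7, 6, 9]
# [0, 1, 2, 5, 6, 7, 8, 4, 3, 9]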
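Because imap_unordered() hands results to reduce() in completion order, the final value is only deterministic when the reduce function is commutative as well as associative; list concatenation is not, which is why the runs above differ. A minimal sketch with a commutative reducer, where square() and add() are hypothetical stand-ins for the question's functions:

```python
import multiprocessing
from functools import reduce


def square(i):
    # Hypothetical map step for illustration
    return i * i


def add(a, b):
    # Commutative and associative, so completion order cannot change the result
    return a + b


def sum_of_squares(pool, items, chunksize=1):
    # imap_unordered yields results as workers finish them; reduce() folds them lazily
    return reduce(add, pool.imap_unordered(square, items, chunksize))


if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as p:
        # Consume the iterator before the with-block exits (exiting calls terminate())
        print(sum_of_squares(p, range(10)))  # 285 on every run
```

Raising chunksize sends items to the workers in batches, which can cut inter-process overhead when the map step is cheap.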


2016-07-13 22:31:23 chapelo
