Python multiprocessing slower than single-threaded

I have been playing around with multiprocessing and noticed that my algorithm is slower when parallelized than when run single-threaded.
In my code I do not share memory, and I am fairly sure my algorithm (see the code below), which is just nested loops, is CPU-bound.
However, no matter what I do, the parallel code runs 10-20% slower on every computer I own.
I also ran it on a 20-CPU virtual machine, and single-threaded beat multithreaded every time (it was actually even slower there than on my own machine).
from multiprocessing.dummy import Pool as ThreadPool
from multi import chunks
from random import random
import logging
import time

# Produce two sets of items we can iterate over
S = []
for x in range(100000):
    S.append({'value': x * random()})

H = []
for x in range(255):
    H.append({'value': x * random()})

# The function for each thread:
# just nested iteration
def doStuff(HH):
    R = []
    for k in HH['S']:
        for h in HH['H']:
            R.append(k['value'] * h['value'])
    return R

# We split the work between the worker threads,
# giving each one 5 items of H to iterate
# against the big list S
HChunks = chunks(H, 5)
XChunks = []

# Turn the chunks into dictionaries so I can pass in
# both the S and H lists.
# Note: I do this because I'm not sure whether using the
# global S would spend too much time on cache synchronization;
# the idea is that no thread shares anything.
for x in HChunks:
    XChunks.append({'H': x, 'S': S})

print("Process")
t0 = time.time()

pool = ThreadPool(4)
R = pool.map(doStuff, XChunks)
pool.close()
pool.join()

t1 = time.time()

# The measured time for 4 threads is slower than when
# this code just calls doStuff(..) in a non-parallel way.
# Why!?
total = t1 - t0
print("Took", total, "secs")
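The `chunks` helper comes from my own `multi` module and is not shown above. A minimal stand-in, assuming it simply splits a list into consecutive pieces of `n` items each (my reading of the comments above, not the actual implementation), would be:

```python
def chunks(lst, n):
    # Hypothetical stand-in for multi.chunks: split lst into
    # consecutive sublists of n items each (the last piece may
    # be shorter). This is an assumption based on how it is
    # used above, not the original helper.
    return [lst[i:i + n] for i in range(0, len(lst), n)]
```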
There are many related questions already open, but most of them concern incorrectly structured code, e.g. each worker being IO-bound, and so on.
Possible duplicate of [multiprocessing.dummy in Python](http://stackoverflow.com/questions/26432411/multiprocessing-dummy-in-python) – MisterMiyagi
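Note that `multiprocessing.dummy.Pool` is a pool of *threads*, not processes, so under CPython's GIL the CPU-bound nested loops above never actually run in parallel; the threads only add scheduling overhead. A minimal sketch contrasting the thread pool with a real process pool (`burn` is a made-up stand-in workload, and exact timings will vary by machine):

```python
import time
from multiprocessing import Pool                       # real processes
from multiprocessing.dummy import Pool as ThreadPool   # threads under the GIL

def burn(n):
    # Pure-Python CPU-bound loop; under CPython's GIL only one
    # thread at a time can execute this, so a thread pool gains
    # nothing for this kind of work.
    total = 0.0
    for i in range(n):
        total += i * 0.5
    return total

if __name__ == '__main__':
    work = [500_000] * 4
    for name, PoolCls in [('threads', ThreadPool), ('processes', Pool)]:
        t0 = time.time()
        with PoolCls(4) as p:
            p.map(burn, work)
        print(name, 'took', round(time.time() - t0, 3), 'secs')
```

On a CPU-bound workload like this, the process pool can use multiple cores, while the thread pool typically matches or loses to a plain sequential loop; for very small workloads even the process pool can lose because of process start-up and pickling costs.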