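Python multiprocessing IndexError
I want to process a file in parallel by reading it in chunks and handling each chunk with the multiprocessing library. Here is my code: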
from multiprocessing import Pool
from itertools import islice
import traceback

#Produce key value pairs (Date, Market_Share*Market_Share)
def Map(L):
    results = []
    for w in L:
        temp = w.split(',')
        Date = temp[0]
        Share = float(temp[1][:-1])
        ShareSquare = str(Share*Share)
        results.append((Date,ShareSquare))
    return results

if __name__=='__main__':
    pool = Pool(2)
    f = open('C:/Users/Daniel/Desktop/Project/Optiver/atchm_9450.csv','r')
    fw = open('C:/Users/Daniel/Desktop/Project/Optiver/marketshare.csv','w')
    f.readline()
    while True:
        next_n_lines = list(islice(f,16))
        if not next_n_lines:
            break
        else:
            l = pool.map(Map,next_n_lines)
    f.close()
    fw.close()
However, it produces an index out of range error:
Traceback (most recent call last):
  File "trial.py", line 29, in <module>
    l = pool.map(Map,next_n_lines)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
IndexError: list index out of range
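The list object I pass to the Map function looks something like ['6/26/2014,68.90\n','6/27/2014,68.84\n','6/30/2014,68.80\n'....]
When no parallelism is involved (that is, without calling the pool), it works fine.
What could cause this behavior?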
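Thx.
The problem is that the Map function does not need its own for loop, because pool.map already splits the list into chunks and iterates over each chunk in parallel, calling the function once per element. So the argument of Map should be a single element, not a list. With the loop in place, each call receives one line (a string), the loop walks over its characters, and splitting a single character on ',' yields a one-item list, so temp[1] raises the IndexError. I originally thought that pool.map divides the list into several "sublists" and passes those sublists to the function. – user2517984 2014-11-24 14:55:10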
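For anyone landing here later, the fix described in the comment could look roughly like the sketch below. It is not the original poster's final code: the names map_line and square_shares are made up for illustration, the sample lines stand in for the CSV file, and it assumes Python 3 (for the `with Pool(...)` form) and well-formed "date,share" lines.

```python
from multiprocessing import Pool


def map_line(line):
    """Convert one 'date,share' line into (date, share squared as a string).

    pool.map calls this once per element, so the argument is a single
    line, not a list of lines.
    """
    date, share = line.strip().split(',')
    share = float(share)
    return (date, str(share * share))


def square_shares(lines, workers=2):
    """Apply map_line to every line using a pool of worker processes."""
    with Pool(workers) as pool:
        # The result is a flat list with one tuple per input line.
        return pool.map(map_line, lines)


if __name__ == '__main__':
    # Hypothetical sample data in the same shape as the question's list.
    sample = ['6/26/2014,68.90\n', '6/27/2014,68.84\n', '6/30/2014,68.80\n']
    for date, share_square in square_shares(sample):
        print(date + ',' + share_square)
```

If the per-line work is this small, passing the whole list of lines to a single pool.map call (optionally with its chunksize argument) is probably simpler than slicing the file into groups of 16 first, since the pool already batches the work internally.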