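I am trying to use multiprocessing with a pandas DataFrame: split the DataFrame into 8 parts, then apply some function to each part with apply, with each part processed in a different process.
EDIT: Here is the solution I finally found: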
import multiprocessing as mp

import numpy as np
import pandas as pd
import pandas.testing as pdt

def process_apply(x):
    # do some stuff to data here
    return x  # placeholder: returns the row unchanged

def process(df):
    # runs in a worker process on one part of the DataFrame
    res = df.apply(process_apply, axis=1)
    return res

if __name__ == '__main__':
    # big_df is assumed to be an existing DataFrame
    p = mp.Pool(processes=8)
    split_dfs = np.array_split(big_df, 8)
    pool_results = p.map(process, split_dfs)
    p.close()
    p.join()
    # merging parts processed by different processes
    parts = pd.concat(pool_results, axis=0)
    # merging newly calculated parts to big_df
    big_df = pd.concat([big_df, parts], axis=1)
    # checking if the dfs were merged correctly
    pdt.assert_series_equal(parts['id'], big_df['id'])
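For anyone who wants to try the same split / map / concat pattern on toy data, here is a minimal self-contained sketch. The column names `a`, `b`, `total` and the helpers `row_func`, `process_chunk`, `split_frame`, and `parallel_apply` are hypothetical names made up for illustration, not part of the code above. It also splits by row position with `iloc` instead of passing the DataFrame straight to `np.array_split`, so it does not depend on how NumPy treats a DataFrame; that is a swapped-in choice, not what the solution above does.

```python
import multiprocessing as mp

import numpy as np
import pandas as pd


def row_func(row):
    # Hypothetical per-row computation: sum of the two columns.
    return row["a"] + row["b"]


def process_chunk(chunk):
    # Runs in a worker process: apply the row function to one chunk.
    return chunk.apply(row_func, axis=1)


def split_frame(df, n_parts):
    # Split by row position; each piece keeps its original index labels.
    positions = np.array_split(np.arange(len(df)), n_parts)
    return [df.iloc[pos] for pos in positions]


def parallel_apply(df, n_workers):
    chunks = split_frame(df, n_workers)
    with mp.Pool(processes=n_workers) as pool:
        results = pool.map(process_chunk, chunks)
    # pool.map returns results in input order, so the index lines up with df.
    return pd.concat(results, axis=0)


if __name__ == "__main__":
    df = pd.DataFrame({"a": range(1000), "b": range(1000, 2000)})
    # Assigning a Series aligns on the index, so rows match up by label.
    df["total"] = parallel_apply(df, n_workers=mp.cpu_count())
    print(df.head())
```

Because each chunk keeps its original index, assigning the concatenated result back as a new column aligns row by row. Whether this beats a plain `apply` depends on how expensive the per-row function is compared with the cost of pickling the chunks and sending them to the workers.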
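There is a space in 'res = df.apply(process apply, axis=1)', is that right? – 2014-11-06 16:26:31
@yemu What exactly are you trying to achieve with this code? – Dalek 2014-11-06 16:37:14
At the moment apply saturates only one CPU core. I want to use multiprocessing so that all cores are used and the processing time goes down. – yemu 2014-11-06 19:29:09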