How can I combine the advantages of parBuffer and parListChunk?

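I have some Haskell code that does a lot of non-overlapping things to a large (65.5k-element) list of items. This seemed like a good fit for parallelization, so I parallelized it with Control.Parallel.Strategies.parBuffer. That helped, but I'm sure the work is too fine-grained, and I'd also like to process the list in larger chunks (as Control.Parallel.Strategies.parListChunk does). However, because my list is large, an experiment using only parListChunk did not give as much speedup as it could have, because the entire 65.5k-item list had to be evaluated for it to work (as the program's memory usage showed).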

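Is there a way to write a Strategy that gives me the benefits of both parBuffer (i.e. the list is treated as a lazy buffer with a controllable amount of evaluation) and parListChunk (i.e. the work is broken into pieces made up of several list elements rather than individual ones)? I'm not sure how to do this.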

Edit: As requested, here is what I have working, complete with explanatory comments:

import Control.Monad (void)
import Control.Parallel.Strategies (Strategy, parBuffer, rdeepseq, withStrategy)

parBufferMap :: Int -> Strategy b -> (a -> b) -> [a] -> [b]
parBufferMap i strat f = withStrategy (parBuffer i strat) . fmap f

main :: IO ()
main = do
    let allTables = genAllTables 4 -- a list of 65.5k Tables
    let results = parBufferMap 512 rdeepseq theNeedful allTables -- theNeedful is what I need to do to each Table, independently of each other
    let indexed = zip [1..] results
    let stringified = stringify <$> indexed -- make them pretty for output
    void . traverse putStrLn $ stringified -- actually print them

My goal is to replace the results computation as it stands (using only parBufferMap) with something that combines the benefits of parBufferMap and parListChunk.

Source

2016-07-04 Koz Ross

What do you mean by "does a lot of non-overlapping things"? Some code and/or pseudocode would help. – ErikR

@ErikR Added some code to (hopefully) show what I'm after. –

So it seems you want to compute:

map theNeedful allTables

but you want to do the mapping in batches of 512 tables.

Does this look like it would work for you?

-- assuming: 
theNeedful :: Table -> Result 

nthreads = 4 -- number of threads to keep busy 
allTables = ... 
allBatches = chunksOf 512 allTables -- from Data.List.Split 

doBatch :: [Table] -> [Result] 
doBatch tables = map theNeedful tables 

results :: [Result] 
results = concat $ withStrategy (parBuffer nthreads rdeepseq) (map doBatch allBatches) 
...

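In words:

break the tables up into batches of 512 tables each
map doBatch over all of the batches
perform a parBuffer on that list of computations
concat the resulting lists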
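The steps above could also be packaged as one reusable function. The following is only a minimal sketch under some assumptions: the names parBufferChunkMap and chunksOf' are hypothetical (they are not part of Control.Parallel.Strategies), chunksOf' is a local stand-in for Data.List.Split.chunksOf so the snippet only needs the parallel package, and each batch is forced with rdeepseq as in the code above.

```haskell
import Control.Parallel.Strategies (NFData, parBuffer, rdeepseq, withStrategy)

-- Hypothetical stand-in for Data.List.Split.chunksOf.
-- Splits a list into pieces of at most n elements; assumes n > 0.
chunksOf' :: Int -> [a] -> [[a]]
chunksOf' _ [] = []
chunksOf' n xs = let (h, t) = splitAt n xs in h : chunksOf' n t

-- Hypothetical helper: map f over the list in batches of chunkSize,
-- keeping up to bufSize batches sparked ahead of the consumer (parBuffer),
-- and fully evaluating each batch (rdeepseq) inside its spark.
parBufferChunkMap :: NFData b => Int -> Int -> (a -> b) -> [a] -> [b]
parBufferChunkMap chunkSize bufSize f =
    concat
  . withStrategy (parBuffer bufSize rdeepseq)
  . map (map f)
  . chunksOf' chunkSize
```

With this sketch, the results line in the question would become something like parBufferChunkMap 512 4 theNeedful allTables; the two numbers are illustrative and would need tuning against the actual workload.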

Source

2016-07-04 04:39:47 ErikR

This is basically what I ended up doing - I guess I was wondering whether this sort of thing is necessary, rather than just combining ''Strategies''. –
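On the question of combining Strategies directly: one possible way to express the same batching as a Strategy, rather than as a map helper, is sketched below. This is an assumption-laden illustration, not something the library provides: parBufferChunk and chunksOf' are hypothetical names, and the combinator is simply built from the existing parBuffer and evalList.

```haskell
import Control.Parallel.Strategies (Strategy, evalList, parBuffer)

-- Hypothetical stand-in for Data.List.Split.chunksOf; assumes n > 0.
chunksOf' :: Int -> [a] -> [[a]]
chunksOf' _ [] = []
chunksOf' n xs = let (h, t) = splitAt n xs in h : chunksOf' n t

-- Hypothetical combinator: split the list into chunks of chunkSize,
-- evaluate each chunk sequentially with strat (evalList), and run a
-- rolling buffer of bufSize chunks in parallel (parBuffer).
parBufferChunk :: Int -> Int -> Strategy a -> Strategy [a]
parBufferChunk chunkSize bufSize strat xs =
  concat <$> parBuffer bufSize (evalList strat) (chunksOf' chunkSize xs)
```

It would be applied like any other list strategy, for example withStrategy (parBufferChunk 512 4 rdeepseq) (map theNeedful allTables), again with illustrative sizes.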
