2014-04-16 105 views
7

我對Haskell來說很新,而且我想創建一個直方圖。我正在使用Data.Vector.Unboxed來融合數據上的操作;這是快速的(用-O -fllvm編譯時),瓶頸是我的摺疊應用程序;它彙總了桶的數量。在Haskell中進行直方圖計算的速度更快

我該如何加快速度?我讀過關於通過嚴格保密來減少Thunk的數量,所以我通過使用seq和foldr來制定嚴格的規定,但是沒有看到性能提高。強烈鼓勵您的想法。

import qualified Data.Vector.Unboxed as V 

histogram :: [(Int,Int)] 
histogram = V.foldr' agg [] $ V.zip k v 
where 
    n = 10000000 
    c = 1000000 
    k = V.generate n (\i -> i `div` c * c) 
    v = V.generate n (\i -> 1) 
    agg kv [] = [kv] 
    agg [email protected](k,v) [email protected]((ck,cv):as) 
     | k == ck = let a = (ck,cv+v):as in a `seq` a 
     | otherwise = let a = kv:acc in a `seq` a 

main :: IO() 
main = print histogram 

與編譯:

ghc --make -O -fllvm histogram.hs 
+0

嘗試'-O2'而不是簡單的'-O'。我不確定當你使用'-O'時默認的是什麼。 – Sibi

+3

@Sibi'-O'與'-O1'相同,所以'-O2'確實值得一試 – bennofs

+0

''比'div'快。 – Franky

回答

15

首先,-O2 -rtsopts編譯程序。然後,讓在那裏你可以優化,與選項+RTS -sstderr運行程序的第一個想法:

$ ./question +RTS -sstderr 
[(0,1000000),(1000000,1000000),(2000000,1000000),(3000000,1000000),(4000000,1000000),(5000000,1000000),(6000000,1000000),(7000000,1000000),(8000000,1000000),(9000000,1000000)] 
    1,193,907,224 bytes allocated in the heap 
    1,078,027,784 bytes copied during GC 
    282,023,968 bytes maximum residency (7 sample(s)) 
     86,755,184 bytes maximum slop 
      763 MB total memory in use (0 MB lost due to fragmentation) 

            Tot time (elapsed) Avg pause Max pause 
    Gen 0  1964 colls,  0 par 3.99s 4.05s  0.0021s 0.0116s 
    Gen 1   7 colls,  0 par 1.60s 1.68s  0.2399s 0.6665s 

    INIT time 0.00s ( 0.00s elapsed) 
    MUT  time 2.67s ( 2.68s elapsed) 
    GC  time 5.59s ( 5.73s elapsed) 
    EXIT time 0.02s ( 0.03s elapsed) 
    Total time 8.29s ( 8.43s elapsed) 

    %GC  time  67.4% (67.9% elapsed) 

    Alloc rate 446,869,876 bytes per MUT second 

    Productivity 32.6% of total user, 32.0% of total elapsed 

注意67%的時間在 GC是花!顯然有一些錯誤。要了解什麼是錯的,我們可以運行與堆分析程序中啓用(使用+RTS -h),產生如下圖:

First heap profile

所以,你泄露的thunk。這是如何發生的?查看代碼,在agg中唯一一次建立(遞歸)thunk的是當你添加時。通過從而增加一聲巨響紙樣製作cv嚴格修復該問題:

{-# LANGUAGE BangPatterns #-} 
import qualified Data.Vector.Unboxed as V 

histogram :: [(Int,Int)] 
histogram = V.foldr' agg [] $ V.zip k v 
where 
    n = 10000000 
    c = 1000000 
    k = V.generate n (\i -> i `div` c * c) 
    v = V.generate n id 
    agg kv [] = [kv] 
    agg [email protected](k,v) [email protected]((ck,!cv):as) -- Note the ! 
     | k == ck = (ck,cv+v):as 
     | otherwise = kv:acc 

main :: IO() 
main = print histogram 

輸出:

$ time ./improved +RTS -sstderr 
[(0,499999500000),(1000000,1499999500000),(2000000,2499999500000),(3000000,3499999500000),(4000000,4499999500000),(5000000,5499999500000),(6000000,6499999500000),(7000000,7499999500000),(8000000,8499999500000),(9000000,9499999500000)] 
    672,063,056 bytes allocated in the heap 
      94,664 bytes copied during GC 
    160,028,816 bytes maximum residency (2 sample(s)) 
     1,464,176 bytes maximum slop 
      155 MB total memory in use (0 MB lost due to fragmentation) 

            Tot time (elapsed) Avg pause Max pause 
    Gen 0  992 colls,  0 par 0.03s 0.03s  0.0000s 0.0001s 
    Gen 1   2 colls,  0 par 0.03s 0.03s  0.0161s 0.0319s 

    INIT time 0.00s ( 0.00s elapsed) 
    MUT  time 1.24s ( 1.25s elapsed) 
    GC  time 0.06s ( 0.06s elapsed) 
    EXIT time 0.03s ( 0.03s elapsed) 
    Total time 1.34s ( 1.34s elapsed) 

    %GC  time  4.4% (4.5% elapsed) 

    Alloc rate 540,674,868 bytes per MUT second 

    Productivity 95.5% of total user, 95.1% of total elapsed 

./improved +RTS -sstderr 1,14s user 0,20s system 99% cpu 1,352 total 

這是好多了。


所以,現在你可以問,爲什麼問題出現,即使你使用seq?原因是seq只強制第一個參數爲WHNF,而對於一對(_,_)(其中_是未評估的thunk)已經是WHNF!此外,seq a a相同a,因爲它seq a b(非正式)表示:評價B之前進行評估,所以seq a a只是表示:評估前的評估,這是一樣的只是評估!

+1

謝謝你的回覆。你向我展示了爲什麼它很慢,如何改進它,以及如何描述(從來不知道這些CL選項)。我會給你更多的觀點,如果我可以:) – jap