2013-08-22 61 views
4

我在Haskell和Scala中有一段非常簡單的代碼。此代碼旨在以非常嚴格的循環運行,因此性能很重要。問題是Haskell比Scala慢10倍左右。這裏是Haskell代碼。Haskell矢量性能與Scala相比

{-# LANGUAGE BangPatterns #-} 
import qualified Data.Vector.Unboxed as VU 

newtype AffineTransform = AffineTransform {get :: (VU.Vector Double)} deriving (Show) 

{-# INLINE runAffineTransform #-} 
runAffineTransform :: AffineTransform -> (Double, Double) -> (Double, Double) 
runAffineTransform affTr (!x, !y) = (get affTr `VU.unsafeIndex` 0 * x + get affTr `VU.unsafeIndex` 1 * y + get affTr `VU.unsafeIndex` 2, 
             get affTr `VU.unsafeIndex` 3 * x + get affTr `VU.unsafeIndex` 4 * y + get affTr `VU.unsafeIndex` 5) 

testAffineTransformSpeed :: AffineTransform -> Int -> (Double, Double) 
testAffineTransformSpeed affTr count = go count (0.5, 0.5) 
    where go :: Int -> (Double, Double) -> (Double, Double) 
     go 0 res = res 
     go !n !res = go (n-1) (runAffineTransform affTr res) 

還可以做些什麼來改進此代碼?

+0

你是如何編譯它以及使用哪個編譯器的? –

+0

它是用ghc 7.6.3編譯的。選項是「-O2-Wall -funbox-strict-fields -threaded -rtsopts」。我期望-funbox-strict-fields就足夠了,但事實並非如此。那麼,我是一個完整的新手,所以我的期望可能會有所改變。 – Aurimas

+1

爲什麼你需要'Vector'來表示仿射變換?爲它製作ADT並處理座標數組似乎更合理。 – leventov

回答

8

的主要問題是

runAffineTransform affTr (!x, !y) = (get affTr `VU.unsafeIndex` 0 * x 
            + get affTr `VU.unsafeIndex` 1 * y 
            + get affTr `VU.unsafeIndex` 2, 
             get affTr `VU.unsafeIndex` 3 * x 
            + get affTr `VU.unsafeIndex` 4 * y 
            + get affTr `VU.unsafeIndex` 5) 

產生對的thunkrunAffineTransform被調用時,組件不會被評估,它們仍然是thunk,直到某些消費者要求對它們進行評估。

testAffineTransformSpeed affTr count = go count (0.5, 0.5) 
    where go :: Int -> (Double, Double) -> (Double, Double) 
     go 0 res = res 
     go !n !res = go (n-1) (runAffineTransform affTr res) 

不是消費者,在res轟僅評估爲最外層的構造,(,),和你這是在最終只評估了

runAffineTransform affTr (runAffineTrasform affTr (runAffineTransform affTr (...))) 

結果是,當最後的正常形式是需要的。

如果強行結果的部件要立即進行評估,

runAffineTransform affTr (!x, !y) = case 
    ( get affTr `U.unsafeIndex` 0 * x 
    + get affTr `U.unsafeIndex` 1 * y 
    + get affTr `U.unsafeIndex` 2 
    , get affTr `U.unsafeIndex` 3 * x 
    + get affTr `U.unsafeIndex` 4 * y 
    + get affTr `U.unsafeIndex` 5 
) of (!a,!b) -> (a,b) 

,讓它被內聯,使用自定義嚴格對拆箱Double# s到jtobin的版本的主要區別在於:在testAffineTransformSpeed的循環中,您將使用盒裝的Double s作爲參數進行一次初始迭代,最後,結果的組件將被裝箱,這會增加一些常量開銷(每個循環的大約5納秒)。循環的主要部分在兩種情況下都採用Int#和兩個Double#參數,並且除了達到n = 0時的裝箱外,循環體是相同的。

當然,通過使用unboxed嚴格配對類型迫使組件立即評估更好。

+0

什麼時候「!(!x,!y)'有用? – is7s

+0

在'let'或'where'的綁定中。但是這裏只是從'!res @(!x,!y)'中刪除一個字符太少而已。感謝您的注意。 –

9

我定義的以下嚴格/裝箱對類型:

import System.Random.MWC -- for later 
import Control.DeepSeq 

data SP = SP { 
    one :: {-# UNPACK #-} !Double 
    , two :: {-# UNPACK #-} !Double 
    } deriving Show 

instance NFData SP where 
    rnf p = rnf (one p) `seq` rnf (two p) `seq`() 

並在runAffineTransform函數代替它:

runAffineTransform2 :: AffineTransform -> SP -> SP 
runAffineTransform2 affTr !(SP x y) = 
    SP ( get affTr `U.unsafeIndex` 0 * x 
     + get affTr `U.unsafeIndex` 1 * y 
     + get affTr `U.unsafeIndex` 2 ) 

    ( get affTr `U.unsafeIndex` 3 * x 
     + get affTr `U.unsafeIndex` 4 * y 
     + get affTr `U.unsafeIndex` 5 ) 
{-# INLINE runAffineTransform2 #-} 

然後跑這個基準套件:

main :: IO() 
main = do 
    g <- create 
    zs <- fmap (AffineTransform . U.fromList) 
      (replicateM 100000 (uniformR (0 :: Double, 1) g)) 

    let myConfig = defaultConfig { cfgPerformGC = ljust True } 

    defaultMainWith myConfig (return()) [ 
     bench "yours" $ nf (testAffineTransformSpeed zs) 10 
    , bench "mine" $ nf (testAffineTransformSpeed2 zs) 10 
    ] 

編譯與-O2並跑,並觀察到一些(~4倍)加速:

benchmarking yours 
mean: 257.4559 ns, lb 256.2492 ns, ub 258.9761 ns, ci 0.950 
std dev: 6.889905 ns, lb 5.688330 ns, ub 8.839753 ns, ci 0.950 
found 5 outliers among 100 samples (5.0%) 
    3 (3.0%) high mild 
    2 (2.0%) high severe 
variance introduced by outliers: 20.944% 
variance is moderately inflated by outliers 

benchmarking mine 
mean: 69.56408 ns, lb 69.29910 ns, ub 69.86838 ns, ci 0.950 
std dev: 1.448874 ns, lb 1.261444 ns, ub 1.718074 ns, ci 0.950 
found 4 outliers among 100 samples (4.0%) 
    4 (4.0%) high mild 
variance introduced by outliers: 14.190% 
variance is moderately inflated by outliers 

Full code is in gist here

編輯

我也貼標準的輸出報告here

+1

不錯,現在在我的電腦上它的性能略好於Scala(Java)。什麼是要學的教訓?嚴格標註是不夠的(他們不會自動取消裝箱?)?有時你必須手動創建無箱(解壓縮)數據結構? – Aurimas

+1

@ user2705843幻燈片(4)和(9)與此處相關:http://johantibell.com/files/haskell-performance-patterns.html – jtobin

+2

「SP」的NFData實例完全是浪費工作。事實上,這其中很多。它隱式地創建'Double'構造函數來指向'SP'中的解壓縮值,強制它們,然後拋棄它們。 'rnf a = seq a()'會更加高效和正確。 – Carl