2016-09-30 67 views
2

我正在嘗試使用動態編程來計算兩個字符串之間的Levenshtein距離。這是通過Hackerrank完成的,所以我有時間限制。我用我看到的techenique:How are Dynamic Programming algorithms implemented in idiomatic Haskell?,它似乎在工作。不幸的是,它在一個測試案例中超時。我無法訪問特定的測試用例,因此我不知道輸入的確切大小。使用動態編程的Haskell性能

import Control.Monad 
import Data.Array.IArray 
import Data.Array.Unboxed 

main = do 
    n <- readLn 
    replicateM_ n $ do 
    s1 <- getLine 
    s2 <- getLine 
    print $ editDistance s1 s2 

editDistance :: String -> String -> Int 
editDistance s1 s2 = dynamic editDistance' (length s1, length s2) 
    where 
    s1' :: UArray Int Char 
    s1' = listArray (1,length s1) s1 
    s2' :: UArray Int Char 
    s2' = listArray (1,length s2) s2 
    editDistance' table (i,j) 
     | min i j == 0 = max i j 
     | otherwise = min' (table!((i-1),j) + 1) (table!(i,(j-1)) + 1) (table!((i-1),(j-1)) + cost) 
     where 
     cost = if s1'!i == s2'!j then 0 else 1 
     min' a b = min (min a b) 

dynamic :: (Array (Int,Int) Int -> (Int,Int) -> Int) -> (Int,Int) -> Int 
dynamic compute (xBnd, yBnd) = table!(xBnd,yBnd) 
    where 
    table = newTable $ map (\coord -> (coord, compute table coord)) [(x,y) | x<-[0..xBnd], y<-[0..yBnd]] 
    newTable xs = array ((0,0),fst (last xs)) xs 

我已經切換到使用數組,但加速不足。我無法使用Unboxed數組,因爲此代碼依賴於懶惰。我有沒有明顯的表現錯誤?或者我還能如何加快速度?

+1

字符串效率很低。對於一個競賽節目,如果你可以避開這個節目,我會把它作爲一個ByteString讀入 - 其他的使用Text。 – ErikR

回答

2

爲編輯距離計算的落後方程爲:

f(i, j) = minimum [ 
    1 + f(i + 1, j), -- delete from the 1st string 
    1 + f(i, j + 1), -- delete from the 2nd string 
    f(i + 1, j + 1) + if a(i) == b(j) then 0 else 1 -- substitute or match 
] 

所以每個維度中,你需要什麼比非常下一個指數更多:+ 1。這是一個連續的訪問模式,而不是隨機訪問要求數組;並且可以使用列表來實現,嵌套正確摺疊:

editDistance :: Eq a => [a] -> [a] -> Int 
editDistance a b = head . foldr loop [n, n - 1..0] $ zip a [m, m - 1..] 
    where 
    (m, n) = (length a, length b) 
    loop (s, l) lst = foldr go [l] $ zip3 b lst (tail lst) 
    where 
    go (t, i, j) [email protected](k:_) = inc `seq` inc:acc 
     where inc = minimum [i + 1, k + 1, if s == t then j else j + 1] 

您可以測試在Hackerrank Edit Distance Problem將該代碼作爲:

import Control.Applicative ((<$>)) 
import Control.Monad (replicateM_) 
import Text.Read (readMaybe) 

editDistance :: Eq a => [a] -> [a] -> Int 
editDistance a b = ... -- as implemented above 

main :: IO() 
main = do 
    Just n <- readMaybe <$> getLine 
    replicateM_ n $ do 
    a <- getLine 
    b <- getLine 
    print $ editDistance a b 

其通過與不俗的性能表現所有測試。