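Writing "wc -l" using the Iteratee library - how to filter for newlines?

I am trying to come up with an equivalent of "wc -l" using the Haskell Iteratee library. Below is the code for "wc" (it only counts bytes, as "wc -c" does, and is similar to the example code in the iteratee package on Hackage), and it runs very fast: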
{-# LANGUAGE BangPatterns #-}
import Data.Iteratee as I
import Data.ListLike as LL
import Data.Iteratee.IO
import Data.ByteString
length1 :: (Monad m, Num a, LL.ListLike s el) => Iteratee s m a
length1 = liftI (step 0)
  where
    step !i (Chunk xs) = liftI (step $ i + fromIntegral (LL.length xs))
    step !i stream     = idone i stream
{-# INLINE length1 #-}
main = do
  i' <- enumFile 1024 "/usr/share/dict/words" (length1 :: (Monad m) => Iteratee ByteString m Int)
  result <- run i'
  print result
{- Time measured on a linux x86 box:
$ time ./test ## above haskell compiled code
4950996
real 0m0.013s
user 0m0.004s
sys 0m0.007s
$ time wc -c /usr/share/dict/words
4950996 /usr/share/dict/words
real 0m0.003s
user 0m0.000s
sys 0m0.002s
-}
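Now, how do you extend it to count the number of lines so that it also runs fast? I wrote a version that uses Prelude.filter to keep only the "\n" characters and takes the length of the result, but it is slower than Linux "wc -l" because it uses too much memory and spends time in GC (lazy evaluation, I guess). So I wrote another version using Data.ListLike.filter, but it does not compile because it fails to type check. Help here would be appreciated: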
{-# LANGUAGE BangPatterns #-}
import Data.Iteratee as I
import Data.ListLike as LL
import Data.Iteratee.IO
import Data.ByteString
import Data.Char
import Data.ByteString.Char8 (pack)
numlines :: (Monad m, Num a, LL.ListLike s el) => Iteratee s m a
numlines = liftI $ step 0
  where
    step !i (Chunk xs) = liftI (step $ i + fromIntegral (LL.length $ LL.filter (\x -> x == Data.ByteString.Char8.pack "\n") xs))
    step !i stream     = idone i stream
{-# INLINE numlines #-}
main = do
  i' <- enumFile 1024 "/usr/share/dict/words" (numlines :: (Monad m) => Iteratee ByteString m Int)
  result <- run i'
  print result
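
A likely reason the second version fails to type check: the predicate given to LL.filter receives a single element of the stream (a Word8 when the stream is a strict ByteString), whereas Data.ByteString.Char8.pack "\n" is a whole ByteString, and the polymorphic signature of numlines puts no Eq constraint on el. Below is a minimal sketch of the per-chunk counting, assuming the stream type is fixed to strict ByteString and swapping filter plus length for Data.ByteString.count. It uses only base and bytestring so it can be run on its own; the names newline, chunkLines and countLines are made up for illustration, and the list of chunks merely stands in for what enumFile would feed to the iteratee.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Word (Word8)

-- Hypothetical helper: the newline as a byte. Elements of a strict
-- ByteString are Word8 values, not ByteStrings.
newline :: Word8
newline = 10

-- Hypothetical helper: count the newlines in one chunk without building
-- an intermediate filtered copy.
chunkLines :: B.ByteString -> Int
chunkLines = B.count newline

-- Hypothetical helper: strict left fold over the chunks, mirroring the
-- strict accumulator `!i` in the `step` function of the question.
countLines :: [B.ByteString] -> Int
countLines = go 0
  where
    go !acc []       = acc
    go !acc (c : cs) = go (acc + chunkLines c) cs

main :: IO ()
main = do
  -- Chunk boundaries fall in the middle of lines on purpose: the count
  -- does not depend on where the input is split.
  let chunks = map BC.pack ["alpha\nbe", "ta\n", "", "gamma\n\ndelta"]
  print (countLines chunks)
```

If this reading is right, the same idea should carry over to numlines by narrowing its type to Iteratee ByteString m a and adding the per-chunk count of xs to the accumulator in step, though that variant is not tested here.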
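Thank you, John. Very useful feedback. My aim was to understand how to write these using the basic building blocks, so that I can understand iteratees. Your feedback helps me see how to write code that goes beyond toy code. – Sal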