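I wrote a program in Haskell that has to load and parse a large UTF-8 text file. The file represents a dictionary with one key-value pair per line. In my program I want a Data.Map container for fast dictionary lookups. My file is about 40 MB, but after loading it into my program, 1.5 GB of RAM is in use and it is never freed. What am I doing wrong? Is this memory usage expected? Why does Haskell allocate so much memory when working with strings?
Here is a code sample from my program: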
module Main where
import Engine
import Codec.Archive.Zip
import Data.Char (toLower)
import Data.IORef
import System.IO
import System.Directory
import qualified System.IO.UTF8 as UTF8
import qualified Data.ByteString.Lazy as B
import qualified Data.ByteString.UTF8 as BsUtf
import qualified Data.Map as Map
import Graphics.UI.Gtk
import Graphics.UI.Gtk.Glade
maybeRead :: Read a => BsUtf.ByteString -> Maybe a
maybeRead s = case reads $ BsUtf.toString s of
    [(x, "")] -> Just x
    _         -> Nothing
parseToEntries :: [BsUtf.ByteString] -> [(BsUtf.ByteString, Int)]
parseToEntries [] = []
parseToEntries (x:xs) = let (key, svalue) = BsUtf.break (==':') x
                            value = maybeRead svalue
                        in case value of
                            Just x -> [(key, x)] ++ parseToEntries xs
                            Nothing -> parseToEntries xs
createDict :: BsUtf.ByteString -> IO (Map.Map BsUtf.ByteString Int)
createDict str = do
    let entries = parseToEntries $ BsUtf.lines str
        dict = Map.fromList entries
    return (dict)
main :: IO ()
main = do
    currFileName <- newIORef ""
    dictZipFile <- B.readFile "data.db"
    extractFilesFromArchive [] $ toArchive dictZipFile
    dictFile <- UTF8.readFile "dict.txt"
    dict <- createDict $ BsUtf.fromString dictFile
    ...
searchAccent :: Map.Map BsUtf.ByteString Int -> String -> Int
searchAccent dict word = let sword = BsUtf.fromString $ map toLower word
                             entry = Map.lookup sword dict
                         in case entry of
                             Nothing -> -1
                             Just match -> 0
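I'm a bit rusty in Haskell, but IIRC the '++' operator is expensive in terms of memory, whereas the cons operator (':') is cheap. Would it be possible to use something like '(key, x) : parseToEntries xs'? Again... my Haskell is very rusty, so this may not work. – jpm 2012-04-05 21:16:13
@jpm, how expensive '++' is in memory depends on the length of its first argument. In this case it is irrelevant. – 2012-04-05 21:21:42
@maxtaldykin Ah, that makes sense. Thanks for the clarification. – jpm 2012-04-05 21:38:27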
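To illustrate the point made in the comments, here is a minimal sketch comparing the two spellings. Since `(x:xs) ++ ys = x : (xs ++ ys)`, appending to a one-element list does only a constant amount of extra work, so `[(key, x)] ++ rest` and `(key, x) : rest` build the same list. The names `readIntMaybe`, `parseWithCons` and `parseWithAppend` are hypothetical, the sketch uses `Data.ByteString.Char8` instead of the question's `Data.ByteString.UTF8` to avoid an extra dependency, and it assumes a `key:value` line format in which the `:` separator is dropped before the number is parsed.

```haskell
module Main where

import qualified Data.ByteString.Char8 as BC

-- Hypothetical stand-in for the question's maybeRead, restricted to Int:
-- succeeds only if the whole input is consumed as a number.
readIntMaybe :: BC.ByteString -> Maybe Int
readIntMaybe s = case BC.readInt s of
  Just (n, rest) | BC.null rest -> Just n
  _                             -> Nothing

-- Variant that prepends each entry with the cons operator (:).
parseWithCons :: [BC.ByteString] -> [(BC.ByteString, Int)]
parseWithCons [] = []
parseWithCons (l:ls) =
  let (key, rest) = BC.break (== ':') l
      value       = readIntMaybe (BC.drop 1 rest)  -- assumption: skip the ':' separator
  in case value of
       Just v  -> (key, v) : parseWithCons ls
       Nothing -> parseWithCons ls

-- Variant that appends to a singleton list with (++), as in the question.
parseWithAppend :: [BC.ByteString] -> [(BC.ByteString, Int)]
parseWithAppend [] = []
parseWithAppend (l:ls) =
  let (key, rest) = BC.break (== ':') l
      value       = readIntMaybe (BC.drop 1 rest)  -- assumption: skip the ':' separator
  in case value of
       Just v  -> [(key, v)] ++ parseWithAppend ls
       Nothing -> parseWithAppend ls

main :: IO ()
main = do
  let ls = map BC.pack ["a:1", "b:x", "c:3"]
  -- Both variants skip the unparsable line "b:x" and agree on the rest.
  print (parseWithCons ls == parseWithAppend ls)  -- prints True
```

The first argument of `++` here always has length one, so switching to `:` is mostly a matter of style and is unlikely to explain the memory usage described in the question.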