目錄樹的寬度優先遍歷不是懶惰

我嘗試遍歷目錄樹。幼稚的深度優先遍歷似乎不會以懶惰的方式產生數據並且耗盡內存。接下來我嘗試了寬度優先的方法，它顯示了同樣的問題 - 它使用所有可用的內存然後崩潰。目錄樹的寬度優先遍歷不是懶惰

我的代碼是：

getFilePathBreadtFirst :: FilePath -> IO [FilePath] 
getFilePathBreadtFirst fp = do 
    fileinfo <- getInfo fp 
    res :: [FilePath] <- if isReadableDirectory fileinfo 
      then do 
       children <- getChildren fp 
       lower <- mapM getFilePathBreadtFirst children 
       return (children ++ concat lower) 
      else return [fp]  -- should only return the files? 
    return res 

getChildren :: FilePath -> IO [FilePath] 
getChildren path = do 
      names <- getUsefulContents path 
      let namesfull = map (path </>) names 
      return namesfull 

testBF fn = do -- crashes for /home/frank, does not go to swap 
    fps <- getFilePathBreadtFirst fn 
    putStrLn $ unlines fps

我認爲所有的代碼是線性或尾遞歸，和我預料的文件名的上市立即開始，但實際上並非如此。我的代碼中的錯誤和我的想法在哪裏？我在哪裏失去了懶惰的評價？

來源

2012-09-26 user855443

我會用三個獨立的技巧來解決你的問題。

招數一：使用pipes庫併發流與遍歷樹的文件名。
把戲2：使用StateT (Seq FilePath)變壓器來實現寬度優先遍歷。
技巧3：使用MaybeT變壓器來避免寫入循環和退出時的手動遞歸。

以下代碼將這三種技巧結合在一個monad變壓器堆棧中。

import Control.Monad 
import Control.Monad.Trans 
import Control.Monad.Trans.Maybe 
import Control.Monad.State.Lazy 
import Control.Pipe 
import Data.Sequence 
import System.FilePath.Posix 
import System.Directory 

loop :: (Monad m) => MaybeT m a -> m() 
loop = liftM (maybe() id) . runMaybeT . forever 

quit :: (Monad m) => MaybeT m a 
quit = mzero 

getUsefulContents :: FilePath -> IO [FilePath] 
getUsefulContents path 
    = fmap (filter (`notElem` [".", ".."])) $ getDirectoryContents path 

permissible :: FilePath -> IO Bool 
permissible file 
    = fmap (\p -> readable p && searchable p) $ getPermissions file 

traverseTree :: FilePath -> Producer FilePath IO() 
traverseTree path = (`evalStateT` empty) $ loop $ do 
    -- All code past this point uses the following monad transformer stack: 
    -- MaybeT (StateT (Seq FilePath) (Producer FilePath IO))() 
    let liftState = lift 
     liftPipe = lift . lift 
     liftIO = lift . lift . lift 
    liftState $ modify (|> path) 
    forever $ do 
     x <- liftState $ gets viewl 
     case x of 
      EmptyL -> quit 
      file :< s -> do 
       liftState $ put s 
       liftPipe $ yield file 
       p <- liftIO $ doesDirectoryExist file 
       when p $ do 
        names <- liftIO $ getUsefulContents file 
        -- allowedNames <- filterM permissible names 
        let namesfull = map (path </>) names 
        liftState $ forM_ namesfull $ \name -> modify (|> name)

這創建了一個廣度優先文件名的生成器，它可以與樹遍歷並行使用。

printer :: (Show a) => Consumer a IO r 
printer = forever $ do 
    a <- await 
    lift $ print a 

>>> runPipe $ printer <+< traverseTree path 
<Prints file names as it traverses the tree>

你甚至可以選擇不要求所有的值：您可以使用消耗值更重要的是

-- Demand only 'n' elements 
take' :: (Monad m) => Int -> Pipe a a m() 
take' n = replicateM_ n $ do 
    a <- await 
    yield a 

>> runPipe $ printer <+< take' 3 <+< traverseTree path 
<Prints only three files>

，那最後的例子將只遍歷樹高達必要產生三文件，然後它會停止。這可以防止浪費地遍歷整個樹，當你想要的只有3個結果！

要了解有關pipes庫技巧的更多信息，請參閱pipes tutorial，網址Control.Pipes.Tutorial。

要了解有關循環技巧的更多信息，請閱讀blog post。

我無法找到廣泛優先遍歷的隊列技巧的好鏈接，但我知道它在那裏。如果其他人知道這個好鏈接，只需編輯我的答案來添加它。

來源

2012-09-26 22:16:40

謝謝你的代碼。理解管道很有幫助。我正在閱讀有關管道並正在計劃使用它，但預計我應該首先有一個簡單的懶惰解決方案，只有樹遍歷。我試了一下，它的工作原理，但它沒有遞歸樹，我不明白它會在你的代碼遞歸。缺少的代碼正在過濾掉「。」。和「..」 getUsefulContents path = do names < - getDirectoryContents path return（filter（'notElem' [「。」，「..」]）names） – user855443

關於更深的檢查我看到了（隱藏）在liftstate的最後一行中遞歸，其中新文件名被添加到「todo」列表中。我沒有看到這個，因爲代碼不會爲添加的文件生成完整的文件路徑。路徑的值是原始起始值，並且不會每次都設置爲當前文件名 - >用文件替換路徑，然後運行。要完全工作，必須檢查目錄上的權限，我使用 getInfo :: FilePath - > IO信息我從現實世界中採取了哈斯克爾第9章。 – user855443

它遇到困難，當它遇到鏈接和我必須添加一個測試來濾除鏈接。它的工作原理和使用我所有的4核心！仍然存在內存泄漏，因爲使用量增長非常緩慢，直到耗盡內存。你能看到哪裏？你的幫助是非常感謝，正是我需要有一個很好的實例如何穿越樹時使用管道！ – user855443

我已經分開管道的管理和樹遍歷。這裏先對管道（！岡薩雷斯的基本代碼 - 謝謝）代碼：

traverseTree :: FilePath -> Producer FilePath IO() 
--^traverse a tree in breadth first fashion using an external doBF function 
traverseTree path = (`evalStateT` empty) $ loop $ do 
-- All code past this point uses the following monad transformer stack: 
-- MaybeT (StateT (Seq FilePath) (Producer FilePath IO))() 
let liftState = lift 
    liftPipe = lift . lift 
    liftIO = lift . lift . lift 
liftState $ modify (|> path) 
forever $ do 
    x <- liftState $ gets viewl 
    case x of 
     EmptyL -> quit 
     file :< s -> do 
      (yieldval, nextInputs) <- liftIO $ doBF file 
      liftState $ put s 
      liftPipe $ yield yieldval 
      liftState $ forM_ nextInputs $ \name -> modify (|> name)

旁邊的樹遍歷代碼：

doBF :: FilePath -> IO (FilePath, [FilePath]) 
doBF file = do 
    finfo <- getInfo file 
    let p = isReadableDirectoryNotLink finfo 
    namesRes <- if p then do 
     names :: [String] <- liftIO $ getUsefulContents file 
     let namesSorted = sort names 
     let namesfull = map (file </>) namesSorted 
     return namesfull 
     else return []   
    return (file, namesRes)

我希望有類似的功能，以取代DOBF首先遍歷深度。我認爲我可以使traverseTree更通用，而不僅僅是FilePath〜String，但是我沒有看到序列上的空函數在哪個類中。可能通常是有用的。

來源

2012-09-28 13:26:19 user855443

目錄樹的寬度優先遍歷不是懶惰

回答

相關問題