Scala parser combinators: problem with a large file

I have written a parser as follows:

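import scala.util.parsing.combinator.JavaTokenParsers

class LogParser extends JavaTokenParsers {

  // Discard three leading numbers, then collect any number of postings lists.
  def invertedIndex: Parser[Array[Array[(Int, Int)]]] =
    num ~> num ~> num ~> rep(postingsList) ^^ { _.toArray }

  // Discard a leading number, then collect any number of "docID,count" entries.
  def postingsList: Parser[Array[(Int, Int)]] =
    num ~> rep(entry) ^^ { _.toArray }

  // One "docID,count" pair; num already yields an Int, so no conversion is needed.
  def entry: Parser[(Int, Int)] =
    num ~ ("," ~> num) ^^ { case docID ~ count => (docID, count) }

  def num: Parser[Int] = wholeNumber ^^ (_.toInt)

}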

If I parse the file (270 MB) using a FileReader like this:

val index = parseAll(invertedIndex, new FileReader("path/to/file")).get

I get an Exception in thread "main" java.lang.StackOverflowError (I also tried wrapping it in a BufferedReader), but I can fix it by first reading the file into a String, like this:

val input = io.Source.fromFile("path/to/file") 
val str = input.mkString 
input.close() 
val index = parseAll(invertedIndex, str).get

Why is this? Is there any way to avoid reading the file into a String first? That seems wasteful.


2012-11-03 Robert

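What is your current stack size, and how large do you have to make the stack to avoid the StackOverflowError? How large does the stack have to be before the String version also overflows? (You can set the stack to 16 MB by starting Scala like this: 'scala -J-Xss16M') – DaoWen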

I'm just using the default stack size, but when I set it to 16M the program was still running after 30 minutes... – Robert
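If a larger stack is worth trying without restarting the JVM with -Xss, the parse can run on a dedicated thread whose stack size is requested through the four-argument java.lang.Thread constructor. This is a sketch under assumptions, not something from the question: BigStackRunner, runWithStack and the thread name "big-stack" are hypothetical names, and the JVM treats the requested size as a hint that some platforms ignore. As Robert's experiment suggests, a bigger stack may only turn the error into a very slow parse.

```scala
// Hypothetical helper: run a computation on a new thread with a larger stack.
object BigStackRunner {
  // Runs `body` on a fresh thread that requests `stackBytes` of stack,
  // waits for it, and returns its result or rethrows its exception.
  // The stack size is only a hint; some JVMs/platforms ignore it.
  def runWithStack[T](stackBytes: Long)(body: => T): T = {
    var result: Option[Either[Throwable, T]] = None
    val worker = new Thread(null, new Runnable {
      def run(): Unit = {
        result = Some(try Right(body) catch { case e: Throwable => Left(e) })
      }
    }, "big-stack", stackBytes)
    worker.start()
    worker.join() // join() makes the worker's write to `result` visible here
    result.get match {
      case Right(value) => value
      case Left(error)  => throw error
    }
  }
}
```

With the question's parser this could look like `val p = new LogParser; BigStackRunner.runWithStack(512L * 1024 * 1024) { p.parseAll(p.invertedIndex, new FileReader("path/to/file")).get }`, where 512 MB is an arbitrary example value.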

This may be related to Scala 2.9.2 bug SI-6520 (https://issues.scala-lang.org/browse/SI-6520). –

There is another library [1], much like the Scala parser combinators, that supports trampolining, which is what you need to stop the stack overflow errors.

[1] https://github.com/djspiewak/gll-combinators
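Trampolining replaces deep recursion with a loop: each step returns a description of the next call instead of making it, and a driver keeps running those descriptions until a final value appears, so the JVM stack stays flat. The gll-combinators API is not shown here; as a minimal illustration of the idea only, this sketch uses the standard library's scala.util.control.TailCalls, with TrampolineDemo, isEven and isOdd as hypothetical names.

```scala
import scala.util.control.TailCalls._

// Hypothetical demo of trampolining with scala.util.control.TailCalls.
object TrampolineDemo {
  // Mutually recursive checks: each call returns a TailRec value describing
  // the next step instead of invoking it directly on the JVM stack.
  def isEven(n: Int): TailRec[Boolean] =
    if (n == 0) done(true) else tailcall(isOdd(n - 1))

  def isOdd(n: Int): TailRec[Boolean] =
    if (n == 0) done(false) else tailcall(isEven(n - 1))
}
```

Calling `TrampolineDemo.isEven(1000000).result` runs a million steps in constant stack space and returns true; the same mutual recursion written as plain method calls would typically overflow the default stack.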


2012-11-16 02:14:01
