Text mining with Scala
I have a .txt file with the following data:
L666371 +++$+++ u9030 +++$+++ m616 +++$+++ DURNFORD +++$+++ Lord Chelmsford seems to want me to stay back with my Basutos.
L666370 +++$+++ u9034 +++$+++ m616 +++$+++ VEREKER +++$+++ I'm to take the Sikali with the main column to the river
L666369 +++$+++ u9030 +++$+++ m616 +++$+++ DURNFORD +++$+++ Your orders, Mr Vereker?
L666257 +++$+++ u9030 +++$+++ m616 +++$+++ DURNFORD +++$+++ Good ones, yes, Mr Vereker. Gentlemen who can ride and shoot
L666256 +++$+++ u9034 +++$+++ m616 +++$+++ VEREKER +++$+++ Colonel Durnford... William Vereker. I hear you 've been seeking Officers?
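I want to import the text file into Scala (I have done that), and then extract all of the text from it to work with. After that I want to tokenize, lowercase, ignore word forms and separate punctuation. Finally, I want to get the word counts as unigram, bigram and trigram counts, with the results sorted by highest count.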
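The steps described above (lowercase, tokenize with punctuation as separate tokens, then count 1-, 2- and 3-grams sorted by count) could be sketched roughly as follows. This is one possible approach under assumptions, not a standard API: the tokenization regex, the choice not to let n-grams cross record boundaries, and the names `NGramCounts`, `tokenize` and `countNGrams` are all hypothetical. No stemming or lemmatization is applied.

```scala
object NGramCounts {
  // Lowercase, then split into tokens: runs of letters, digits and apostrophes are words,
  // and every other non-whitespace character becomes its own punctuation token.
  def tokenize(text: String): List[String] =
    """[\p{L}\p{N}']+|[^\s\p{L}\p{N}']""".r
      .findAllIn(text.toLowerCase)
      .toList

  // Count every n-gram of size n across the texts (n-grams never span two texts),
  // sorted by descending count; ties are broken alphabetically for stable output.
  def countNGrams(texts: Seq[String], n: Int): List[(String, Int)] =
    texts
      .flatMap(t => tokenize(t).sliding(n).filter(_.size == n).map(_.mkString(" ")))
      .groupBy(identity)
      .map { case (gram, occurrences) => (gram, occurrences.size) }
      .toList
      .sortBy { case (gram, count) => (-count, gram) }

  def main(args: Array[String]): Unit = {
    // Text fields taken from the sample records in the question
    val texts = List(
      "Lord Chelmsford seems to want me to stay back with my Basutos.",
      "Your orders, Mr Vereker?",
      "Good ones, yes, Mr Vereker. Gentlemen who can ride and shoot"
    )
    for (n <- 1 to 3) {
      println(s"--- $n-grams ---")
      countNGrams(texts, n).take(5).foreach { case (gram, count) => println(s"$count\t$gram") }
    }
  }
}
```

Using `groupBy` keeps the sketch short, but it holds every occurrence in memory; for a large corpus, folding the tokens into a count map may be preferable.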
Can someone show me how to do this? I tried the following, but it doesn't seem to work:
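import scala.io.Source

// Read the file once; calling getLines and then mkString on the same Source consumes it twice
val s = Source.fromFile("movie_lines.txt", "ISO-8859-1")
val str = try s.mkString finally s.close()
val lines = str.split("\r?\n").toList

// Pattern: one or more capital letters, any single character, then "!"
val Pattern = "([A-Z]+.!)".r
Pattern.findAllIn(str).foreach { x => println(x) }
println("\n This is the result\n")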
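Because the records are delimited, splitting each line on the literal separator may be simpler than matching a regex against the whole file. A minimal sketch, assuming every record has the five ` +++$+++ `-separated fields seen in the sample, that the spoken text is always the last field, and that ISO-8859-1 is the correct encoding; `ReadMovieLines` and `readTexts` are hypothetical names.

```scala
import scala.io.{Codec, Source}

object ReadMovieLines {
  // Field separator as it appears in the sample records (assumed identical on every line)
  private val Separator = " +++$+++ "

  // Read the file once via getLines and keep only the last field (the spoken text).
  // The split limit of 5 keeps any further separator text inside the final field;
  // lines with fewer than five fields are skipped.
  def readTexts(path: String): List[String] = {
    val source = Source.fromFile(path)(Codec.ISO8859)
    try {
      source.getLines()
        .map(_.split(java.util.regex.Pattern.quote(Separator), 5))
        .collect { case fields if fields.length == 5 => fields(4).trim }
        .toList // materialize before the finally block closes the source
    } finally source.close()
  }

  def main(args: Array[String]): Unit =
    readTexts("movie_lines.txt").take(5).foreach(println)
}
```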
Can anyone answer? – 2015-02-22 05:12:59