How do I reduceByKey with three values? I'm trying to iterate over an RDD of a text file, count each unique word in the file, and then accumulate, for each unique word, all the words that follow it together with their counts. This is what I have so far:
// Connect to the Spark driver
val conf = new SparkConf().setAppName("WordStats").setMaster("local")
val spark = new SparkContext(conf) // creates a new SparkContext

// Load the specified file into an RDD
// (was `sparkContext.textFile(...)`, but the context is bound to `spark` above)
val lines = spark.textFile(System.getProperty("user.dir") + "/" + "basketball_words_only.txt")

// Split each line into (word, followingWord, 1) triples
val words = lines.flatMap(line => {
  val wordList = line.split(" ")
  for {i <- 0 until wordList.length - 1}
    yield (wordList(i), wordList(i + 1), 1)
})
In case I haven't been clear so far: what I'm trying to do is accumulate, for each word in the file, the set of words that follow it, together with the number of times each following word follows it, in the form:

(PrecedingWord, (FollowingWord, numberOfTimesWordFollows))

where the data type is (String, (String, Int)).
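The usual trick is to avoid reducing over a three-element tuple at all: key each bigram by the whole (preceding, following) pair, reduce the counts, and only then re-key by the preceding word. Here is a minimal sketch of that shape using plain Scala collections (the sample sentence is made up; Spark's `reduceByKey(_ + _)` behaves like the `groupBy` plus count step below, but executed per-partition):

```scala
// Plain-Scala sketch of the intended Spark pipeline: pair consecutive words,
// count each (preceding, following) pair, then re-key by the preceding word.
val words = "the quick the quick the fox".split(" ").toList

// Equivalent of flatMap into ((prev, next), 1) followed by reduceByKey(_ + _):
// sliding(2) yields each consecutive pair of words; groupBy + size counts them.
val bigramCounts: Map[(String, String), Int] =
  words.sliding(2).collect { case List(a, b) => (a, b) }
    .toList
    .groupBy(identity)
    .map { case (pair, occurrences) => (pair, occurrences.size) }

// Equivalent of .map { case ((prev, next), n) => (prev, (next, n)) }:
// reshape into the target (PrecedingWord, (FollowingWord, count)) form.
val result: List[(String, (String, Int))] =
  bigramCounts.toList.map { case ((prev, next), n) => (prev, (next, n)) }
```

With the RDD from the question, the same shape would be `words.map { case (a, b, n) => ((a, b), n) }.reduceByKey(_ + _).map { case ((a, b), n) => (a, (b, n)) }`.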