Extract numbers and store them in a variable in Scala and Spark

I have a file like the following:
0; best wrap ear market pair pair break make
1; time sennheiser product better earphone fit
1; recommend headphone pretty decent full sound earbud design
0; originally buy work gym work well robust sound quality good clip
1; terrific sound great fit toss mine profuse sweater headphone
0; negative experienced sit chair back touch chair earplug displace hurt
...
I want to extract the leading number and store it for each document. I have tried:
var grouped_with_wt = data.flatMap({ (line) =>
  val words = line.split(";").split(" ")
  words.map(w => {
    val a =
    (line.hashCode(), (vocab_lookup.value(w), a))
  })
}).groupByKey()
The expected output is:
(1453543,(best,0),(wrap,0),(ear,0),(market,0),(pair,0),(break,0),(make,0))
(3942334,(time,1),(sennheiser,1),(product,1),(better,1),(earphone,1),(fit,1))
...
After producing the result above, I use it in this code to generate the final result:
val Beta = DenseMatrix.zeros[Int](V, S)
val Beta_c = grouped_with_wt.flatMap(kv => {
  kv._2.map(wt => {
    Beta(wt._1, wt._2) += 1
  })
})
Final result:
1 0
1 0
1 0
1 0
...
This code does not work properly. Can anyone help me? I would like a solution similar to the code above.
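One way to get the leading number, as a minimal sketch: split each line once on `;`, parse the left part as an integer, and split the right part on whitespace. The names `LabelExtract`, `parseLine`, and `pairs` are made up for illustration, a local `Seq` stands in for the RDD so the block runs without Spark, and the words are kept as strings (as in the expected output) instead of going through `vocab_lookup`.

```scala
object LabelExtract {
  // Split "label; word word ..." into the numeric label and its words.
  // Assumption: every line has an integer label followed by one ';'.
  def parseLine(line: String): (Int, Vector[String]) = {
    val parts = line.split(";", 2)
    (parts(0).trim.toInt, parts(1).trim.split("\\s+").toVector)
  }

  // One (docId, (word, label)) pair per word, keyed by the line's hash code
  // as in the question's flatMap.
  def pairs(line: String): Vector[(Int, (String, Int))] = {
    val (label, words) = parseLine(line)
    words.map(w => (line.hashCode, (w, label)))
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(
      "0; best wrap ear market pair pair break make",
      "1; time sennheiser product better earphone fit"
    )
    // A local Seq stands in for the RDD here; on an RDD[String] the same
    // function could be passed to flatMap and followed by groupByKey().
    val grouped = data.flatMap(pairs).groupBy(_._1)
    grouped.foreach { case (id, ps) =>
      println(s"($id,${ps.map(_._2).mkString(",")})")
    }
  }
}
```

Keeping the parsing in a small function of its own means the same function can be reused in whichever transformation ends up consuming the lines.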
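Thank you for your reply. Instead of using map, is it possible to use flatMap, as in the code in my question? I ask because I want to use this flatMapped RDD in my program. – Rozita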
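On the flatMap question, a hedged sketch: the flatMap step can stay, as long as it only emits pairs and the counting happens afterwards. Incrementing `Beta` inside an RDD closure is unlikely to work, because Spark runs the closure on serialized copies in the executors and `flatMap` is lazy. Below, `BetaCounts`, `wordLabelOnes`, and `countMatrix` are hypothetical names, a plain `Array.ofDim` replaces the `DenseMatrix`, and a local `Seq` with `groupBy` stands in for an RDD with `reduceByKey(_ + _)` followed by `collect()`.

```scala
object BetaCounts {
  // Emit one ((wordIndex, label), 1) pair per word of "label; word word ...".
  // vocab plays the role of the question's vocab_lookup (word -> row index).
  def wordLabelOnes(line: String, vocab: Map[String, Int]): Seq[((Int, Int), Int)] = {
    val parts = line.split(";", 2)
    val label = parts(0).trim.toInt
    parts(1).trim.split("\\s+").toSeq.map(w => ((vocab(w), label), 1))
  }

  // Sum the ones per (wordIndex, label) and write the totals into a
  // V x S table, where V = vocab.size and S = number of labels.
  def countMatrix(lines: Seq[String], vocab: Map[String, Int], s: Int): Array[Array[Int]] = {
    val beta = Array.ofDim[Int](vocab.size, s)
    lines.flatMap(wordLabelOnes(_, vocab))
      .groupBy(_._1)
      .foreach { case ((w, l), ones) => beta(w)(l) = ones.map(_._2).sum }
    beta
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq("0; best wrap pair pair", "1; time fit best")
    // Illustrative vocabulary built from the sample lines themselves.
    val vocab = lines
      .flatMap(_.split(";", 2)(1).trim.split("\\s+"))
      .distinct.zipWithIndex.toMap
    countMatrix(lines, vocab, 2).foreach(row => println(row.mkString(" ")))
  }
}
```

The flatMapped collection of pairs stays available for other uses, and the matrix is filled in one place after the counts have been aggregated.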