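I want to convert my MapReduce code to Spark using Scala, but I cannot extract the second field from comma-separated input. I have tried several options, and none of them ran successfully. The code compiles fine but throws a runtime exception: scala.MatchError: MapPartitionsRDD[2]
How do I get the second field from a text file in Spark (using Scala)?
Any hints would be helpful.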
Input:
Australia,6,2,7690,15,1,1,0,0,3,1,0,1,0,1,0,0,blue,0,1,1,1,6,0,0,0,0,0,white,blue
Austria,3,1,84,8,4,0,0,3,2,1,0,0,0,1,0,0,red,0,0,0,0,0,0,0,0,0,0,red,red
Bahamas,1,4,19,0,1,1,0,3,3,0,0,1,1,0,1,0,blue,0,0,0,0,0,0,1,0,0,0,blue,blue
Mapper:
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
    String[] flags = value.toString().split(",");
    switch (flags[1]) {
        case "1": landmass.set("N.America"); break;
        case "2": landmass.set("S.America"); break;
        case "3": landmass.set("Europe"); break;
        case "4": landmass.set("Africa"); break;
        case "5": landmass.set("Asia"); break;
        case "6": landmass.set("Oceania"); break;
    }
    context.write(landmass, new Text(flags[0]));
}
Spark (Scala):
object countriesByLandmass {
  def main(args: Array[String]) {
    val inputFile = "Data\\country\\flag.data"
    val conf = new SparkConf().setAppName("Total Countries by Landmass").setMaster("local")
    val sc = new SparkContext(conf)
    val txtFileLines = sc.textFile(inputFile).cache()
    //val fields = txtFileLines.flatMap(_.split(","))
    // fields.foreach(value => println(value))
    val fields = txtFileLines.map(_.split(",")(1))
    val landmass = fields.toString() match {
      case "1" => "N.America"
      case "2" => "S.America"
      case "3" => "Europe"
      case "4" => "Africa"
      case "5" => "Asia"
      case "6" => "Oceania"
    }
    println(landmass)
  }
}
Error:
Exception in thread "main" scala.MatchError: MapPartitionsRDD[2] at map at countriesByLandmass.scala:22 (of class java.lang.String)
at com.country.countriesByLandmass$.main(countriesByLandmass.scala:24)
at com.country.countriesByLandmass.main(countriesByLandmass.scala)
[Update] Solution:
val fields = txtFileLines.map(_.split(",")(1)).foreach { lm_code =>
  val landmass = lm_code match {
    case "1" => "N.America"
    case "2" => "S.America"
    case "3" => "Europe"
    case "4" => "Africa"
    case "5" => "Asia"
    case "6" => "Oceania"
    case _ => "Invalid Code"
  }
  println(lm_code + " --> " + landmass)
}
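For comparison with the original mapper, here is a rough sketch that emits (landmass, country) pairs, as context.write(landmass, new Text(flags[0])) does, and then counts the countries per landmass. The object name CountriesByLandmassSketch, the helper landmassName, and the use of countByKey are my own assumptions, not part of the code above. It also assumes a Spark version where pair-RDD operations are available without extra imports (older releases may need import org.apache.spark.SparkContext._).

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical object name; a sketch, not the original poster's final code.
object CountriesByLandmassSketch {

  // Hypothetical helper: translates one landmass code into its name.
  // The default case prevents a scala.MatchError on unexpected codes.
  def landmassName(code: String): String = code match {
    case "1" => "N.America"
    case "2" => "S.America"
    case "3" => "Europe"
    case "4" => "Africa"
    case "5" => "Asia"
    case "6" => "Oceania"
    case _   => "Invalid Code"
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Total Countries by Landmass").setMaster("local")
    val sc = new SparkContext(conf)

    // Same input path as in the question.
    val lines = sc.textFile("Data\\country\\flag.data")

    // The match runs once per record inside map, not on the RDD object itself.
    val pairs = lines
      .map(_.split(","))
      .filter(_.length > 1) // skip lines that have no second field
      .map(fields => (landmassName(fields(1)), fields(0)))

    // countByKey is an action that returns a local Map of landmass -> count.
    pairs.countByKey().foreach { case (landmass, total) =>
      println(landmass + " --> " + total)
    }

    sc.stop()
  }
}
```

Keeping the code-to-name lookup in a plain function means it can be checked without starting a SparkContext.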
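fields.toString returns the name of your RDD object, not its contents; since that string does not match any of your six cases, the match throws the error. See my answer for more details and how to fix it. – RojoSam
Edit: Scala code updated with the answer –
If you edit your question this way, nobody will understand what the problem was. Please keep the original question. Visitors will use the votes to find the best answer. – RojoSam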
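The point in RojoSam's first comment can be reproduced without Spark. In this small sketch, the object name MatchErrorDemo and the helper landmassName are made up for illustration, and the string passed in is only a stand-in for what fields.toString returned, copied from the error message in the question.

```scala
import scala.util.Try

// Hypothetical demo object; plain Scala, no Spark required.
object MatchErrorDemo {

  // The six cases from the question, deliberately without a default case.
  def landmassName(code: String): String = code match {
    case "1" => "N.America"
    case "2" => "S.America"
    case "3" => "Europe"
    case "4" => "Africa"
    case "5" => "Asia"
    case "6" => "Oceania"
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for the value of fields.toString, taken from the error message.
    val rddDescription = "MapPartitionsRDD[2] at map at countriesByLandmass.scala:22"

    // A real landmass code matches one of the cases.
    println(Try(landmassName("6"))) // prints Success(Oceania)

    // The RDD description matches none of them, so a scala.MatchError is thrown.
    println(Try(landmassName(rddDescription)).isFailure) // prints true
  }
}
```

Adding a case _ branch, as in the updated solution, removes the exception, but the match still has to be applied to each record rather than to the RDD's string representation.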