
I want to convert my MapReduce code to Spark using Scala, and I cannot extract the second field from the comma-separated input. I have tried several options, but none of them ran successfully. It compiles fine, but throws a runtime exception: scala.MatchError: MapPartitionsRDD[2]. How do I get the second field from a text file in Spark (using Scala)?

Any hint would be appreciated.

Input:

Australia,6,2,7690,15,1,1,0,0,3,1,0,1,0,1,0,0,blue,0,1,1,1,6,0,0,0,0,0,white,blue 
Austria,3,1,84,8,4,0,0,3,2,1,0,0,0,1,0,0,red,0,0,0,0,0,0,0,0,0,0,red,red 
Bahamas,1,4,19,0,1,1,0,3,3,0,0,1,1,0,1,0,blue,0,0,0,0,0,0,1,0,0,0,blue,blue 

Mapper:

public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
    String[] flags = value.toString().split(",");
    switch (flags[1]) {
        case "1": landmass.set("N.America"); break;
        case "2": landmass.set("S.America"); break;
        case "3": landmass.set("Europe"); break;
        case "4": landmass.set("Africa"); break;
        case "5": landmass.set("Asia"); break;
        case "6": landmass.set("Oceania"); break;
    }
    context.write(landmass, new Text(flags[0]));
}

Spark (Scala):

object countriesByLandmass {
  def main(args: Array[String]) {

    val inputFile = "Data\\country\\flag.data"

    val conf = new SparkConf().setAppName("Total Countries by Landmass").setMaster("local")
    val sc = new SparkContext(conf)

    val txtFileLines = sc.textFile(inputFile).cache()

    // val fields = txtFileLines.flatMap(_.split(","))
    // fields.foreach(value => println(value))

    val fields = txtFileLines.map(_.split(",")(1))

    val landmass = fields.toString() match {
      case "1" => "N.America"
      case "2" => "S.America"
      case "3" => "Europe"
      case "4" => "Africa"
      case "5" => "Asia"
      case "6" => "Oceania"
    }

    println(landmass)
  }
}

Error:

Exception in thread "main" scala.MatchError: MapPartitionsRDD[2] at map at countriesByLandmass.scala:22 (of class java.lang.String) 
    at com.country.countriesByLandmass$.main(countriesByLandmass.scala:24) 
    at com.country.countriesByLandmass.main(countriesByLandmass.scala) 

[Update] Solution:

val fields = txtFileLines.map(_.split(",")(1)).foreach { lm_code =>
  val landmass = lm_code match {
    case "1" => "N.America"
    case "2" => "S.America"
    case "3" => "Europe"
    case "4" => "Africa"
    case "5" => "Asia"
    case "6" => "Oceania"
    case _ => "Invalid Code"
  }
  println(lm_code + " --> " + landmass)
}
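
Not part of the thread, just a sketch: since the original MapReduce mapper emits (landmass, country) pairs, the same match can also stay inside a map that returns data instead of printing, and countByKey (an action) can then give the total per landmass, which is what the app name "Total Countries by Landmass" suggests. Variable names follow the question's code:

    // Sketch: map each line to a (landmass, country) pair instead of printing.
    val pairs = txtFileLines.map { line =>
      val cols = line.split(",")
      val landmass = cols(1) match {
        case "1" => "N.America"
        case "2" => "S.America"
        case "3" => "Europe"
        case "4" => "Africa"
        case "5" => "Asia"
        case "6" => "Oceania"
        case _   => "Invalid Code"
      }
      (landmass, cols(0))
    }

    // countByKey is an action: it triggers the computation and returns a Map on the driver.
    pairs.countByKey().foreach { case (lm, n) => println(lm + " --> " + n) }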

**fields.toString** is the object name of your RDD; since it doesn't match any of your six options, the error is thrown. See my answer for more details and how to fix this. – RojoSam
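
To see why that match can never succeed, a quick sketch (not from the thread) using the question's fields RDD shows what toString actually returns, a description of the RDD rather than its contents:

    // fields is an RDD[String]; its toString describes the RDD object itself,
    // e.g. "MapPartitionsRDD[2] at map at countriesByLandmass.scala:22",
    // so it can never equal one of the codes "1".."6".
    println(fields.toString())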


Edit: Scala code updated with the answer –


If you edit your question this way, nobody will understand what the problem was. Please keep the original question. Visitors will use the votes to find the best answer. – RojoSam

Answer


You have two problems here:

  1. fields is an RDD, a Spark collection containing the second column of every line in your data. You then need to apply your transformation inside a map:

    val fields = txtFileLines.map(_.split(",")(1)).map { col2 =>
      val landmass = col2 match {
        case "1" => "N.America"
        case "2" => "S.America"
        case "3" => "Europe"
        case "4" => "Africa"
        case "5" => "Asia"
        case "6" => "Oceania"
      }
      println(landmass)
    }
    
  2. In Java, if you don't specify a default case and the input doesn't match any option, the input is simply ignored: no code is executed and no exception is thrown.

     In Scala, if the input doesn't match any case, an error is thrown, a MatchError to be precise.

To avoid this, you can specify a default case as follows:

val landmass = col2 match { 
    case "1" => "N.America" 
    case "2" => "S.America" 
    case "3" => "Europe" 
    case "4" => "Africa" 
    case "5" => "Asia" 
    case "6" => "Oceania" 
    case _ => "Invalid option" 
} 
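
As a quick illustration (not from the original answer) of the difference between the two behaviours: in plain Scala, a match with no matching case and no wildcard throws scala.MatchError at runtime, while adding case _ makes the match total. The value "9" below is just a hypothetical invalid code:

    val code = "9"

    // Without a default case, this line would throw scala.MatchError: 9
    // val bad = code match { case "1" => "N.America" }

    // With a wildcard case the match always returns a value.
    val ok = code match {
      case "1" => "N.America"
      case _   => "Invalid option"
    }
    println(ok)  // prints "Invalid option"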

Adding a default case solved the MatchError, and I had to use foreach instead of map: `val fields = txtFileLines.map(_.split(",")(1)).foreach { lm_code => val landmass = lm_code match { ... }` –


If the answer helped you solve the problem, please accept it. – RojoSam
