
I want to convert my MapReduce code to Spark using Scala, and I cannot extract the second field from the comma-separated input. I have tried several options, but none of them ran successfully. It compiles fine, but throws a runtime exception: scala.MatchError: MapPartitionsRDD[2]. How do I get the second field from a text file in Spark (using Scala)?

Any hint would be appreciated.

Input:

Australia,6,2,7690,15,1,1,0,0,3,1,0,1,0,1,0,0,blue,0,1,1,1,6,0,0,0,0,0,white,blue 
Austria,3,1,84,8,4,0,0,3,2,1,0,0,0,1,0,0,red,0,0,0,0,0,0,0,0,0,0,red,red 
Bahamas,1,4,19,0,1,1,0,3,3,0,0,1,1,0,1,0,blue,0,0,0,0,0,0,1,0,0,0,blue,blue 

Mapper:

public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
    String[] flags = value.toString().split(",");
    switch (flags[1]) {
        case "1": landmass.set("N.America"); break;
        case "2": landmass.set("S.America"); break;
        case "3": landmass.set("Europe"); break;
        case "4": landmass.set("Africa"); break;
        case "5": landmass.set("Asia"); break;
        case "6": landmass.set("Oceania"); break;
    }
    context.write(landmass, new Text(flags[0]));
}

Spark (Scala):

object countriesByLandmass {
  def main(args: Array[String]) {

    val inputFile = "Data\\country\\flag.data"

    val conf = new SparkConf().setAppName("Total Countries by Landmass").setMaster("local")
    val sc = new SparkContext(conf)

    val txtFileLines = sc.textFile(inputFile).cache()

    // val fields = txtFileLines.flatMap(_.split(","))
    // fields.foreach(value => println(value))

    val fields = txtFileLines.map(_.split(",")(1))

    val landmass = fields.toString() match {
      case "1" => "N.America"
      case "2" => "S.America"
      case "3" => "Europe"
      case "4" => "Africa"
      case "5" => "Asia"
      case "6" => "Oceania"
    }

    println(landmass)
  }
}

Error:

Exception in thread "main" scala.MatchError: MapPartitionsRDD[2] at map at countriesByLandmass.scala:22 (of class java.lang.String) 
    at com.country.countriesByLandmass$.main(countriesByLandmass.scala:24) 
    at com.country.countriesByLandmass.main(countriesByLandmass.scala) 

[Update] Solution:

val fields = txtFileLines.map(_.split(",")(1)).foreach { lm_code =>
  val landmass = lm_code match {
    case "1" => "N.America"
    case "2" => "S.America"
    case "3" => "Europe"
    case "4" => "Africa"
    case "5" => "Asia"
    case "6" => "Oceania"
    case _ => "Invalid Code"
  }
  println(lm_code + " --> " + landmass)
}
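
Not part of the thread, just a sketch: since the original MapReduce mapper emits (landmass, country) pairs, the same match can also stay inside a map that returns data instead of printing, and countByKey (an action) can then give the total per landmass, which is what the app name "Total Countries by Landmass" suggests. Variable names follow the question's code:

    // Sketch: map each line to a (landmass, country) pair instead of printing.
    val pairs = txtFileLines.map { line =>
      val cols = line.split(",")
      val landmass = cols(1) match {
        case "1" => "N.America"
        case "2" => "S.America"
        case "3" => "Europe"
        case "4" => "Africa"
        case "5" => "Asia"
        case "6" => "Oceania"
        case _   => "Invalid Code"
      }
      (landmass, cols(0))
    }

    // countByKey is an action: it triggers the computation and returns a Map on the driver.
    pairs.countByKey().foreach { case (lm, n) => println(lm + " --> " + n) }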

**fields.toString** is the object name of your RDD; since it doesn't match any of your six options, the error is thrown. See my answer for more details and how to fix this. – RojoSam
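
To see why that match can never succeed, a quick sketch (not from the thread) using the question's fields RDD shows what toString actually returns, a description of the RDD rather than its contents:

    // fields is an RDD[String]; its toString describes the RDD object itself,
    // e.g. "MapPartitionsRDD[2] at map at countriesByLandmass.scala:22",
    // so it can never equal one of the codes "1".."6".
    println(fields.toString())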


Edit: Scala code updated with the answer –


If you edit your question this way, nobody will understand what the problem was. Please keep the original question. Visitors will use the votes to find the best answer. – RojoSam

Answer


You have two problems here:

  1. fields is an RDD, a Spark collection containing the second column of every line in your data. You then need to apply your transformation inside a map:

    val fields = txtFileLines.map(_.split(",")(1)).map { col2 =>
      val landmass = col2 match {
        case "1" => "N.America"
        case "2" => "S.America"
        case "3" => "Europe"
        case "4" => "Africa"
        case "5" => "Asia"
        case "6" => "Oceania"
      }
      println(landmass)
    }
    
  2. In Java, if you don't specify a default case and the input doesn't match any option, the input is simply ignored: no code is executed and no exception is thrown.

     In Scala, if the input doesn't match any case, an error is thrown, a MatchError to be precise.

To avoid this, you can specify a default case as follows:

val landmass = col2 match { 
    case "1" => "N.America" 
    case "2" => "S.America" 
    case "3" => "Europe" 
    case "4" => "Africa" 
    case "5" => "Asia" 
    case "6" => "Oceania" 
    case _ => "Invalid option" 
} 
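
As a quick illustration (not from the original answer) of the difference between the two behaviours: in plain Scala, a match with no matching case and no wildcard throws scala.MatchError at runtime, while adding case _ makes the match total. The value "9" below is just a hypothetical invalid code:

    val code = "9"

    // Without a default case, this line would throw scala.MatchError: 9
    // val bad = code match { case "1" => "N.America" }

    // With a wildcard case the match always returns a value.
    val ok = code match {
      case "1" => "N.America"
      case _   => "Invalid option"
    }
    println(ok)  // prints "Invalid option"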

Adding a default case solved the MatchError, and I had to use foreach instead of map: `val fields = txtFileLines.map(_.split(",")(1)).foreach { lm_code => val landmass = lm_code match { ... }` –


If the answer helped you solve the problem, please accept it. – RojoSam
