2015-08-08 196 views

I have a Spark RDD in Scala, RDD1, with the following schema:

RDD[(String, Array[String])]

Call it RDD1. From it I want to create a new RDD, RDD2, of type RDD[(String, String)], with one row per value, pairing each key with each of its values in RDD1.

For example:

RDD1 = Array(("Fruit", Array("Orange","Apple","Peach")), ("Shape", Array("Square","Rectangle")), ("Mathematician", Array("Aryabhatt")))

The output I want is:

RDD2 = Array(("Fruit","Orange"),("Fruit","Apple"),("Fruit","Peach"),("Shape","Square"),("Shape","Rectangle"),("Mathematician","Aryabhatt")) 

Can someone help me with this code?

I tried:

val R1 = RDD1.map(line => (line._1,line._2.split((",")))) 
val R2 = R1.map(line => line._2.foreach(ph => ph.map(line._1))) 

This gives me an error:

error: value map is not a member of Char

As I understand it, this is because the map function applies only to RDDs, not to each string/char. Please help me use nested functions in Spark.

Answer


Break the problem down.

  1. ("Fruit", Array("Orange","Apple","Peach")) -> Array(("Fruit", "Orange"), ("Fruit", "Apple"), ("Fruit", "Peach"))

def flattenLine(line: (String, Array[String])): Array[(String, String)] = line._2.map(x => (line._1, x))

  2. Apply the function to your RDD:

rdd1.flatMap(flattenLine)
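Putting the two steps together, here is a minimal sketch of a standalone run. The object name FlattenExample, the local[*] master and the app name are assumptions made only for illustration; the sample data is taken from the question. It also shows flatMapValues, a pair-RDD method that keeps the key and expands the values, as a possible alternative to the hand-written helper.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object FlattenExample {
  // Turn one (key, values) row into one (key, value) pair per value.
  def flattenLine(line: (String, Array[String])): Array[(String, String)] =
    line._2.map(value => (line._1, value))

  def main(args: Array[String]): Unit = {
    // Assumption: a local master and an arbitrary app name, for a standalone run.
    val conf = new SparkConf().setMaster("local[*]").setAppName("flatten-example")
    val sc = new SparkContext(conf)
    try {
      // Sample data from the question.
      val rdd1: RDD[(String, Array[String])] = sc.parallelize(Seq(
        ("Fruit", Array("Orange", "Apple", "Peach")),
        ("Shape", Array("Square", "Rectangle")),
        ("Mathematician", Array("Aryabhatt"))
      ))

      // flatMap concatenates the per-row arrays into a single RDD of pairs.
      val rdd2: RDD[(String, String)] = rdd1.flatMap(flattenLine)

      // Alternative on pair RDDs: keep each key, expand its values.
      val rdd2Alt: RDD[(String, String)] = rdd1.flatMapValues(_.toSeq)

      rdd2.collect().foreach(println)
      rdd2Alt.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The attempt in the question used map followed by foreach, which returns Unit for every row; flatMap is what turns one input row into several output rows.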


    Thank you very much :) That did the job :)