2017-07-18

Answer


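If you are asking about an RDD[Array[Array[Int]]] in Spark as the equivalent of an Array[Array[Array[Int]]] in Scala (the RDD takes the place of the outermost array), then you can do the following.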

Suppose you have a text file (/home/test.csv) containing

0,1,2 
7,8,9 
18,19,5 

Then you can do:

scala> val data = sc.textFile("/home/test.csv") 
data: org.apache.spark.rdd.RDD[String] = /home/test.csv MapPartitionsRDD[4] at textFile at <console>:24 

scala> val array = data.map(line => line.split(",").map(x => Array(x.toInt))) 
array: org.apache.spark.rdd.RDD[Array[Array[Int]]] = MapPartitionsRDD[5] at map at <console>:26 
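To make the per-line transformation concrete, here is a minimal plain-Scala sketch (no Spark needed) that applies the same split-and-wrap logic the RDD `map` above applies to each line. The object name `LineToNested` and the helper `parseLine` are hypothetical, and the `trim` call is a defensive addition not in the original answer, assuming there may be stray spaces or `\r` characters in the file.

```scala
// Plain-Scala sketch of what data.map(...) does to each line of the file.
// Hypothetical names: LineToNested, parseLine.
object LineToNested {
  // Each comma-separated number becomes a one-element Array[Int],
  // so one line becomes an Array[Array[Int]].
  // trim is a defensive addition (assumption: stray spaces or '\r' may occur).
  def parseLine(line: String): Array[Array[Int]] =
    line.split(",").map(x => Array(x.trim.toInt))

  def main(args: Array[String]): Unit = {
    // In-memory stand-in for the three lines of /home/test.csv
    val lines = Seq("0,1,2", "7,8,9", "18,19,5")
    lines.map(parseLine).foreach { row =>
      println(row.map(_.mkString("[", ",", "]")).mkString("[", ",", "]"))
    }
    // prints:
    // [[0],[1],[2]]
    // [[7],[8],[9]]
    // [[18],[19],[5]]
  }
}
```

Note that every number ends up wrapped in its own single-element array; if you only need one array of numbers per line, `line.split(",").map(_.trim.toInt)` would give an RDD[Array[Int]] instead.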

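You can go one step further and have an RDD[Array[Array[Array[Int]]]], so that each value of the RDD is itself the type you want. For that you can use wholeTextFiles, because it reads each file as a Tuple2 of (filename, texts in the file):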

scala> val data = sc.wholeTextFiles("/home/test.csv") 
data: org.apache.spark.rdd.RDD[(String, String)] = /home/test.csv MapPartitionsRDD[3] at wholeTextFiles at <console>:24 

scala> val array = data.map(t2 => t2._2.split("\n").map(line => line.split(",").map(x => Array(x.toInt)))) 
array: org.apache.spark.rdd.RDD[Array[Array[Array[Int]]]] = MapPartitionsRDD[4] at map at <console>:26 
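The following plain-Scala sketch mirrors what the `map` above does to a single element of the `wholeTextFiles` RDD, i.e. one `(filename, contents)` pair. The names `FileToNested` and `parseFile` are hypothetical, the tuple is an in-memory stand-in for what Spark would read, and `trim` is again a defensive addition not in the original answer (it also strips a trailing `\r` if the file has Windows line endings).

```scala
// Plain-Scala sketch of data.map(t2 => ...) applied to one (filename, contents) pair.
// Hypothetical names: FileToNested, parseFile.
object FileToNested {
  // Whole file -> lines -> numbers, each number wrapped in a one-element array.
  def parseFile(file: (String, String)): Array[Array[Array[Int]]] =
    file._2.split("\n").map(line => line.split(",").map(x => Array(x.trim.toInt)))

  def main(args: Array[String]): Unit = {
    // Stand-in for one element of sc.wholeTextFiles("/home/test.csv")
    val element = ("/home/test.csv", "0,1,2\n7,8,9\n18,19,5\n")
    val nested = parseFile(element)
    println(nested.length) // 3 (split drops the trailing empty string)
    println(nested.map(_.map(_.mkString("[", ",", "]")).mkString("[", ",", "]")).mkString("[", ",", "]"))
    // prints [[[0],[1],[2]],[[7],[8],[9]],[[18],[19],[5]]]
  }
}
```

One design consequence worth keeping in mind: with wholeTextFiles each file becomes a single RDD element, so each file is read and parsed as a whole in one task; the textFile approach instead distributes individual lines.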

Thank you very much! It works well! – icecream


Thanks for accepting :) Glad I could help. –
