Converting a Cassandra row RDD to an array of tuples

I am trying to read data from a Cassandra table and store it in an array. My RDD looks like this:

row: com.datastax.spark.connector.rdd.CassandraTableScanRDD[com.datastax.spark.connector.CassandraRow] = CassandraTableScanRDD[3] at RDD at CassandraRDD.scala:15

How can I store these values in an array without the column names?

2015-11-05 Ram

You can use iterator followed by toArray:

import org.apache.spark.rdd.RDD 

val arrayRDD: RDD[Array[AnyRef]] = rdd.map(_.iterator.toArray)

Or the columnValues method:

val arrayRDD: RDD[IndexedSeq[AnyRef]] = rdd.map(_.columnValues)

In general, though, this is not very useful unless you actually have a use for an Array[AnyRef].

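In practice it makes more sense to use the type-aware getters such as getInt and getString. If the data you want to extract is homogeneous, you can map over the column names or indices: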

val cols: Array[String] = ??? // Array of column names of the same type 
rdd.map(row => cols.map(row.getString(_)))

or

val colsIdxs: Array[Int] = ??? // Array of column indices of the same type 
rdd.map(row => colsIdxs.map(row.getString(_)))

If you want to extract heterogeneous values, you can build a tuple using the same getter methods as above.
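The tuple approach could look roughly like the sketch below. The column names id and name, their types, and the TupleExtraction object are assumptions made up for illustration, so substitute the columns of your own table. The collect() call is what turns the RDD into a local array, and it is only reasonable when the result fits in driver memory.

```scala
import com.datastax.spark.connector.CassandraRow
import org.apache.spark.rdd.RDD

object TupleExtraction {
  // Hypothetical columns: "id" (int) and "name" (text).
  // Replace the names and getters with the ones matching your table.
  def toTuple(row: CassandraRow): (Int, String) =
    (row.getInt("id"), row.getString("name"))

  // Map every row to a typed tuple, then bring the result to the driver.
  // collect() materialises the whole result in driver memory.
  def toTupleArray(rdd: RDD[CassandraRow]): Array[(Int, String)] =
    rdd.map(toTuple).collect()
}
```

Calling TupleExtraction.toTupleArray(rdd) on the RDD from the question should then give an Array[(Int, String)] without column names. If a column can be null, the Option-returning getters such as getIntOption and getStringOption are probably the safer choice.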

2015-11-05 20:28:41 zero323
