2017-09-22
CASSANDRA_TABLE has (some_other_column, itemid) as primary key. 

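import com.datastax.spark.connector._
import com.datastax.spark.connector.rdd.CassandraTableScanRDD

// Read the whole table; each element is a CassandraRow with every column.
val cassandraRdd: CassandraTableScanRDD[CassandraRow] = sparkSession.sparkContext
    .cassandraTable(cassandraKeyspace, cassandraTable)

cassandraRdd.take(10).foreach(println)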

This cassandraRdd contains all the columns read from my Cassandra table, but keyBy on the CassandraTableScanRDD does not retain them:

val temp1: CassandraTableScanRDD[((String), CassandraRow)] = cassandraRdd 
    .select("itemid", "column2", "column3") 
    .keyBy[(String)]("itemid") 
val temp2: CassandraTableScanRDD[((String), CassandraRow)] = cassandraRdd 
    .keyBy[(String)]("itemid") 
temp1.take(10).foreach(println) 
temp2.take(10).foreach(println) 

Neither temp1 nor temp2 retains all the columns after the keyBy operation:

((988230014),CassandraRow{itemid: 988230014}) 

After reading all the columns, how can I key by a specific column and still have the CassandraRow retain all the columns?

Answer

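To retain the partitioner and still get the selected columns, I had to read the Cassandra rows as shown below. The value columns are aliased to the tuple fields _1, _2 and _3, and the key columns are selected again by name so that keyBy can use them: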

val cassandraRdd: CassandraTableScanRDD[((String, String), (String, String, String))] = { 
    sparkSession.sparkContext 
    .cassandraTable[(String, String, String)](cassandraKeyspace, cassandraTable) 
    .select("some_other_column" as "_1", "itemid" as "_2", "column3" as "_3", "some_other_column", "itemid") 
    .keyBy[(String, String)]("some_other_column", "itemid") 
}
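
If keeping whole CassandraRow values matters more than the connector's own keyBy, one possible alternative is to pair each row with its key using a plain Spark map. This is only a sketch and not the approach from the answer above: the helper name keyedByItemId is hypothetical, the keyspace and table names are supplied by the caller, and it assumes itemid can be read as a String. Because it is an ordinary Spark transformation, any partitioning information that the connector's keyBy would carry is presumably not preserved.

```scala
import com.datastax.spark.connector._
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical helper, not part of the connector API.
def keyedByItemId(spark: SparkSession,
                  keyspace: String,
                  table: String): RDD[(String, CassandraRow)] = {
  spark.sparkContext
    // Read full rows; no select(), so every column stays in the CassandraRow.
    .cassandraTable[CassandraRow](keyspace, table)
    // Plain Spark pairing: the key is read from the row, the row itself is unchanged.
    .map(row => (row.getString("itemid"), row))
}
```

Calling keyedByItemId(sparkSession, cassandraKeyspace, cassandraTable).take(10).foreach(println) should then print pairs whose value side still lists every column, at the cost of the result being a generic RDD rather than a CassandraTableScanRDD.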