2015-10-01 33 views

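How to map a SchemaRDD to a PairRDD

I'm trying to figure out how to map a SchemaRDD object, which I retrieved from a SQL query on a HiveContext, to a PairRDDFunctions[String, Vector] object, where the string value is the name column of the schemaRDD and the remaining columns (BytesIn, BytesOut, etc...) form the vector.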

Answer


Assuming you have the columns "name", "bytesIn", and "bytesOut":

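import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SchemaRDD}

val schemaRDD: SchemaRDD = ...
val pairs: RDD[(String, (Long, Long))] =
  schemaRDD.select("name", "bytesIn", "bytesOut").rdd.map {
    // Typed patterns give (String, (Long, Long)) instead of (Any, (Any, Any))
    case Row(name: String, bytesIn: Long, bytesOut: Long) =>
      name -> (bytesIn, bytesOut)
  }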

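// Brings the implicit conversion to PairRDDFunctions into scope
// (needed before Spark 1.3; later versions provide it automatically)
import org.apache.spark.SparkContext._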

pairs.groupByKey ... etc
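
The question asks for a Vector value rather than a tuple, so here is a minimal sketch of that variant. It assumes MLlib's `Vector` is the vector type meant, that the first selected column is the String name, and that every remaining column is a Long. The helper name `toNameVectorPairs` is hypothetical and not part of Spark.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row

// Hypothetical helper: assumes column 0 is the String name
// and all remaining columns hold Long values.
def toNameVectorPairs(rows: RDD[Row]): RDD[(String, Vector)] =
  rows.map { row =>
    val name = row.getString(0)
    // Convert every remaining column to Double and pack it into a dense vector
    val values = (1 until row.length).map(i => row.getLong(i).toDouble).toArray
    name -> Vectors.dense(values)
  }
```

It takes the same RDD[Row] that the `map` above is called on. If some columns are not Long, `getLong` fails at runtime, so the accessor would need to match the actual schema.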