Spark & Scala - unable to filter null values from an RDD

I'm trying to filter null values out of an RDD, but it fails. Here is my code:
val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
  classOf[org.apache.hadoop.hbase.client.Result])

val raw_hbaseRDD = hBaseRDD.map {
  kv => kv._2
}

val Ratings = raw_hbaseRDD.map { result =>
  val x = Bytes.toString(result.getValue(Bytes.toBytes("data"), Bytes.toBytes("user")))
  val y = Bytes.toString(result.getValue(Bytes.toBytes("data"), Bytes.toBytes("item")))
  val z = Bytes.toString(result.getValue(Bytes.toBytes("data"), Bytes.toBytes("rating")))
  (x, y, z)
}
Ratings.filter (x => x._1 != null)
Ratings.foreach(println)
While debugging, the null values still appear after filtering:
(3359,1494,4)
(null,null,null)
(28574,1542,5)
(null,null,null)
(12062,1219,5)
(14068,1459,3)
Any better ideas?
You're doing it wrong. Ratings.filter(x => x._1 != null).foreach(println) will work – Knight71
`val filteredRatings = Ratings.filter(x => x._1 != null)` and `filteredRatings.foreach(println)`. –
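As the comments point out, `filter` is a transformation: it returns a new RDD and never modifies `Ratings` in place, so the filtered result must be assigned to a `val` (or chained directly into the action). A minimal sketch of the same pitfall, assuming plain Scala collections stand in for the RDD so it runs without a Spark cluster:

```scala
// Sample tuples mirroring the (user, item, rating) rows from the question.
val ratings = Seq(("3359", "1494", "4"), (null, null, null), ("28574", "1542", "5"))

// Discarding the result leaves `ratings` unchanged -- this is the bug.
ratings.filter(_._1 != null)

// Capturing the result in a val is what actually keeps the filtered data.
val cleaned = ratings.filter(_._1 != null)

println(ratings.count(_._1 == null)) // 1 -- the null row is still in `ratings`
println(cleaned.count(_._1 == null)) // 0 -- `cleaned` has it filtered out
```

The same applies to every RDD transformation (`map`, `filter`, `distinct`, ...): each returns a new immutable RDD, and only actions like `foreach` or `collect` run on whatever RDD they are called on.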