2017-09-15

How to transform a Map into the key-value pairs given below

test: Array[scala.collection.immutable.Map[String,Any]] = Array(
Map(_c3 -> "foobar", _c5 -> "impt", _c0 -> Key1, _c4 -> 20.0, _c1 -> "next", _c2 -> 1.0), 
Map(_c3 -> "high", _c5 -> "low", _c0 -> Key2, _c4 -> 19.0, _c1 -> "great", _c2 -> 0.0), 
Map(_c3 -> "book", _c5 -> "game", _c0 -> Key3, _c4 -> 42.0, _c1 -> "name", _c2 -> 0.5) 
) 

How can I convert this into key-value pairs keyed on _c0, including only the String values? Like below:

Key1 foobar 
Key1 impt 
Key1 next 
Key2 high 
Key2 low 
Key2 great 
Key3 book 
Key3 game 
Key3 name 

Answers

Please check this out:

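test.map(row =>
    // Drop entries whose value prints as a number; the dot is escaped so it only matches a decimal point
    row.filter { case (_, v) => !v.toString.matches("[+-]?\\d+(\\.\\d+)?") }
).flatMap { data =>
    // Fall back to a placeholder when a row has no "_c0" entry
    val key = data.getOrElse("_c0", "key_not_found")
    data
      .filter(_._1 != "_c0")
      .map { case (_, v) => s"$key $v" }
}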
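
One caveat with a regex-based filter is that it judges a value by how it prints, not by its type, so a String such as "2017" would be dropped along with the real numbers. The sketch below contrasts the two checks on a made-up row. The row contents and the names `RegexVsTypeFilter`, `looksNumeric`, `byRegex`, and `byType` are illustrative assumptions, not part of the question's data.

```scala
object RegexVsTypeFilter {
  // Hypothetical row: "_c1" is a String that happens to look numeric.
  val row = Map[String, Any]("_c0" -> "Key1", "_c1" -> "2017", "_c2" -> 1.0, "_c3" -> "foobar")

  // Regex check: decides based on the printed form of the value.
  def looksNumeric(v: Any): Boolean = v.toString.matches("[+-]?\\d+(\\.\\d+)?")

  // Keeps only values whose text does not look like a number.
  val byRegex: Set[String] = (row - "_c0").values.filterNot(looksNumeric).map(_.toString).toSet

  // Keeps only values whose runtime type is String.
  val byType: Set[String] = (row - "_c0").values.collect { case s: String => s }.toSet

  def main(args: Array[String]): Unit = {
    println(byRegex)
    println(byType)
  }
}
```

Here `byRegex` keeps only foobar, while `byType` keeps both 2017 and foobar. Which behaviour is wanted depends on whether numeric-looking Strings should count as Strings.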

Try this approach:

import org.apache.spark.sql.functions._

// First extract all values that are Strings (collect avoids an unchecked cast to List[String])
val rdd = sc.parallelize(test).map(x =>
  x.getOrElse("_c0", "no key").toString ->
    (x - "_c0").values.collect { case s: String => s }.toList)

val df = spark.createDataFrame(rdd).toDF("key", "vals")

// Use the explode function to emit one row per value
df.withColumn("vals", explode(col("vals"))).show()

How about:

test
.map(row => row.getOrElse("_c0", "") -> (row - "_c0").values.filter(_.isInstanceOf[String]))
.flatMap { case (key, innerList) => innerList.map(key -> _) }
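
Putting the pieces together, here is a minimal self-contained sketch in plain Scala (no Spark), run against data modelled on the question. The object name `KeyValuePairs` and the helper `toPairs` are hypothetical names introduced for illustration. The sketch also assumes that the double quotes in the printed data are display artifacts, so the values are stored as unquoted Strings and `_c0` is itself a String.

```scala
object KeyValuePairs {
  // Sample rows modelled on the question (assumed: unquoted Strings and Doubles).
  val test: Array[Map[String, Any]] = Array(
    Map[String, Any]("_c0" -> "Key1", "_c1" -> "next", "_c2" -> 1.0, "_c3" -> "foobar", "_c4" -> 20.0, "_c5" -> "impt"),
    Map[String, Any]("_c0" -> "Key2", "_c1" -> "great", "_c2" -> 0.0, "_c3" -> "high", "_c4" -> 19.0, "_c5" -> "low"),
    Map[String, Any]("_c0" -> "Key3", "_c1" -> "name", "_c2" -> 0.5, "_c3" -> "book", "_c4" -> 42.0, "_c5" -> "game")
  )

  // Pairs each row's "_c0" key with every other String value in that row.
  def toPairs(rows: Array[Map[String, Any]]): Array[(String, String)] =
    rows.flatMap { row =>
      val key = row.getOrElse("_c0", "key_not_found").toString
      // collect filters by runtime type and narrows Any to String in one step
      (row - "_c0").values.collect { case s: String => key -> s }
    }

  def main(args: Array[String]): Unit =
    toPairs(test).foreach { case (k, v) => println(s"$k $v") }
}
```

The iteration order within a row follows the Map implementation, so the order of the pairs for a given key is not guaranteed to match the listing in the question.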