Scala DataFrame: Explode an Array

I am using the Spark libraries in Scala. I have created a DataFrame using:

import org.apache.spark.sql.types._

val searchArr = Array(
    StructField("log", IntegerType, true),
    StructField("user", StructType(Array(
        StructField("date", StringType, true),
        StructField("ua", StringType, true),
        StructField("ui", LongType, true))), true),
    StructField("what", StructType(Array(
        StructField("q1", ArrayType(IntegerType, true), true),
        StructField("q2", ArrayType(IntegerType, true), true),
        StructField("sid", StringType, true),
        StructField("url", StringType, true))), true),
    StructField("where", StructType(Array(
        StructField("o1", IntegerType, true),
        StructField("o2", IntegerType, true))), true)
)

val searchSt = new StructType(searchArr)  

val searchData = sqlContext.jsonFile(searchPath, searchSt)

I would now like to explode the field what.q1, which should contain an array of integers, but the documentation is limited: http://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/DataFrame.html#explode(java.lang.String,%20java.lang.String,%20scala.Function1,%20scala.reflect.api.TypeTags.TypeTag)

So far I have tried a few things without much luck:

val searchSplit = searchData.explode("q1", "rb")(q1 => q1.getList[Int](0).toArray())

Any ideas or examples of how to use explode on an array?

Source

2015-06-30 Jaume Primer

Have you tried applying a UDF to the "what" field? Something like this might be useful:

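import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, udf}

// UDF that receives the "what" struct as a Row and returns the
// first element of "q1" as a string ("" when the struct is null).
// The element type is given explicitly because q1 is an array of integers.
val explode = udf {
  (aStr: Row) =>
    aStr match {
      case null => ""
      case _    => aStr.getList[Int](0).get(0).toString()
    }
}

// df is the DataFrame that contains the "what" column
val newDF = df.withColumn("newColumn", explode(col("what")))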

where:

getList(0) returns the "q1" field
get(0) returns the first element of "q1"

I'm not sure, but you could try using getAs[T](fieldName: String) instead of getList(index: Int).
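The getAs suggestion could look roughly like the sketch below. It is only an illustration under stated assumptions: it assumes Spark 2.x or later (SparkSession instead of the sqlContext used in the question), and the sample rows and the names spark, df, firstQ1, withFirst and exploded are made up for the example. The last step uses the built-in functions.explode column function, which is a different technique from the UDF above, to produce one row per element of what.q1.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, explode, struct, udf}

// Assumption: a local session for experimenting; in spark-shell the existing one is reused.
val spark = SparkSession.builder().master("local[1]").appName("explode-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data: a "log" id and a "what" struct holding an integer array "q1".
val df = Seq(
  (1, Seq(10, 20)),
  (2, Seq(30))
).toDF("log", "q1")
  .select(col("log"), struct(col("q1")).as("what"))

// UDF variant that looks up "q1" by name with getAs instead of by position with getList.
// A null struct, a null array or an empty array all yield the empty string.
val firstQ1 = udf { (what: Row) =>
  Option(what)
    .flatMap(w => Option(w.getAs[scala.collection.Seq[Int]]("q1")))
    .flatMap(_.headOption)
    .map(_.toString)
    .getOrElse("")
}

val withFirst = df.withColumn("firstQ1", firstQ1(col("what")))

// Built-in alternative: one output row per element of what.q1.
val exploded = df.select(col("log"), explode(col("what.q1")).as("q1"))

withFirst.show()
exploded.show()
```

Looking the field up by name keeps the UDF working if the order of the fields inside "what" changes, whereas getList(0) depends on "q1" being the first field.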

Source

2016-11-09 13:59:10 pheeleeppoo
