2016-09-29 35 views

Rename nested element in Scala Spark DataFrame

I have a Spark Scala DataFrame with a nested structure:

|-- _History: struct (nullable = true) 
| |-- Article: array (nullable = true) 
| | |-- element: struct (containsNull = true) 
| | | |-- Id: string (nullable = true) 
| | | |-- Timestamp: long (nullable = true) 
| |-- Channel: struct (nullable = true) 
| | |-- <font><font>Cultura pop</font></font>: array (nullable = true) 
| | | |-- element: long (containsNull = true) 
| | |-- <font><font>Deportes</font></font>: array (nullable = true) 
| | | |-- element: long (containsNull = true) 

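I am trying to rename the nested elements (for example, <font><font>Deportes</font></font> to Deportes). Is there any way to do this using a UDF or something similar?

I tried the following, and it does not work: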

var filterDF2 = filterDF 
    .withColumnRenamed("_History.Channel.<font><font>Deportes</font></font>", "_History.Channel.Deportes") 

Answer


The simplest approach is to cast the column to a type described by a properly named schema string (or an equivalent StructField definition):

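// Requires `import spark.implicits._` for the $"..." column syntax.
// Struct-to-struct casts match fields by position, so only the names change.
val schema = """struct< 
    Article: array<struct<Id:string,Timestamp:bigint>>, 
    Channel: struct<Cultura: array<bigint>, Deportes: array<bigint>>>""" 
df.withColumn("_History", $"_History".cast(schema)) 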
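
The "equivalent StructField definition" mentioned above could look roughly like the sketch below. It assumes both Channel fields are arrays of long, as in the schema printed in the question; the names historyType and renameHistory are made up for illustration and are not part of any Spark API.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

// Target type for _History: same shape as the source, but with clean field names.
// Assumption: both Channel fields are array<long>, as in the printed schema.
val historyType = StructType(Seq(
  StructField("Article", ArrayType(StructType(Seq(
    StructField("Id", StringType),
    StructField("Timestamp", LongType)
  )))),
  StructField("Channel", StructType(Seq(
    StructField("Cultura", ArrayType(LongType)),
    StructField("Deportes", ArrayType(LongType))
  )))
))

// Hypothetical helper: casting struct to struct matches fields by position,
// so the data is left untouched and only the field names change.
def renameHistory(df: DataFrame): DataFrame =
  df.withColumn("_History", col("_History").cast(historyType))
```

Calling renameHistory(df).printSchema() should then show Cultura and Deportes under _History.Channel. Because fields are matched by position, the order in historyType has to follow the order in the source struct.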

You can also model this with a case class:

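import org.apache.spark.sql.Row 
import org.apache.spark.sql.functions.{struct, udf}

// Field names of the case class become the field names of the returned struct.
case class ChannelRecord(Cultura: Option[Seq[Long]], Deportes: Option[Seq[Long]]) 

// Fields are read by position; Option(...) turns a null array into None.
val rename = udf((row: Row) => 
    ChannelRecord(Option(row.getSeq[Long](0)), Option(row.getSeq[Long](1)))) 

df.withColumn("_History", 
    struct($"_History.Article", rename($"_History.Channel").alias("Channel")))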