Spark 1.6: explode with null values

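I have a DataFrame that I am trying to flatten. As part of that process I want to explode it, so that if I have a column of arrays, each value of the array is used to create a separate row. I know I can use the explode function. However, my problem is that the column contains null values, and I am using Spark 1.6. Below is an example of the data and of what I want.
My data: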

id | ListOfRficAction| RficActionAttachment 
_______________________________ 
1 | Luke   | [baseball, soccer] 
2 | Lucy   | null

and I want:

id | ListOfRficAction| RficActionAttachment 
_______________________________ 
1 | Luke   | baseball 
1 | Luke   | soccer 
2 | Lucy   | null
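
To make the target behaviour concrete, here is a minimal plain-Scala model (no Spark involved) of the two semantics in play. It is only a sketch: `Person`, `explodeLike` and `explodeOuterLike` are hypothetical names introduced for illustration, not Spark APIs. The first function models an explode that emits no row for a null or empty array; the second models the row-preserving behaviour shown in the table above, which Spark 2.2 and later expose as explode_outer.

```scala
// Hypothetical row type for the example data; `likes` may be null.
case class Person(id: Int, name: String, likes: Seq[String])

// Models explode: a null or empty array contributes no output rows.
def explodeLike(rows: Seq[Person]): Seq[(Int, String, String)] =
  for {
    p    <- rows
    like <- Option(p.likes).getOrElse(Seq.empty)
  } yield (p.id, p.name, like)

// Models the wanted behaviour: a null or empty array contributes
// exactly one output row whose value is null.
def explodeOuterLike(rows: Seq[Person]): Seq[(Int, String, String)] =
  for {
    p    <- rows
    like <- Option(p.likes).filter(_.nonEmpty).getOrElse(Seq(null: String))
  } yield (p.id, p.name, like)

val rows = Seq(
  Person(1, "Luke", Seq("baseball", "soccer")),
  Person(2, "Lucy", null)
)
```

With these definitions, `explodeLike(rows)` loses Lucy entirely, while `explodeOuterLike(rows)` keeps her with a null value.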

I am using Spark 1.6 (so I cannot use the explode_outer function). I tried using explode, but I get the following error:

scala.MatchError: [null] (of class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)

I also tried:

df.withColumn("likes", explode(
    when(col("likes").isNotNull, col("likes")) 
    // If null explode an array<string> with a single null 
    .otherwise(array(lit(null).cast("string")))))

But my DataFrame schema is quite complex (it contains both strings and longs), so the cast does not work. Here is the part of my schema where I get the error:

|-- RficActionAttachment: array (nullable = true) 
| |-- element: struct (containsNull = true) 
| | |-- ActivityFileAutoUpdFlg: string (nullable = true) 
| | |-- ActivityFileDate: string (nullable = true) 
| | |-- ActivityFileDeferFlg: string (nullable = true) 
| | |-- ActivityFileDockReqFlg: string (nullable = true) 
| | |-- ActivityFileDockStatFlg: string (nullable = true) 
| | |-- ActivityFileExt: string (nullable = true) 
| | |-- ActivityFileName: string (nullable = true) 
| | |-- ActivityFileRev: string (nullable = true) 
| | |-- ActivityFileSize: long (nullable = true) 
| | |-- ActivityFileSrcPath: string (nullable = true) 
| | |-- ActivityFileSrcType: string (nullable = true) 
| | |-- ActivityId: string (nullable = true) 
| | |-- AttachmentId: string (nullable = true) 
| | |-- Comment: string (nullable = true)

Exception thrown by the user class:

org.apache.spark.sql.AnalysisException: cannot resolve 'CASE WHEN isnotnull(ListOfRficAction.RficAction.ListOfRficActionAttachment.RficActionAttachment) THEN ListOfRficAction.RficAction.ListOfRficActionAttachment.RficActionAttachment ELSE array(ListOfRficAction.RficAction.ListOfRficActionAttachment.RficActionAttachment)'

due to data type mismatch: THEN and ELSE expressions should all be same type or coercible to a common type;
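
The mismatch reported above means the two branches of the CASE WHEN do not have the same type: an array holding a string null (or an untyped null) is not an array of structs. One way around it, sketched below, is to read the element type from the DataFrame's own schema and cast the null to that type, so the fallback array has the same type as the column. This is a sketch under assumptions, not a verified fix for this schema: `explodeKeepingNulls` is a hypothetical helper name, it expects the array to be a top-level column (a nested field would first have to be selected into one), and it has not been run against Spark 1.6.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{array, col, explode, lit, when}
import org.apache.spark.sql.types.ArrayType

// Hypothetical helper (not a Spark API): explode `colName`, keeping
// rows whose array is null as a single row with a null value.
def explodeKeepingNulls(df: DataFrame, colName: String): DataFrame = {
  // Take the element type from the schema instead of hard-coding "string".
  val elementType = df.schema(colName).dataType match {
    case ArrayType(et, _) => et
    case other =>
      throw new IllegalArgumentException(s"$colName has type $other, not an array")
  }
  // A one-element array whose only element is a null of the element type.
  val singleNull: Column = array(lit(null).cast(elementType))
  df.withColumn(colName,
    explode(when(col(colName).isNull, singleNull).otherwise(col(colName))))
}
```

Note that an empty (non-null) array would still produce no rows with this approach, since only null arrays are replaced.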

Any idea what I can do?

Source

2017-10-20 Mbula Guy Marcel

My question is different, because when my schema –

The problem is the CASE WHEN: I cannot use it, it does not work in my case –

First replace all null values in the column with array(null), then use explode. Using the example DataFrame from the question:

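import org.apache.spark.sql.functions.{array, explode, lit, when}
// toDF and $"..." need sqlContext.implicits._ (already in scope in spark-shell)

val df = Seq((1, "Luke", Array("baseball", "soccer")), (2, "Lucy", null))
    .toDF("id", "ListOfRficAction", "RficActionAttachment")

// Replace a null array with a one-element array holding null, then explode
df.withColumn("RficActionAttachment",
    when($"RficActionAttachment".isNull, array(lit(null)))
    .otherwise($"RficActionAttachment"))
    .withColumn("RficActionAttachment", explode($"RficActionAttachment"))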

This gives the requested result:

+---+----------------+--------------------+
| id|ListOfRficAction|RficActionAttachment|
+---+----------------+--------------------+
|  1|            Luke|            baseball|
|  1|            Luke|              soccer|
|  2|            Lucy|                null|
+---+----------------+--------------------+

Source

2017-10-20 16:23:36 Shaido

Thank you @Shaido for your answer, but as I said, I tried this and I still get the same error: cannot resolve 'CASE WHEN isnull(ListOfRficAction.RficAction.ListOfRficActionAttachment.RficActionAttachment) THEN array(null) ELSE ListOfRficAction.RficAction.ListOfRficActionAttachment.RficActionAttachment'. Maybe it is due to my DataFrame schema –

@MbulaGuyMarcel The DataFrame schema should not matter; if you have an array, the above should work. I made a small update to the code, can you try again? – Shaido

Sorry, it does not work –
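
Since the thread leaves the CASE WHEN approach unresolved, one alternative that avoids CASE WHEN altogether is to split the DataFrame into rows with and without an array, explode only the former, and union the two halves. This is not from the answer above; it is a hedged sketch with a hypothetical helper name (`explodeViaUnion`), it assumes the array is a top-level column, and it has not been run against Spark 1.6 or the schema in the question.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, explode, lit}
import org.apache.spark.sql.types.ArrayType

// Hypothetical helper (not a Spark API). Assumes `colName` is a
// top-level column of array type.
def explodeViaUnion(df: DataFrame, colName: String): DataFrame = {
  val elementType =
    df.schema(colName).dataType.asInstanceOf[ArrayType].elementType
  // Rows that have an array: one output row per element.
  val exploded = df.filter(col(colName).isNotNull)
    .withColumn(colName, explode(col(colName)))
  // Rows with a null array: keep the row, with a null of the element type
  // so both halves have identical column types.
  val kept = df.filter(col(colName).isNull)
    .withColumn(colName, lit(null).cast(elementType))
  // unionAll is the Spark 1.x method name; it matches columns by position.
  exploded.unionAll(kept)
}
```

The row order of the result is not guaranteed, and rows whose array is empty rather than null are dropped, because explode emits nothing for them.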
