2016-11-14 63 views

I have a DataFrame with the columns "CUSTOMER_MAILID", "OFFER_NAME" and "OFFER_ISAPPLIED". How can I update a column in PySpark based on the values of other columns?

Sample data:

+--------------------+--------------------+---------------+ 
|  CUSTOMER_MAILID|   OFFER_NAME|OFFER_ISAPPLIED| 
+--------------------+--------------------+---------------+ 
|pushpendrakaushik...|Jaipur Pink Panth...|    N| 
|pushpendrakaushik...|Jaipur Pink Panth...|    N| 
|[email protected]|     |    N| 
|spdadhichassociat...|     |    N| 
|[email protected]|Jaipur Pink Panth...|    N| 
|[email protected]|     |    N| 
| [email protected]|     |    N| 
|[email protected]|     |    N| 
| [email protected]|Jaipur Pink Panth...|    N| 

I want to set the "OFFER_ISAPPLIED" column to "Y" whenever the "OFFER_NAME" column contains a value (i.e. is not empty), and leave it as "N" otherwise.

How can I do this?

The output should look like this:

+--------------------+--------------------+---------------+ 
|  CUSTOMER_MAILID|   OFFER_NAME|OFFER_ISAPPLIED| 
+--------------------+--------------------+---------------+ 
|pushpendrakaushik...|Jaipur Pink Panth...|    Y| 
|pushpendrakaushik...|Jaipur Pink Panth...|    Y| 
|[email protected]|     |    N| 
|spdadhichassociat...|     |    N| 
|[email protected]|Jaipur Pink Panth...|    Y| 
|[email protected]|     |    N| 
| [email protected]|     |    N| 
|[email protected]|     |    N| 
| [email protected]|Jaipur Pink Panth...|    Y| 

Answer


Use:

from pyspark.sql.functions import col, when

# "N" when OFFER_NAME is null, "Y" otherwise; withColumn returns a new DataFrame
df = df.withColumn("OFFER_ISAPPLIED",
    when(col("OFFER_NAME").isNull(), "N").otherwise("Y"))

It works. Thanks :)