2017-08-15

I tried to convert a column from string to timestamp with this code, using unix_timestamp from pyspark.sql.functions:

from pyspark.sql import Row
from pyspark.sql.functions import unix_timestamp

(sc
    .parallelize([Row(dt='2017-01-23T08:12:39.929+01:00')])
    .toDF()
    .withColumn("parsed", unix_timestamp("dt", "yyyy-MM-ddThh:mm:ss")
        .cast("double")
        .cast("timestamp"))
    .show(1, False))

but instead of a timestamp I get null:

+-----------------------------+------+
|dt                           |parsed|
+-----------------------------+------+
|2017-01-23T08:12:39.929+01:00|null  |
+-----------------------------+------+

Why?

Answer


You get NULL because the format you use doesn't match the data. To get a minimal match, you have to escape the literal T with single quotes:

yyyy-MM-dd'T'kk:mm:ss 

and to match the full pattern you also need S for the milliseconds and X for the timezone (for example yyyy-MM-dd'T'kk:mm:ss.SSSXXX).
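The same matching requirement can be sketched in plain Python with `datetime.strptime` (its `%`-directives stand in for Java's SimpleDateFormat letters here, purely for illustration): a pattern that stops at seconds fails on the full string, while one that also covers the fractional seconds and the UTC offset parses it.

```python
from datetime import datetime, timedelta

s = "2017-01-23T08:12:39.929+01:00"

# A pattern that stops at whole seconds cannot consume ".929+01:00",
# so parsing fails -- analogous to unix_timestamp returning null.
try:
    datetime.strptime(s, "%Y-%m-%dT%H:%M:%S")
except ValueError:
    print("truncated pattern: no match")

# A full pattern with fractional seconds (%f) and a UTC offset (%z)
# matches the whole string (Python 3.7+ accepts the colon in "+01:00").
dt = datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%f%z")
print(dt)
```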

In current Spark versions, though, a direct cast

from pyspark.sql.functions import col 

col("dt").cast("timestamp") 

should work just fine:

spark.sql(
    """SELECT CAST("2011-01-23T08:12:39.929+01:00" AS timestamp)""" 
).show(1, False) 
+------------------------------------------------+ 
|CAST(2011-01-23T08:12:39.929+01:00 AS TIMESTAMP)| 
+------------------------------------------------+ 
|2011-01-23 08:12:39.929                         |
+------------------------------------------------+ 
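The cast succeeds because the string is already valid ISO-8601. As a side illustration (not the Spark API itself), the same string parses directly in plain Python, assuming Python 3.7+ where `datetime.fromisoformat` accepts a colon in the offset:

```python
from datetime import datetime, timezone

# fromisoformat handles the millisecond part and the "+01:00" offset
# without any explicit pattern (Python 3.7+).
dt = datetime.fromisoformat("2011-01-23T08:12:39.929+01:00")
print(dt)

# Converting to UTC shifts the wall-clock time back by one hour,
# showing the offset was honored.
print(dt.astimezone(timezone.utc))
```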

Reference: SimpleDateFormat