2017-10-16 37 views
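Create a datetime from a string column in PySpark

Suppose I have the datetime column shown below. I want to convert the column from a string to a datetime type so that I can extract the month, day, year, and so on.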

+---+------------+
|agg|    datetime|
+---+------------+
|  A|1/2/17 12:00|
|  B|        null|
|  C|1/4/17 15:00|
+---+------------+

I tried the code below, but the values returned in the datetime column are null, and I don't understand why.

df.select(df['datetime'].cast(DateType())).show() 

I also tried this code:

df = df.withColumn('datetime2', from_unixtime(unix_timestamp(df['datetime']), 'dd/MM/yy HH:mm')) 

However, both attempts produce this DataFrame:

+---+------------+---------+
|agg|    datetime|datetime2|
+---+------------+---------+
|  A|1/2/17 12:00|     null|
|  B|        null|     null|
|  C|1/4/17 15:00|     null|
+---+------------+---------+

I have read and tried the solution given in this post, to no avail: PySpark dataframe convert unusual string format to Timestamp

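@user8371915, your reasoning doesn't justify downvoting the post. I spent 5 hours searching multiple sources, including StackOverflow, for a solution, to no avail. I had already tried the post you suggested, and it failed. – MLhacker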

Hi MLhacker, would you mind accepting this as the answer if it works? I'm trying to earn reputation, thanks! –

Answer

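// imports
import org.apache.spark.sql.functions.{dayofmonth, from_unixtime, month, unix_timestamp, year}

// The datetime column is assumed to be a string, so parse it explicitly.
// The pattern is passed to unix_timestamp (the parsing step) and assumes day/month/year order.
// Note: newer Spark versions parse more strictly, so "d/M/yy HH:mm" may be needed for single-digit days and months.
// from_unixtime returns datetime2 as a string formatted like yyyy-MM-dd HH:mm:ss.
val df2 = df.withColumn("datetime2", from_unixtime(unix_timestamp(df("datetime"), "dd/MM/yy HH:mm")))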

+---+------------+-------------------+
|agg|    datetime|          datetime2|
+---+------------+-------------------+
|  A|1/2/17 12:00|2017-02-01 12:00:00|
|  B|        null|               null|
|  C|1/4/17 15:00|2017-04-01 15:00:00|
+---+------------+-------------------+


// extract month, year, and day; Spark implicitly casts the datetime2 string for these functions
val df3 = df2.withColumn("month", month(df2("datetime2"))) 
    .withColumn("year", year(df2("datetime2"))) 
    .withColumn("day", dayofmonth(df2("datetime2"))) 
+---+------------+-------------------+-----+----+----+
|agg|    datetime|          datetime2|month|year| day|
+---+------------+-------------------+-----+----+----+
|  A|1/2/17 12:00|2017-02-01 12:00:00|    2|2017|   1|
|  B|        null|               null| null|null|null|
|  C|1/4/17 15:00|2017-04-01 15:00:00|    4|2017|   1|
+---+------------+-------------------+-----+----+----+
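
For anyone who wants to check the day-first reading without a Spark session, here is a rough pure-Python sketch using only the standard library. It parses the three sample values from the question and pulls out month, year, and day, mirroring the two Scala steps above. `parse_day_first`, `DAY_FIRST_FORMAT`, and the `rows` list are names made up for this illustration. Python's `strptime` directives (`%d/%m/%y %H:%M`) use a different syntax from Spark's `dd/MM/yy HH:mm` pattern, so treat this as a sanity check on the expected values, not a drop-in replacement.

```python
from datetime import datetime
from typing import Optional

# strptime directives for a day-first value such as "1/2/17 12:00".
# Assumption: this is the stdlib counterpart of the Spark pattern
# "dd/MM/yy HH:mm"; the two pattern syntaxes are not interchangeable.
DAY_FIRST_FORMAT = "%d/%m/%y %H:%M"


def parse_day_first(value: Optional[str]) -> Optional[datetime]:
    """Parse a day-first string; pass None through, like a null in Spark."""
    if value is None:
        return None
    return datetime.strptime(value, DAY_FIRST_FORMAT)


# Sample rows copied from the question (hypothetical in-memory stand-in for df).
rows = [("A", "1/2/17 12:00"), ("B", None), ("C", "1/4/17 15:00")]

for agg, raw in rows:
    parsed = parse_day_first(raw)
    if parsed is None:
        # A null input stays null in every derived field.
        print(agg, None, None, None, None)
    else:
        print(agg, parsed, parsed.month, parsed.year, parsed.day)
```

In PySpark itself the pattern is the second argument of `unix_timestamp`, as in the Scala call above. The attempt in the question passes it to `from_unixtime` instead, so `unix_timestamp` falls back to its default `yyyy-MM-dd HH:mm:ss` format, which is the likely reason for the nulls.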

Thanks