I am using Spark 2 and get the results shown below. Your problem is not related to unix_timestamp or to the Spark version; please check your data.
import org.apache.spark.sql.functions.unix_timestamp

// Run in spark-shell, where sc and spark.implicits._ (for toDF and $) are already in scope.
val df2 = sc.parallelize(Seq(
  (10, "date", "20170312020200"), (10, "date", "20170312050200"))
).toDF("id ", "somthing ", "datee")
df2.show()

// Parse the 14-digit string with a matching pattern, then cast the epoch seconds to a timestamp.
val df3 = df2.withColumn("datee", unix_timestamp($"datee", "yyyyMMddHHmmss").cast("timestamp"))
df3.show()
+---+---------+--------------+
|id |somthing | datee|
+---+---------+--------------+
| 10| date|20170312020200|
| 10| date|20170312050200|
+---+---------+--------------+
+---+---------+-------------------+
|id |somthing | datee|
+---+---------+-------------------+
| 10| date|2017-03-12 02:02:00|
| 10| date|2017-03-12 05:02:00|
+---+---------+-------------------+
import org.apache.spark.sql.functions.unix_timestamp
df2: org.apache.spark.sql.DataFrame = [id : int, somthing : string ... 1 more field]
df3: org.apache.spark.sql.DataFrame = [id : int, somthing : string ... 1 more field]
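For reference, Spark 2.2+ also offers to_timestamp, which does the same parse without the explicit unix_timestamp/cast pair. A minimal sketch (df3b is a hypothetical name):

import org.apache.spark.sql.functions.to_timestamp

// Equivalent one-step conversion (to_timestamp exists since Spark 2.2).
val df3b = df2.withColumn("datee", to_timestamp($"datee", "yyyyMMddHHmmss"))
df3b.show()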
That makes sense: my local system is at GMT+5:30 and the server is in EDT. – Himanshu
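The discrepancy in the comment above (driver at GMT+5:30, server in EDT) arises because timestamp parsing and display follow a default time zone. A minimal sketch for pinning it explicitly, assuming Spark 2.2+ where the spark.sql.session.timeZone config is available (on older 2.x builds the JVM default applies, settable via -Duser.timezone); df4 is a hypothetical name:

// Pin the session time zone so results match across machines (Spark 2.2+).
spark.conf.set("spark.sql.session.timeZone", "UTC")

val df4 = df2.withColumn("datee", unix_timestamp($"datee", "yyyyMMddHHmmss").cast("timestamp"))
df4.show()  // same wall-clock values regardless of the machine's local zone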