2017-02-14

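I am trying to get the unix time from a timestamp field in milliseconds (13 digits), but currently it is returned in seconds (10 digits). Can unix_timestamp() return unix time in milliseconds in Apache Spark?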

scala> var df = Seq("2017-01-18 11:00:00.000", "2017-01-18 11:00:00.123", "2017-01-18 11:00:00.882", "2017-01-18 11:00:02.432").toDF() 
df: org.apache.spark.sql.DataFrame = [value: string] 

scala> df = df.selectExpr("value timeString", "cast(value as timestamp) time") 
df: org.apache.spark.sql.DataFrame = [timeString: string, time: timestamp] 


scala> df = df.withColumn("unix_time", unix_timestamp(df("time"))) 
df: org.apache.spark.sql.DataFrame = [timeString: string, time: timestamp ... 1 more field] 

scala> df.take(4) 
res63: Array[org.apache.spark.sql.Row] = Array(
[2017-01-18 11:00:00.000,2017-01-18 11:00:00.0,1484758800], 
[2017-01-18 11:00:00.123,2017-01-18 11:00:00.123,1484758800], 
[2017-01-18 11:00:00.882,2017-01-18 11:00:00.882,1484758800], 
[2017-01-18 11:00:02.432,2017-01-18 11:00:02.432,1484758802]) 

Even though 2017-01-18 11:00:00.123 and 2017-01-18 11:00:00.000 are different, I get the same unix time back for both: 1484758800.

What am I missing?

Answer

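unix_timestamp() returns the unix timestamp in whole seconds, so the fractional part of the timestamp is dropped.

The last 3 digits of the timestamp string are the same as the last 3 digits of the corresponding millisecond value (1.999 sec = 1999 milliseconds), so just take the last 3 digits of the timestamp string and append them to the end of the seconds value returned by unix_timestamp().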
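
One way to sketch this in Scala is shown below. It is not necessarily the only or best approach, and it assumes that every value in timeString ends with exactly three fractional digits, as in the question. The column name unix_time_ms and the local SparkSession setup are illustrative choices, not part of the original post. Instead of concatenating strings, the sketch multiplies the seconds by 1000 and adds the three digits as a number, which gives the same 13-digit result.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, substring, unix_timestamp}

// Local session for illustration only; in spark-shell a session named `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("unix-millis-sketch")
  .getOrCreate()
import spark.implicits._

// Same kind of input as in the question: strings with a 3-digit fraction.
val df = Seq(
    "2017-01-18 11:00:00.000",
    "2017-01-18 11:00:00.123",
    "2017-01-18 11:00:02.432")
  .toDF("timeString")
  .withColumn("time", col("timeString").cast("timestamp"))

// Whole seconds from unix_timestamp(), scaled to milliseconds, plus the
// last three characters of the original string read as a number.
// Assumption: timeString always ends in exactly three fractional digits.
val withMillis = df.withColumn(
  "unix_time_ms",
  unix_timestamp(col("time")) * 1000L +
    substring(col("timeString"), -3, 3).cast("long"))

withMillis.show(false)
```

The absolute values depend on the session time zone, so they may differ from the output in the question, but the last three digits of unix_time_ms should match the fraction in timeString.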
