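I want to load a tab-separated file that contains two timestamp columns and generate a computed column: the difference, in days, between one of those columns and the current timestamp. I have already turned the RDD into a SchemaRDD (through the createSchemaRDD implicit) and registered it with registerTempTable(). After that I have more or less hit a wall, because every subsequent operation depends on this computed field. Is it possible in Apache Spark to compute a date difference between a timestamp column and the current timestamp?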
Here is what I have done so far. Thanks for your help!
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
val conf = new SparkConf().setMaster("local[2]").setAppName("CookieSummary")
val sc = new SparkContext(conf)
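val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// Spark 1.x implicit: lets an RDD of case classes be used as a SchemaRDD (e.g. for registerTempTable)
import sqlContext.createSchemaRDD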
// One record per TSV line; both dates are kept as raw strings here
case class CookieDates(CLPartnerSyncCreateDT: String, CookieSyncRequestDT: String)
val cookies = sc.textFile("/Users/shubhro/Documents/dataFiles/clean/worker1.01012015.1420081201_sub.tsv").map(_.split("\t")).map(p => CookieDates(p(0), p(1)))
cookies.registerTempTable("cookies")
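// Cast both string columns to TIMESTAMP; the aliases keep the original column names
val allCookies = sqlContext.sql("SELECT CAST(CLPartnerSyncCreateDT AS TIMESTAMP) AS CLPartnerSyncCreateDT, CAST(CookieSyncRequestDT AS TIMESTAMP) AS CookieSyncRequestDT FROM cookies")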
allCookies.collect().foreach(println)
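One workaround that does not depend on SQL date functions is to compute the day difference while parsing each line, before registerTempTable() is called. Below is a minimal sketch in plain Scala. It assumes the TSV values are in the format accepted by `java.sql.Timestamp.valueOf` (`yyyy-mm-dd hh:mm:ss`). `CookieDatesWithAge`, `daysSinceCreate`, `CookieAge`, `daysBetween` and `parse` are hypothetical names. "Current timestamp" is passed in as `now` so that every row is compared against the same instant.

```scala
import java.sql.Timestamp
import java.util.concurrent.TimeUnit

// Hypothetical record: the two original columns plus a precomputed day difference.
case class CookieDatesWithAge(
    CLPartnerSyncCreateDT: Timestamp,
    CookieSyncRequestDT: Timestamp,
    daysSinceCreate: Long)

object CookieAge {
  // Whole days elapsed from `ts` until `now`; integer division truncates partial days.
  def daysBetween(ts: Timestamp, now: Timestamp): Long =
    TimeUnit.MILLISECONDS.toDays(now.getTime - ts.getTime)

  // Parses one tab-separated line and adds the computed column.
  // Assumes "yyyy-mm-dd hh:mm:ss" values; Timestamp.valueOf throws
  // IllegalArgumentException for anything else.
  def parse(line: String, now: Timestamp): CookieDatesWithAge = {
    val p = line.split("\t")
    val created = Timestamp.valueOf(p(0))
    val requested = Timestamp.valueOf(p(1))
    CookieDatesWithAge(created, requested, daysBetween(created, now))
  }
}
```

In the job above, this would mean capturing `val now = new java.sql.Timestamp(System.currentTimeMillis())` once on the driver. The `.map(_.split("\t")).map(p => CookieDates(p(0), p(1)))` chain would then be replaced with `.map(line => CookieAge.parse(line, now))`. Spark 1.x SQL infers `TIMESTAMP` for `java.sql.Timestamp` fields and `BIGINT` for `Long` fields of a case class, so `daysSinceCreate` should be queryable like any other column after registerTempTable().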