This feels a little silly, but I am migrating from Spark 1.6.1 to Spark 2.0.2. I had been using the Databricks CSV library and am now trying to use the built-in CSV DataFrameWriter. Spark 2.0.2 does not seem to think that groupBy is returning a DataFrame.
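Concretely, this is the change I am making to the write call (a minimal sketch; df and path just stand in for my real DataFrame and S3 output path):

// Spark 1.6.1 with the Databricks spark-csv package (the old way)
df.write.format("com.databricks.spark.csv").save(path)
// Spark 2.0.2 with the built-in CSV writer (what I am trying now)
df.write.csv(path)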
Here is the code:
// Imports needed by this snippet (sc is an existing SparkContext)
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.{lit, sum}

// Get an SQLContext
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
var sTS = lTimestampToSummarize.toString()
val sS3InputPath = "s3://measurements/" + sTS + "/*"
// Read all measurements - note that all subsequent ETLs will reuse dfRaw
val dfRaw = sqlContext.read.json(sS3InputPath)
// Filter just the user/segment timespent records
val dfSegments = dfRaw.filter("segment_ts <> 0").withColumn("views", lit(1))
// Aggregate views and timespent per user/segment tuples
val dfUserSegments : DataFrame = dfSegments.groupBy("company_id", "division_id", "department_id", "course_id", "user_id", "segment_id")
.agg(sum("segment_ts").alias("segment_ts_sum"), sum("segment_est").alias("segment_est_sum"), sum("views").alias("segment_views"))
// The following will write CSV files to the S3 bucket
val sS3Output = "s3://output/" + sTS + "/usersegment/"
dfUserSegments.write.csv(sS3Output)
This returns the following error:
[error] /home/Spark/src/main/scala/Example.scala:75: type mismatch;
[error] found : Unit
[error] required: org.apache.spark.sql.DataFrame
[error] (which expands to) org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
[error] dfUserSegments.write.csv(sS3Output)
[error] ^
[error] one error found
[error] (compile:compile) Compilation failed
[error] Total time: 2 s, completed Jun 5, 2017 5:00:12 PM
I know I must be interpreting the error wrong, because I set dfUserSegments explicitly to be a DataFrame, yet the compiler is telling me it is Unit (no type).
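Breaking the chain into steps, my reading of the Spark 2.0.2 API is as follows (a sketch only; the intermediate names grouped and agged are mine, just for illustration):

// groupBy returns a RelationalGroupedDataset, not a DataFrame
val grouped = dfSegments.groupBy("company_id", "user_id")
// agg turns it back into a DataFrame, which matches my annotation above
val agged: DataFrame = grouped.agg(sum("views").alias("segment_views"))
// write returns a DataFrameWriter, and csv(...) returns Unit
agged.write.csv(sS3Output)

So if csv(...) returns Unit by design, why does the compiler insist that a DataFrame is required at that line?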
Any help is appreciated.