value toDF is not a member of org.apache.spark.rdd.RDD

I have read about this issue in other SO posts, but I still don't know what I am doing wrong. In principle, adding these two lines:

val sqlContext = new org.apache.spark.sql.SQLContext(sc) 
import sqlContext.implicits._ 

should do the trick, but the error persists.

Here is my build.sbt:

name := "PickACustomer" 

version := "1.0" 

scalaVersion := "2.11.7" 


libraryDependencies ++= Seq("com.databricks" %% "spark-avro" % "2.0.1", 
"org.apache.spark" %% "spark-sql" % "1.6.0", 
"org.apache.spark" %% "spark-core" % "1.6.0") 

and my Scala code is:

import scala.collection.mutable.Map 
import scala.collection.immutable.Vector 

import org.apache.spark.rdd.RDD 
import org.apache.spark.SparkContext 
import org.apache.spark.SparkContext._ 
import org.apache.spark.SparkConf 
import org.apache.spark.sql._ 


object Foo {

    def reshuffle_rdd(rawText: RDD[String]): RDD[Map[String, (Vector[(Double, Double, String)], Map[String, Double])]] = {...}

    def do_prediction(shuffled: RDD[Map[String, (Vector[(Double, Double, String)], Map[String, Double])]], prediction: (Vector[(Double, Double, String)] => Map[String, Double])): RDD[Map[String, Double]] = {...}

    def get_match_rate_from_results(results: RDD[Map[String, Double]]): Map[String, Double] = {...}

    def retrieve_duid(element: Map[String, (Vector[(Double, Double, String)], Map[String, Double])]): Double = {...}

    def main(args: Array[String]) {
        val conf = new SparkConf().setAppName(this.getClass.getSimpleName)
        if (!conf.getOption("spark.master").isDefined) conf.setMaster("local")

        val sc = new SparkContext(conf)

        // This should do the trick
        val sqlContext = new org.apache.spark.sql.SQLContext(sc)
        import sqlContext.implicits._

        val PATH_FILE = "/mnt/fast_export_file_clean.csv"
        val rawText = sc.textFile(PATH_FILE)
        val shuffled = reshuffle_rdd(rawText)

        // PREDICT AS A FUNCTION OF THE LAST SEEN UID
        val results = do_prediction(shuffled.filter(x => retrieve_duid(x) > 1), predict_as_last_uid)
        results.cache()

        case class Summary(ismatch: Double, t_to_last: Double, nflips: Double, d_uid: Double, truth: Double, guess: Double)

        val summary = results.map(x => Summary(x("match"), x("t_to_last"), x("nflips"), x("d_uid"), x("truth"), x("guess")))

        // PROBLEMATIC LINE
        val sum_df = summary.toDF()
    }
}

I keep getting:

value toDF is not a member of org.apache.spark.rdd.RDD[Summary]

I'm a bit lost now. Any ideas?

Could you at least type your values and give us the definitions of the methods that are used? – eliasah

Try defining your `case class Summary` outside main. – drstein

@eliasah Sorry, I'm a bit new to Scala and didn't realize that would help. See the edit. – elelias

Answers

Move your case class outside of main:

object Foo {

    case class Summary(ismatch: Double, t_to_last: Double, nflips: Double, d_uid: Double, truth: Double, guess: Double)

    def main(args: Array[String]) {
        ...
    }

}

Something about the scoping there prevents Spark from handling the automatic derivation of the schema for Summary. FYI, I actually got a different error from sbt:

No TypeTag available for Summary
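
For reference, here is a minimal, self-contained sketch of the fixed structure, assuming the question's Spark 1.6 / Scala 2.11 setup (the object name ToDfSketch and the sample row are invented for illustration). In Spark 1.6, import sqlContext.implicits._ brings in an implicit conversion (roughly rddToDataFrameHolder[A <: Product : TypeTag]) that adds toDF to an RDD of case-class instances; the compiler can only materialize the required TypeTag for a case class defined at the top level, which is why moving Summary out of main makes toDF resolve.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Top-level case class: the compiler can materialize a TypeTag for it,
// so Spark can derive the DataFrame schema automatically.
case class Summary(ismatch: Double, t_to_last: Double, nflips: Double,
                   d_uid: Double, truth: Double, guess: Double)

object ToDfSketch {
    def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("ToDfSketch").setMaster("local")
        val sc = new SparkContext(conf)
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._ // enables .toDF() on RDDs of Products

        // An invented sample row, just to show that toDF() now compiles and runs.
        val summary = sc.parallelize(Seq(Summary(1.0, 0.5, 2.0, 3.0, 1.0, 1.0)))
        val sum_df = summary.toDF()
        sum_df.show()

        sc.stop()
    }
}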

Wow, that DID work! Thanks a lot. – elelias

If it worked, you can accept the answer :) – eliasah

Yes, thank you for your help @eliasah. – elelias
