2017-03-14 31 views

Conversion error when using RDD operations in Scala

I am new to Scala, and I ran into an error while working through some exercises.

I am trying to convert an RDD to a DataFrame. My code is below.

package com.sclee.examples 

import com.sun.org.apache.xalan.internal.xsltc.compiler.util.IntType 
import org.apache.spark.{SparkConf, SparkContext} 
import org.apache.spark.sql.Row 
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}; 


object App { 
    def main(args: Array[String]): Unit = { 
    val conf = new SparkConf().setAppName("examples").setMaster("local") 
    val sc = new SparkContext(conf) 

    val sqlContext = new org.apache.spark.sql.SQLContext(sc) 
    import sqlContext.implicits._ 

    case class Person(name: String, age: Long) 

    val personRDD = sc.makeRDD(Seq(Person("A",10),Person("B",20))) 
    val df = personRDD.map({ 
     case Row(val1: String, val2: Long) => Person(val1,val2) 
    }).toDS() 

// val ds = personRDD.toDS() 
    } 
} 

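I followed the instructions in the Spark documentation and also consulted several blog posts that show how to convert an RDD to a DataFrame, but I got the error below.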

Error:(20, 27) Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing sqlContext.implicits._ Support for serializing other types will be added in future releases. 
    val df = personRDD.map({ 

I tried to solve the problem myself but failed. Any help would be appreciated.

Answers


The following code works:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Defined at the top level, outside the object that uses it
case class Person(name: String, age: Long)

object SparkTest {
  def main(args: Array[String]): Unit = {

    // Use the SparkSession entry point of Spark 2
    val spark = SparkSession
      .builder()
      .appName("Spark SQL basic example")
      .master("local") // run locally, as in the question; omit when the master is set by spark-submit
      .config("spark.some.config.option", "some-value")
      .getOrCreate()

    // Brings the encoders for case classes into scope
    import spark.implicits._

    // This stands in for your RDD; it is just a sample of how to create one
    val personRDD: RDD[Person] = spark.sparkContext.parallelize(Seq(Person("A", 10), Person("B", 20)))

    // SparkSession has a method to convert an RDD to a Dataset
    val ds = spark.createDataset(personRDD)
    println(ds.count())
  }
}

I made the following changes:

  • Use SparkSession instead of SparkContext and SQLContext
  • Move the Person class out of App (I am not sure why I had to do this)
  • Use createDataset for the conversion
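
As a complement to the changes listed above, here is a minimal sketch of the implicit route, the `personRDD.toDS()` call that is commented out in the question. It assumes Spark 2.x; the object name `RddToDataset` and the helper `convert` are made up for illustration. The elements of the RDD are already `Person` values, so the `map` over `case Row(...)` from the question is not needed. My understanding is that Spark can only derive the encoder when the case class is declared outside the method, which would explain why moving `Person` was necessary, but treat that as an assumption rather than a confirmed explanation.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Declared at the top level, not inside main, so an Encoder can be found for it.
case class Person(name: String, age: Long)

// Hypothetical object and method names, used only for this sketch.
object RddToDataset {

  // Builds a small RDD[Person] and converts it with the implicit toDS() method.
  def convert(spark: SparkSession): Dataset[Person] = {
    // Brings toDS()/toDF() and the case-class encoders into scope.
    import spark.implicits._

    val personRDD = spark.sparkContext.parallelize(Seq(Person("A", 10), Person("B", 20)))
    personRDD.toDS()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("rdd-to-dataset")
      .master("local[*]") // assumption: a local run, as in the question
      .getOrCreate()

    val ds: Dataset[Person] = convert(spark)
    // toDF() gives the untyped DataFrame view with columns "name" and "age".
    val df: DataFrame = ds.toDF()

    ds.show()
    df.printSchema()

    spark.stop()
  }
}
```

Both `toDS()` and `createDataset` need the same implicit encoder, so the choice between them is mostly a matter of style.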

However, I think it is fairly uncommon to do this conversion. You probably want to read your input directly into a Dataset using the read method.
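
For illustration, reading straight into a typed Dataset might look like the sketch below. It assumes the input is a JSON Lines file; the file name `people.json`, the object name `ReadPeople`, and the helper `readPeople` are hypothetical and not part of the original question.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Top-level case class describing one input record.
case class Person(name: String, age: Long)

// Hypothetical object and method names, used only for this sketch.
object ReadPeople {

  // Reads JSON Lines input and types each record as a Person.
  def readPeople(spark: SparkSession, path: String): Dataset[Person] = {
    import spark.implicits._ // provides the Encoder[Person] that as[Person] needs
    spark.read.json(path).as[Person]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("read-people")
      .master("local[*]") // assumption: a local run
      .getOrCreate()

    // Assumed input: one JSON object per line, e.g. {"name":"A","age":10}
    val people = readPeople(spark, "people.json")
    people.show()

    spark.stop()
  }
}
```

With this approach no intermediate RDD is created, so the conversion step from the question disappears entirely.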