Facing "scala.MatchError: 1201 (of class java.lang.Integer)" while creating a DataFrame

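I am running the following code to create a DataFrame from a text file, and I get "scala.MatchError: 1201 (of class java.lang.Integer)" while the DataFrame is being created.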

import org.apache.spark.SparkContext 
import org.apache.spark.SparkConf 
import org.apache.spark.sql.{SQLContext, Row} 
import org.apache.spark.sql.types.{StructType, StringType, StructField} 


/** 
    * Created by PSwain on 6/19/2016. 
    */ 
object RddToDataframe extends App { 

    val scnf=new SparkConf().setAppName("RddToDataFrame").setMaster("local[1]") 
    val sc = new SparkContext(scnf) 
    val sqlContext = new SQLContext(sc) 

    val employeeRdd=sc.textFile("C:\\Users\\pswain\\IdeaProjects\\test1\\src\\main\\resources\\employee") 

    //Creating schema 

    val employeeSchemaString="id name age" 
    val schema = StructType(employeeSchemaString.split(",").map(colNmae => StructField(colNmae,StringType,true))) 

    //Creating RowRdd 
    val rowRdd= employeeRdd.map(row => row.split(",")).map(row => Row(row(0).trim.toInt,row(1),row(2).trim.toInt)) 

    //Creating dataframe = RDD[rowRdd] + schema 
    val employeeDF=sqlContext.createDataFrame(rowRdd,schema). registerTempTable("Employee") 

    sqlContext.sql("select * from Employee").show() 


}

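However, when I run it in IntelliJ, I get the type-mismatch error below. I can't work out why it happens, since I am only converting strings to integers. The employee file contains the following input, one record per line:

1201,satish,25
1202,krishna,28
1203,amith,39
1204,JAVED,23
1205,prudvi,23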

16/06/19 15:18:58 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) 
scala.MatchError: 1201 (of class java.lang.Integer) 
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:295) 
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:294) 
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)


2016-06-19 Priyaranjan Swain

Why are you splitting with 'employeeSchemaString.split(",")' on ',' when the string 'id name age' is separated by spaces?
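The comment's point can be checked in plain Scala without Spark. This small sketch (the object name SplitDemo is only for illustration) shows that splitting the question's schema string on "," returns a single element, so the question's schema ends up with one StringType field literally named "id name age" rather than three fields:

```scala
object SplitDemo {
  // The schema string from the question: column names separated by spaces
  val schemaString = "id name age"

  // There is no comma in the string, so split(",") returns it as one element
  val byComma: Array[String] = schemaString.split(",")

  // Splitting on a single space yields the three intended column names
  val bySpace: Array[String] = schemaString.split(" ")

  def main(args: Array[String]): Unit = {
    println(byComma.toList) // List(id name age)
    println(bySpace.toList) // List(id, name, age)
  }
}
```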

The schema is created with every column type defined as StringType:

val schema = StructType(employeeSchemaString.split(",").map(colNmae => StructField(colNmae,StringType,true)))

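But rowRdd has columns of type Int, String, and Int. When Spark converts the first value, the Int 1201, for a column declared as StringType, the string converter (CatalystTypeConverters$StringConverter in the stack trace) has no case for java.lang.Integer, so it throws scala.MatchError.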
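The failure mode is ordinary Scala pattern matching: a match with no case for the value's runtime type throws scala.MatchError, and its message has exactly the shape seen in the log. The sketch below is a hypothetical, heavily simplified stand-in for a string-only converter; it is not Spark's actual CatalystTypeConverters code.

```scala
object MatchErrorDemo {
  // Hypothetical string-only converter (NOT Spark's implementation).
  // It has a case for String and nothing else.
  def toStringColumn(value: Any): String = value match {
    case s: String => s
    // No case for Int: any other value falls through and Scala throws MatchError
  }

  def main(args: Array[String]): Unit = {
    println(toStringColumn("satish")) // satish
    try {
      toStringColumn(1201)
    } catch {
      case e: MatchError => println(e.getMessage) // 1201 (of class java.lang.Integer)
    }
  }
}
```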

Here is the working code, which declares id and age as IntegerType:

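import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

// Declare each column with the type of the value that will be placed in each Row
val structType = {
    val id = StructField("id", IntegerType)
    val name = StructField("name", StringType)
    val age = StructField("age", IntegerType)
    new StructType(Array(id, name, age))
}

// id and age are parsed to Int, name stays a String
val rowRdd = employeeRdd.map(row => row.split(",")).map(row => Row(row(0).trim().toInt, row(1), row(2).trim().toInt))

sqlContext.createDataFrame(rowRdd, structType).registerTempTable("Employee")

sqlContext.sql("select * from Employee").show()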
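One detail the working code leaves in place: only row(0) and row(2) are trimmed, so if the file has a space after each comma, the name column keeps a leading space. A minimal sketch, assuming the comma-separated layout shown in the question, with a hypothetical helper parseEmployee that trims every field before converting types:

```scala
object ParseDemo {
  // Hypothetical helper: parse one line of the employee file into typed values,
  // trimming every field so a space after a comma does not end up in the name
  def parseEmployee(line: String): (Int, String, Int) = {
    val fields = line.split(",").map(_.trim)
    (fields(0).toInt, fields(1), fields(2).toInt)
  }

  def main(args: Array[String]): Unit = {
    println(parseEmployee("1201, satish, 25")) // (1201,satish,25)
  }
}
```

In the Spark code, something like employeeRdd.map(ParseDemo.parseEmployee).map { case (id, name, age) => Row(id, name, age) } should then produce rows that match the same structType.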


2016-06-19 11:31:25 Dazzler
