DecimalType problem – scala.MatchError (of class java.lang.String)

I am using Spark 1.6.1 with the built-in Scala 2.10.5. I am examining some weather data, and sometimes I have decimal values. Here is the code:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
import org.apache.spark.sql.Row
import org.apache.spark.rdd.RDD
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.SQLContext
val rawData = sc.textFile("Example_Weather.csv").map(_.split(","))
val header = rawData.first
val rawDataNoHeader = rawData.filter(_(0) != header(0))
rawDataNoHeader.first
object schema {
  val weatherdata = StructType(Seq(
    StructField("date", StringType, true),
    StructField("Region", StringType, true),
    StructField("Temperature", DecimalType(32, 16), true),
    StructField("Solar", IntegerType, true),
    StructField("Rainfall", DecimalType(32, 16), true),
    StructField("WindSpeed", DecimalType(32, 16), true)))
}
val dataDF = sqlContext.createDataFrame(rawDataNoHeader.map(p => Row(p(0), p(1), p(2), p(3), p(4), p(5))), schema.weatherdata)
dataDF.registerTempTable("weatherdataSQL")
val datasql = sqlContext.sql("SELECT * FROM weatherdataSQL")
datasql.collect().foreach(println)
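A side note on the header removal above: filtering on `_(0) != header(0)` drops every row whose first field equals the header's first token, not just the header line itself. A more targeted sketch (assuming the header is the first line of the file) drops the row by position instead:

```scala
// Drop only the first line of the RDD, regardless of its contents.
val rawDataNoHeader = rawData
  .zipWithIndex()                        // pair each row with its global index
  .filter { case (_, idx) => idx != 0L } // keep everything except row 0
  .map { case (row, _) => row }
```

This is safe even if a data row happens to start with the same value as the header column name.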
When I run the code, I get the expected schema and SQLContext output:
scala> object schema {
| val weatherdata= StructType(Seq(
| StructField("date", StringType, true),
| StructField("Region", StringType, true),
| StructField("Temperature", DecimalType(32,16), true),
| StructField("Solar", IntegerType, true),
| StructField("Rainfall", DecimalType(32,16), true),
| StructField("WindSpeed", DecimalType(32,16), true))
|)
| }
16/09/24 09:40:58 INFO BlockManagerInfo: Removed broadcast_2_piece0 on localhost:56288 in memory (size: 4.6 KB, free: 511.1 MB)
16/09/24 09:40:58 INFO BlockManagerInfo: Removed broadcast_2_piece0 on localhost:39349 in memory (size: 4.6 KB, free: 2.7 GB)
16/09/24 09:40:58 INFO ContextCleaner: Cleaned accumulator 2
16/09/24 09:40:58 INFO BlockManagerInfo: Removed broadcast_1_piece0 on localhost in memory (size: 1964.0 B, free: 511.1 MB)
16/09/24 09:40:58 INFO BlockManagerInfo: Removed broadcast_1_piece0 on localhost:41412 in memory (size: 1964.0 B, free: 2.7 GB)
16/09/24 09:40:58 INFO ContextCleaner: Cleaned accumulator 1
defined module schema
scala> val dataDF=sqlContext.createDataFrame(rawDataNoHeader.map(p=>Row(p(0),p(1),p(2),p(3),p(4),p(5))), schema.weatherdata)
dataDF: org.apache.spark.sql.DataFrame = [date: string, Region: string, Temperature: decimal(32,16), Solar: int, Rainfall: decimal(32,16), WindSpeed: decimal(32,16)]
However, the last line of code gives me the following:
16/09/24 09:41:03 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, localhost): scala.MatchError: 20.21666667 (of class java.lang.String)
The number 20.21666667 is indeed the first observed temperature for a particular geographic region. I thought I had successfully specified Temperature as DecimalType(32,16). Is there a problem with my code, or even with the SQLContext I am calling?
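The likely cause: `createDataFrame` with an explicit schema does not cast values, so each Row field must already have the runtime type its column declares. Catalyst's converter for a DecimalType column pattern-matches only on decimal-like types, and a String matches no case. A minimal, Spark-free sketch of that failure mode (the function `toDecimal` is illustrative, not Spark's actual code):

```scala
// A pattern match with no case for String throws scala.MatchError at runtime,
// with the offending value and its class in the message.
def toDecimal(v: Any): BigDecimal = v match {
  case d: BigDecimal           => d
  case d: java.math.BigDecimal => BigDecimal(d)
  case d: Double               => BigDecimal(d)
  // no case for String => scala.MatchError: 20.21666667 (of class java.lang.String)
}
```

Converting the strings up front, e.g. `BigDecimal(p(2))`, gives the converter a type it can match against the DecimalType columns.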
Following the suggestion, I changed dataDF to the following:
val dataDF = sqlContext.createDataFrame(rawDataNoHeader.map(p => Row(p(0), p(1), BigDecimal(p(2)), p(3), BigDecimal(p(4)), BigDecimal(p(5)))), schema.weatherdata)
Unfortunately, I now get a casting problem:
16/09/24 10:31:35 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, localhost): java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
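The stack trace now points at the Solar column: the schema declares it IntegerType, but `p(3)` is still a String. Converting it as well should clear this error (a sketch, assuming every Solar value parses as an integer):

```scala
val dataDF = sqlContext.createDataFrame(
  rawDataNoHeader.map(p =>
    Row(p(0), p(1), BigDecimal(p(2)), p(3).toInt, BigDecimal(p(4)), BigDecimal(p(5)))),
  schema.weatherdata)
```

If the input may contain malformed numbers, wrapping the conversions in `scala.util.Try` and filtering out the failures avoids exceptions killing the whole task.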
Maybe say why, so a confident answer can be given. The case details given in the question are not enough information. That way your answer will be more useful. – pamu
Thank you for your help... however, a casting problem now appears. Please see the edit. Thanks again for your help! –