
I am using Spark 1.6.1 with a ParamGridBuilder and a LinearRegression, and I get a scala.MatchError:

val paramGrid = new ParamGridBuilder() 
    .addGrid(lr.regParam, Array(0.1, 0.01)) 
    .addGrid(lr.fitIntercept) 
    .addGrid(lr.elasticNetParam, Array(0.0, 0.5, 1.0)) 
    .build() 

The scala.MatchError is:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 57.0 failed 1 times, most recent failure: Lost task 0.0 in stage 57.0 (TID 257, localhost): 
scala.MatchError: [280000,1.0,[2400.0,9373.0,3.0,1.0,1.0,0.0,0.0,0.0]] (of class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema) 

Full code

The question is how to use the ParamGridBuilder in this case. For context, the grid itself does nothing until it is handed to a CrossValidator (or TrainValidationSplit); a minimal sketch of that wiring is shown below, with assumed column and evaluator settings rather than the code from the linked notebook.
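
import org.apache.spark.ml.evaluation.RegressionEvaluator 
import org.apache.spark.ml.tuning.CrossValidator 

// lr is the LinearRegression whose params were added to paramGrid above; 
// the "price" label column and 3 folds are assumptions, not the linked full code 
val cv = new CrossValidator() 
    .setEstimator(lr) 
    .setEstimatorParamMaps(paramGrid) 
    .setEvaluator(new RegressionEvaluator().setLabelCol("price")) 
    .setNumFolds(3) 

val cvModel = cv.fit(trainingData) // DataFrame with a "price" label and a "features" vector 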

Answer


The problem here is the input schema, not the ParamGridBuilder. The price column is loaded as an integer, while LinearRegression expects a double. You can fix this by explicitly casting the column to the required type:

val houses = sqlContext.read.format("com.databricks.spark.csv") 
    .option("header", "true") 
    .option("inferSchema", "true") 
    .load(...) 
    .withColumn("price", $"price".cast("double")) 
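
With that cast in place, the original paramGrid can be reused unchanged. A quick sanity check, sketched under the assumption that the label column is still named price:

houses.printSchema() 
// the price column should now be reported as: 
//  |-- price: double (nullable = true) 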

Thanks, I had missed the cast to double. – oluies


You're welcome. No comment on the original example; it should be validated against the schema instead of failing with an exception inside a job. Unfortunately, ML is full of glitches like this. – zero323


Seems to work: https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1221303294178191/1275177332049116/6190062569763605/latest.html – oluies