
I'm using the Naive Bayes algorithm to classify articles and want to access part of the "probability" column of the result, but filtering on an element of it fails with org.apache.spark.sql.AnalysisException: Can't extract value from probability:

val Array(trainingDF, testDF) = rawDataDF.randomSplit(Array(0.6, 0.4))
val ppline = MyUtil.createTrainPpline(rawDataDF)
val model = ppline.fit(trainingDF)
val testRes = model.transform(testDF)
testRes.filter($"probability"(0).as[Double] === 1).show()

The last line throws:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Can't extract value from probability#133; 
      at org.apache.spark.sql.catalyst.expressions.ExtractValue$.apply(complexTypeExtractors.scala:73) 
      at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$9$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:616) 
      at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$9$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:608) 
      at 

Answer


You can always get the underlying RDD and filter on that:

import org.apache.spark.ml.linalg.Vector

val filteredRes = results.rdd.filter(row => row.getAs[Vector]("probability")(0) == 1)

Then you can convert it back to a DataFrame if you need to:

val df = spark.createDataFrame(filteredRes, results.schema)
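
If you would rather stay in the DataFrame API, a small UDF that pulls the element out of the vector is another option. This is only a sketch, not part of the original answer: it assumes the Spark 2.x ml pipeline, so the probability column holds an org.apache.spark.ml.linalg.Vector, and it reuses testRes from the question:

import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.functions.{col, udf}

// Catalyst cannot index into a VectorUDT column directly, so extract element 0 in a UDF
val firstProb = udf((v: Vector) => v(0))

testRes.filter(firstProb(col("probability")) === 1.0).show()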