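I am reading the decision tree classification section on the following page: http://spark.apache.org/docs/latest/mllib-decision-tree.html
I ran the provided example code on my laptop and tried to understand its output, but there are a couple of points I cannot work out. The code is below, and sample_libsvm_data.txt can be found at https://github.com/apache/spark/blob/master/data/mllib/sample_libsvm_data.txt
Please look at the output and let me know whether my understanding is correct. Here is my understanding:
- Does the test error mean the model is about 95% correct, based on the training data?
- (The one I am most curious about) If feature 434 is greater than 0.0, is the prediction 1 based on the Gini impurity? For example, if the value is given as 434:178, would the prediction be 1?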
from __future__ import print_function

from pyspark import SparkContext
from pyspark.mllib.tree import DecisionTree, DecisionTreeModel
from pyspark.mllib.util import MLUtils

if __name__ == "__main__":
    sc = SparkContext(appName="PythonDecisionTreeClassificationExample")

    # Load the LIBSVM file as an RDD of LabeledPoint and split it 70/30.
    data = MLUtils.loadLibSVMFile(sc, '/home/spark/bin/sample_libsvm_data.txt')
    (trainingData, testData) = data.randomSplit([0.7, 0.3])

    # Train on the 70% split only; an empty categoricalFeaturesInfo
    # means every feature is treated as continuous.
    model = DecisionTree.trainClassifier(trainingData, numClasses=2,
                                         categoricalFeaturesInfo={},
                                         impurity='gini', maxDepth=5, maxBins=32)

    # Evaluate on the held-out 30% split.
    predictions = model.predict(testData.map(lambda x: x.features))
    labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions)
    # Index into the (label, prediction) pair; `lambda (v, p): ...` is Python 2 only.
    testErr = labelsAndPredictions.filter(
        lambda lp: lp[0] != lp[1]).count() / float(testData.count())
    print('Test Error = ' + str(testErr))
    print('Learned classification tree model:')
    print(model.toDebugString())

===== Below is my output =====

Test Error = 0.0454545454545
Learned classification tree model:
DecisionTreeModel classifier of depth 1 with 3 nodes
  If (feature 434 <= 0.0)
   Predict: 0.0
  Else (feature 434 > 0.0)
   Predict: 1.0
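
To check both readings against the output above, here is a minimal sketch in plain Python, with no Spark needed. It rests on a few assumptions. The helper names `predict_from_stump` and `parse_libsvm_features` are made up for illustration. The sample row is hypothetical. The index shift assumes that `MLUtils.loadLibSVMFile` converts the file's one-based feature indices to zero-based ones, as I understand its documentation to say. If that holds, the model's "feature 434" would correspond to an entry written as `435:...` in the file, not `434:...`. Also note that the test error is computed on `testData` (the held-out 30%), not on the training data. As far as I know, the Gini impurity is only used during training to choose the split; a prediction just follows the learned threshold.

```python
# Value copied from the output above.
test_err = 0.0454545454545

# Test error is the fraction of misclassified rows in the held-out test split,
# so accuracy on that split is 1 - test_err.
test_accuracy = 1.0 - test_err


def predict_from_stump(features, split_feature=434, threshold=0.0):
    """Mimic the printed depth-1 tree.

    `features` maps zero-based feature indices to values (hypothetical helper).
    """
    # Entries missing from a sparse vector are treated as 0.0.
    value = features.get(split_feature, 0.0)
    return 0.0 if value <= threshold else 1.0


def parse_libsvm_features(line):
    """Parse 'label idx:val ...' into (label, {zero_based_idx: val}).

    Assumption: indices in the file are one-based and are shifted down by one.
    """
    parts = line.split()
    label = float(parts[0])
    feats = {}
    for item in parts[1:]:
        idx, val = item.split(":")
        feats[int(idx) - 1] = float(val)
    return label, feats


# Hypothetical row, not taken from sample_libsvm_data.txt.
label, feats = parse_libsvm_features("1 435:178 436:255")

print(round(test_accuracy, 4))
print(predict_from_stump(feats))
```

Under these assumptions, a row whose file entry is `435:178` ends up with model feature 434 equal to 178.0, which is greater than 0.0, so the stump predicts 1.0. A row that only has `434:178` would leave model feature 434 at 0.0 and be predicted as 0.0.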