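Spark MLlib decision trees: probability of labels by feature?

I managed to display the total probabilities of my labels after displaying my decision tree. For example, I have this table: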

Total Predictions : 
    65% impressions 
    30% clicks 
    5% conversions

My problem is finding the probabilities (or counts) by feature, that is, per node. For example:

if feature1 > 5 
    if feature2 < 10 
     Predict Impressions 
     samples : 30 Impressions 
    else feature2 >= 10 
     Predict Clicks 
     samples : 5 Clicks

Scikit does this automatically; I am trying to find a way to do it with Spark.


2016-05-10 RoyaumeIX

Can you use Scala? –

@DanieldePaula, that's fine. – RoyaumeIX

I have an idea in Scala. I'll share it with you when I have some time. –

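Note: the following solution works for Scala only. I did not find a way to do it in Python.

Assuming you only want a visual representation of the tree, as in your example, one option is to adapt the subtreeToString method from the Node.scala code on Spark's GitHub so that it includes the probability at each node split, as in the following snippet: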

import org.apache.spark.mllib.tree.configuration.FeatureType._
import org.apache.spark.mllib.tree.model.{Node, Split}

def subtreeToString(rootNode: Node, indentFactor: Int = 0): String = {
  // Describe one side of a split as a readable condition.
  def splitToString(split: Split, left: Boolean): String = {
    split.featureType match {
      case Continuous =>
        if (left) s"(feature ${split.feature} <= ${split.threshold})"
        else s"(feature ${split.feature} > ${split.threshold})"
      case Categorical =>
        val categories = split.categories.mkString("{", ",", "}")
        if (left) s"(feature ${split.feature} in $categories)"
        else s"(feature ${split.feature} not in $categories)"
    }
  }
  val prefix: String = " " * indentFactor
  if (rootNode.isLeaf) {
    prefix + s"Predict: ${rootNode.predict.predict} \n"
  } else {
    // predict.prob is the probability of the label predicted at this node; shown here as a percentage.
    val prob = rootNode.predict.prob * 100D
    prefix + s"If ${splitToString(rootNode.split.get, left = true)} " + f"(Prob: $prob%04.2f %%)" + "\n" +
      subtreeToString(rootNode.leftNode.get, indentFactor + 1) +
      prefix + s"Else ${splitToString(rootNode.split.get, left = false)} " + f"(Prob: ${100 - prob}%04.2f %%)" + "\n" +
      subtreeToString(rootNode.rightNode.get, indentFactor + 1)
  }
}

I tested it on a model I trained on the Iris dataset and got the following result:

scala> println(subtreeToString(model.topNode))

If (feature 2 <= -0.762712) (Prob: 35.35 %)
 Predict: 1.0
Else (feature 2 > -0.762712) (Prob: 64.65 %)
 If (feature 3 <= 0.333333) (Prob: 52.24 %)
  If (feature 0 <= -0.666667) (Prob: 92.11 %)
   Predict: 3.0
  Else (feature 0 > -0.666667) (Prob: 7.89 %)
   If (feature 2 <= 0.322034) (Prob: 94.59 %)
    Predict: 2.0
   Else (feature 2 > 0.322034) (Prob: 5.41 %)
    If (feature 3 <= 0.166667) (Prob: 50.00 %)
     Predict: 3.0
    Else (feature 3 > 0.166667) (Prob: 50.00 %)
     Predict: 2.0
 Else (feature 3 > 0.333333) (Prob: 47.76 %)
  Predict: 3.0

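A similar approach can be used to create a tree structure holding this information. The main difference is that the printed values (split.feature, split.threshold, predict.prob, and so on) would be stored as vals and used to build the structure.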
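As a rough illustration of that idea, here is a minimal sketch. The names TreeInfoSketch, TreeInfo, LeafInfo, SplitInfo and toTreeInfo are hypothetical and not part of Spark; only Node and the fields read from it (isLeaf, predict, split, leftNode, rightNode) come from the mllib API used in the snippet above. It assumes the same Spark 1.x mllib tree model.

```scala
import org.apache.spark.mllib.tree.model.Node

// Hypothetical container types (not part of Spark), wrapped in an object
// so the sealed hierarchy can be pasted into spark-shell in one go.
object TreeInfoSketch {
  sealed trait TreeInfo

  // A leaf keeps only the predicted label and its probability.
  case class LeafInfo(prediction: Double, prob: Double) extends TreeInfo

  // An internal node keeps the split description and both subtrees.
  case class SplitInfo(
      feature: Int,
      threshold: Double,        // meaningful for continuous features
      categories: List[Double], // meaningful for categorical features
      prediction: Double,       // label predicted at this node
      prob: Double,             // probability of that label at this node
      left: TreeInfo,
      right: TreeInfo) extends TreeInfo

  // Recursively copy the values that subtreeToString prints into the structure.
  def toTreeInfo(node: Node): TreeInfo =
    if (node.isLeaf) {
      LeafInfo(node.predict.predict, node.predict.prob)
    } else {
      val split = node.split.get
      SplitInfo(
        split.feature, split.threshold, split.categories,
        node.predict.predict, node.predict.prob,
        toTreeInfo(node.leftNode.get), toTreeInfo(node.rightNode.get))
    }
}
```

With a trained model, TreeInfoSketch.toTreeInfo(model.topNode) would return the whole tree as plain case classes, which can then be pattern matched, serialized, or rendered however you like.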


2016-05-18 22:24:44

This is exactly what I was looking for! – RoyaumeIX

Hey guys, does any of you have a solution using the DataFrame API? See this question, which has some rep on it: http://stackoverflow.com/questions/40558567/how-to-view-random-forest-statistics-in-spark-scala – rtcode
