How to count the observations in each node of a tree

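I am currently working with the wine data from the MMST package. I have split the whole data set into training and test sets and built a tree with code like the following: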

library("rpart") 
library("gbm") 
library("randomForest") 
library("MMST") 

data(wine) 
aux <- c(1:178) 
train_indis <- sample(aux, 142, replace = FALSE) 
test_indis <- setdiff(aux, train_indis) 

train <- wine[train_indis,] 
test <- wine[test_indis,] # split the data set into training and test sets

model.control <- rpart.control(minsplit = 5, xval = 10, cp = 0) 
fit_wine <- rpart(class ~ MalicAcid + Ash + AlcAsh + Mg + Phenols + Proa + Color + Hue + OD + Proline, data = train, method = "class", control = model.control) 

dev.new() # portable replacement for windows(), which exists only on Windows
plot(fit_wine, branch = 0.5, uniform = TRUE, compress = TRUE, main = "Full Tree: without pruning") 
text(fit_wine, use.n = TRUE, all = TRUE, cex = .6)

This gives me a plot like this (image: Tree without pruning).

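What do the numbers under each node mean (for example, 0/1/48 under Grignolino)? And if I want to know how many training and test samples fall into each node, what should I write in the code?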


2012-12-03 Rick Kim

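These numbers are the number of members of each class in that node. So the label "0/1/48" tells us that there are 0 cases of class 1 (Barbera, I infer), only one case of class 2 (Barolo), and 48 cases of class 3 (Grignolino).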

You can use summary(fit_wine) to get detailed information about the tree and each node.
See ?summary.rpart for more details.
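The per-node counts that summary() prints can also be read straight from the fitted object. The sketch below is only an illustration under stated assumptions: it uses the built-in iris data as a stand-in for the wine data (so `fit` and `node_counts` are names made up for this example), and it relies on the `frame` component of an rpart object, whose row names are the node numbers and whose `n` column is the number of training observations reaching each node.

```r
library(rpart)

# Assumption: iris (shipped with R) stands in for the wine data.
fit <- rpart(Species ~ ., data = iris, method = "class")

# One row per node. Row names are node numbers, n is the number of
# training observations reaching the node, "<leaf>" marks terminal nodes.
node_counts <- data.frame(
  node    = as.integer(rownames(fit$frame)),
  is_leaf = fit$frame$var == "<leaf>",
  n       = fit$frame$n
)
print(node_counts)
```

Because `n` is reported for internal nodes as well as leaves, the root row always equals the size of the training set, and the leaf rows sum to it.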

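You can additionally use predict() (which will call predict.rpart()) to see how the tree classifies a data set. For example, predict(fit_wine, train, type="class"). Or wrap it in a table for easier viewing: table(predict(fit_wine, train, type = "class"), train[,"class"])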

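If you specifically want to know which leaf node an observation falls into, that information is stored in fit_wine$where. For each case in the data set used to fit the tree, fit_wine$where holds the row number of fit_wine$frame that represents the leaf node the case falls into. So we can get the leaf for each training case with: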

trainingnodes <- rownames(fit_wine$frame)[fit_wine$where]
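Once each training case is mapped to its leaf, counting the observations per node is a single table() call. This is a minimal sketch, again assuming the built-in iris data in place of wine; `fit` and `train_nodes` are illustrative names.

```r
library(rpart)

# Assumption: iris (shipped with R) stands in for the wine data.
fit <- rpart(Species ~ ., data = iris, method = "class")

# Node number of the leaf that each training row falls into.
train_nodes <- rownames(fit$frame)[fit$where]

# Observations per leaf, then per leaf broken down by true class.
leaf_sizes <- table(train_nodes)
print(leaf_sizes)
print(table(node = train_nodes, class = iris$Species))
```

The second table reproduces, leaf by leaf, the kind of class counts that text(..., use.n = TRUE) draws under each node.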

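To get leaf information for the test data, I used to run predict() with type="matrix" and infer it. Confusingly, this returns a matrix formed by concatenating the predicted class, the class counts at that node in the fitted tree, and the class probabilities. So for this example: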

testresults <- predict(fit_wine, test, type = "matrix") 
testresults <- data.frame(testresults) 
# Name the first seven columns; some rpart versions append extra columns
names(testresults)[1:7] <- c("ClassGuess", "NofClass1onNode", "NofClass2onNode", 
    "NofClass3onNode", "PClass1", "PClass2", "PClass3")

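From this we can infer the distinct nodes, e.g. from unique(testresults[,2:4]), but this is inelegant. However, Yuji has a clever hack for this in a previous question. He copies the rpart object and replaces the classes with the node numbers, so that running predict returns the node rather than the class: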

nodes_wine <- fit_wine 
nodes_wine$frame$yval = as.numeric(rownames(nodes_wine$frame)) 
testnodes <- predict(nodes_wine, test, type="vector")
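Putting the two pieces together gives training and test counts for each leaf side by side. The following is a sketch under assumptions, not the original poster's code: iris replaces wine, the 120/30 split and the seed are arbitrary, and `counts` is an illustrative name. It uses the yval replacement trick shown above to find test nodes.

```r
library(rpart)

# Assumptions: iris stands in for wine; seed and split size are arbitrary.
set.seed(1)
train_idx <- sample(nrow(iris), 120)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

fit <- rpart(Species ~ ., data = train, method = "class")

# Copy the tree and overwrite yval with node numbers, so that
# predict(type = "vector") returns the node instead of the class.
nodes_fit <- fit
nodes_fit$frame$yval <- as.numeric(rownames(nodes_fit$frame))

train_nodes <- as.numeric(rownames(fit$frame))[fit$where]
test_nodes  <- predict(nodes_fit, test, type = "vector")

# Tabulate both sets against the same set of leaves, so that a leaf
# with no test observations still shows up with a count of 0.
leaves <- sort(unique(train_nodes))
counts <- data.frame(
  node  = leaves,
  train = as.vector(table(factor(train_nodes, levels = leaves))),
  test  = as.vector(table(factor(test_nodes,  levels = leaves)))
)
print(counts)
```

Fixing the factor levels before calling table() is the design choice that keeps the two columns aligned; a plain table(test_nodes) would silently drop leaves that no test case reaches.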

I've included the solution here, but people should go upvote him.


2012-12-03 21:35:21 MattBagg

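Thanks for your answer. I tried the 'predict()' approach, and the result is a series of classes such as Barbera, Barolo and Grignolino. Is there a way to see which node each observation ends up in, since several nodes represent the same class?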

I ran it like this: 'test_pred <- predict(fit_wine, test, type = "class"); test_pred' and it returns a series of classes.

Yes, 'type = "class"' is nicer, but for what we want, 'type = "matrix"' seems more helpful. – MattBagg
