2015-02-10 25 views

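I used the CHAID package from this link. It gives me a chaid object that can be plotted. I want a decision table with each decision rule in a column instead of a decision tree, but I don't understand how to access the nodes and paths in this chaid object. Kindly help me. I followed this link: chaid regression tree to table conversion.

I can't post my data here because it is too long, so I am posting code that uses the sample dataset provided with CHAID to perform the task.

Copied from the CHAID help manual: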

library("CHAID")

### fit tree to subsample
set.seed(290875)
USvoteS <- USvote[sample(1:nrow(USvote), 1000),]

ctrl <- chaid_control(minsplit = 200, minprob = 0.1)
chaidUS <- chaid(vote3 ~ ., data = USvoteS, control = ctrl)

print(chaidUS)
plot(chaidUS)

Output:

Model formula: 
vote3 ~ gender + ager + empstat + educr + marstat 

Fitted party: 
[1] root 
| [2] marstat in married 
| | [3] educr in <HS, HS, >HS: Gore (n = 311, err = 49.5%)
| | [4] educr in College, Post Coll: Bush (n = 249, err = 35.3%) 
| [5] marstat in widowed, divorced, never married 
| | [6] gender in male: Gore (n = 159, err = 47.8%) 
| | [7] gender in female 
| | | [8] ager in 18-24, 25-34, 35-44, 45-54: Gore (n = 127, err = 22.0%) 
| | | [9] ager in 55-64, 65+: Gore (n = 115, err = 40.9%) 

Number of inner nodes: 4 
Number of terminal nodes: 5 

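So my question is how to get a decision table from this tree, with each decision rule (branch/path) in a column. I don't understand how to access the different tree paths from this chaid object.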


Please provide us with a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – 2015-02-10 09:53:09

Answer


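The CHAID package uses the partykit (recursive partitioning) tree structure. You can walk the tree using party nodes: a node is either terminal, or it holds a list of child nodes together with information about the decision rule (split) and the fitted data.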
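To make the node structure concrete, here is a minimal sketch that builds a one-split tree by hand with partykit's constructors and inspects it with the accessor functions. The toy data and the names `toy_data`, `toy_tree`, `split_var` and `kid2_levels` are made up for illustration; a fitted chaid object would presumably be inspected the same way, but this was not run against one.

```r
library(partykit)

# Hypothetical toy data; only the factor levels matter for the split below.
toy_data <- data.frame(
  marstat = factor(c("married", "widowed", "divorced", "married"),
                   levels = c("married", "widowed", "divorced")),
  gender  = factor(c("male", "female", "female", "male"))
)

# One split on column 1 (marstat): level 1 goes to kid 1, levels 2 and 3 to kid 2.
sp <- partysplit(varid = 1L, index = c(1L, 2L, 2L))
root <- partynode(id = 1L, split = sp,
                  kids = list(partynode(id = 2L), partynode(id = 3L)))
toy_tree <- party(root, data = toy_data)

node <- node_party(toy_tree)   # root partynode
is.terminal(node)              # the root is an inner node
length(node)                   # number of kids

# varid is a column number in the tree's data; index maps factor levels to kids.
split_var <- names(toy_tree$data)[varid_split(split_node(node))]
kid2_levels <- levels(toy_tree$data[[split_var]])[index_split(split_node(node)) == 2L]

nodeids(toy_tree, terminal = TRUE)   # ids of the terminal nodes
```

The last two assignments are the core of a decision rule: the split variable's name plus the factor levels routed to a given kid.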

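The following code traverses the tree and creates the decision table. It is written for demonstration purposes only and was tested on just one sample tree.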

tree2table <- function(party_tree) {
    # is.terminal() and predict_party() come from partykit

    df_list <- list()
    var_names <- attr(party_tree$terms, "term.labels")
    var_levels <- lapply(party_tree$data, levels)

    walk_the_tree <- function(node, rule_branch = NULL) {
        # depth-first walk on partynode structure (recursive function)
        # decision rules are extracted for every branch
        if (is.null(rule_branch)) {
            # start with a one-row data frame of NAs, one column per predictor
            rule_branch <- setNames(data.frame(t(replicate(length(var_names), NA))), var_names)
            rule_branch <- cbind(rule_branch, nodeId = NA)
            rule_branch <- cbind(rule_branch, predict = NA)
        }
        if (is.terminal(node)) {
            # leaf: store the accumulated rule with node id and prediction
            rule_branch[["nodeId"]] <- node$id
            rule_branch[["predict"]] <- predict_party(party_tree, node$id)
            df_list[[as.character(node$id)]] <<- rule_branch
        } else {
            # inner node: add this split's rule and descend into each kid
            for (i in seq_len(length(node))) {
                rule_branch1 <- rule_branch
                val1 <- decision_rule(node, i)
                rule_branch1[[names(val1)[1]]] <- val1
                walk_the_tree(node[i], rule_branch1)
            }
        }
    }

    decision_rule <- function(node, i) {
        # returns the split decision rule for kid i as a named character value:
        # the name is the split variable, the value lists the factor levels
        var_name <- var_names[node$split$varid[[1]]]
        values_vec <- var_levels[[var_name]][node$split$index == i]
        values_txt <- paste(values_vec, collapse = ", ")
        return(setNames(values_txt, var_name))
    }

    # compile data frame list
    walk_the_tree(party_tree$node)
    # merge all data frames
    res_table <- Reduce(rbind, df_list)
    return(res_table)
}

Calling the function with the CHAID tree object:

table1 <- tree2table(chaidUS) 

The result should look like this:

gender   ager                       empstat   educr              marstat                          nodeId   predict
-------- -------------------------- --------- ------------------ -------------------------------- -------- ---------
NA       NA                         NA        <HS, HS, >HS       married                          3        Gore
NA       NA                         NA        College, Post Coll married                          4        Bush
male     NA                         NA        NA                 widowed, divorced, never married 6        Gore
female   18-24, 25-34, 35-44, 45-54 NA        NA                 widowed, divorced, never married 8        Gore
female   55-64, 65+                 NA        NA                 widowed, divorced, never married 9        Gore
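
If a single rule string per terminal node is more convenient than one column per variable, the table can be collapsed further. The sketch below uses only base R. To stay self-contained it re-types the table shown above by hand instead of calling tree2table(); `row_to_rule` and `rule_table` are hypothetical names introduced here. Conditions appear in column order, not in the order of the splits along the path.

```r
# Decision table as shown in the answer, typed in by hand for illustration.
table1 <- data.frame(
  gender  = c(NA, NA, "male", "female", "female"),
  ager    = c(NA, NA, NA, "18-24, 25-34, 35-44, 45-54", "55-64, 65+"),
  empstat = NA_character_,
  educr   = c("<HS, HS, >HS", "College, Post Coll", NA, NA, NA),
  marstat = c("married", "married", rep("widowed, divorced, never married", 3)),
  nodeId  = c(3, 4, 6, 8, 9),
  predict = c("Gore", "Bush", "Gore", "Gore", "Gore"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: join the non-NA conditions of one row into a rule string.
row_to_rule <- function(row, vars) {
  used <- vars[!is.na(unlist(row[vars]))]
  paste(sprintf("%s in {%s}", used, unlist(row[used])), collapse = " AND ")
}

# Every column except the node id and the prediction is a split variable.
split_vars <- setdiff(names(table1), c("nodeId", "predict"))
rules <- vapply(seq_len(nrow(table1)),
                function(i) row_to_rule(table1[i, ], split_vars),
                character(1))

rule_table <- data.frame(nodeId = table1$nodeId, rule = rules,
                         predict = table1$predict, stringsAsFactors = FALSE)
print(rule_table)
```

Variables that a path never splits on (here empstat) stay NA in the table and are simply skipped in the rule string.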