Converting "select all that apply" responses into binary columns

I have a data frame of survey responses in which some columns are questions where participants could select more than one answer ("select all that apply").

> age <- c(24, 28, 44, 55, 53) 
> ethnicity <- c("ngoni", "bemba", "lozi tonga", "bemba tonga other", "bemba tongi") 
> ethnicity_other <- c(NA, NA, "luvale", NA, NA) 
> df <- data.frame(age, ethnicity, ethnicity_other)

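I would like to turn these questions into binary response items, so that each response option (in this case the values found in ethnicity and ethnicity_other) becomes its own column of 0s and 1s.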

So far, I have written a script that separates the unique responses into a list (z):

> x <- unique(as.vector(unlist(strsplit(as.character(df$ethnicity_other), " ")), mode="list")) 
> y <- unique(as.vector(unlist(strsplit(as.character(df$ethnicity), " ")), mode="list")) 
> 
> combine <- c(x, y) 
> 
> z <- NULL 
> for(i in combine){ 
> if(!is.na(i)){ 
> z <- append(z, i) 
> } 
> }

I then created new columns from that list and filled them with NA values.

> for(elm in z){ 
> df[paste0("ethnicity_",elm)] <- NA 
> }

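So now I have 35 additional columns that I would like to fill with ones and zeros, depending on whether the column name (or part of it, since I added the ethnicity_ prefix) can be found in the corresponding cell under ethnicity or ethnicity_other. I have taken a stab at it in a few ways, without arriving at a good solution.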

Source

2014-02-19 chrisnyoder

Here is an approach that can be implemented with either plyr or data.table.

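# Collect every distinct response across both columns
# (as.character() guards against factor columns)
all_ethnicities <- unique(c(
    unlist(strsplit(as.character(df$ethnicity), " ")),
    unlist(strsplit(as.character(df$ethnicity_other), " "))
    ))

# Row identifier to group by
df$id <- 1:nrow(df)

library(plyr)

# For each id (i.e. each row), tabulate its responses against the full set of levels
ddply(df, .(id), function(x)
     table(factor(unlist(strsplit(paste(x$ethnicity, x$ethnicity_other), " ")),
        levels = all_ethnicities)))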

##   id ngoni bemba lozi tonga other tongi luvale
## 1  1     1     0    0     0     0     0      0
## 2  2     0     1    0     0     0     0      0
## 3  3     0     0    1     1     0     0      1
## 4  4     0     1    0     1     1     0      0
## 5  5     0     1    0     0     0     1      0

library(data.table) 

DT <- data.table(df) 

DT[, {
    # Same tabulation as above, returned as a list so each level becomes a column
    as.list(
     table(
      factor(
       unlist(strsplit(paste(ethnicity, ethnicity_other), " ")),
       levels = all_ethnicities)
      )
     )
}, by = id]

##    id ngoni bemba lozi tonga other tongi luvale
## 1:  1     1     0    0     0     0     0      0
## 2:  2     0     1    0     0     0     0      0
## 3:  3     0     0    1     1     0     0      1
## 4:  4     0     1    0     1     1     0      0
## 5:  5     0     1    0     0     0     1      0
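
For readers who have neither plyr nor data.table installed, the same per-row tabulation can be sketched in base R. This is only an illustration and not part of the answer above: the names tokens, levels_all and indicators are made up here, and stringsAsFactors = FALSE is assumed so that both columns are plain character vectors.

```r
# Rebuild the example data (character columns are an assumption of this sketch)
df <- data.frame(
  age = c(24, 28, 44, 55, 53),
  ethnicity = c("ngoni", "bemba", "lozi tonga", "bemba tonga other", "bemba tongi"),
  ethnicity_other = c(NA, NA, "luvale", NA, NA),
  stringsAsFactors = FALSE
)

# One character vector of choices per participant, dropping missing answers
tokens <- Map(
  function(a, b) {
    v <- c(a, b)
    v[!is.na(v)]
  },
  strsplit(df$ethnicity, " "),
  strsplit(df$ethnicity_other, " ")
)

# Every distinct choice, in order of first appearance
levels_all <- unique(unlist(tokens))

# Count each level per participant; vapply puts levels in rows, so transpose
indicators <- t(vapply(
  tokens,
  function(tok) as.integer(table(factor(tok, levels = levels_all))),
  integer(length(levels_all))
))
colnames(indicators) <- levels_all
indicators
```

Like table() in the answer, this counts occurrences, so a participant who somehow listed the same option twice would get a 2 rather than a 1.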

Source

2014-02-20 00:11:25

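Wow, this is great. Thank you very much. I'm a little unclear on how the ddply function works (function(x) ...?), but I'll tinker with it a bit. I'm also trying to get every column prefixed with "ethnicity_". In my own attempt I used paste when creating the column names, but I'm having trouble seeing where the column creation happens in the first solution. Thanks again!! – chrisnyoder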

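@chrisnyoder 'ddply' splits the data by the 'id' variable (in this case, simply each row) and then applies the function to each piece. So the input 'x' to the function will be a one-row 'data.frame'. Try 'ddply(df, .(id), function(x) browser())' to explore the function's environment. To set the column names, the simplest solution is to do it after the fact (e.g., 'out <- ddply(df, ...)' then 'names(out)[names(out) != "id"] <- paste0("ethnicity_", names(out)[names(out) != "id"])'). I will add more to this answer later today.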
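
To make the split, apply, combine idea in the comment above concrete, here is a rough base R imitation of what ddply does, followed by the column renaming. It is a sketch under assumptions, not plyr's actual implementation: the small data frame, all_levels and the helper count_choices are invented for illustration.

```r
# Small illustrative data set (hypothetical, shorter than the one in the question)
df <- data.frame(
  id = 1:3,
  ethnicity = c("ngoni", "lozi tonga", "bemba tonga"),
  stringsAsFactors = FALSE
)
all_levels <- c("ngoni", "lozi", "tonga", "bemba")

# Step 1: split into one-row data frames, one per id
# (each piece plays the role of `x` inside the ddply call)
pieces <- split(df, df$id)

# Step 2: the function applied to each piece
count_choices <- function(x) {
  counts <- as.integer(table(factor(strsplit(x$ethnicity, " ")[[1]],
                                    levels = all_levels)))
  names(counts) <- all_levels
  as.data.frame(c(list(id = x$id), as.list(counts)))
}
results <- lapply(pieces, count_choices)

# Step 3: combine the per-id results back into one data frame
out <- do.call(rbind, results)

# Prefix every column except `id`, as suggested in the comment
is_choice <- names(out) != "id"
names(out)[is_choice] <- paste0("ethnicity_", names(out)[is_choice])
out
```

The new columns are created in step 2: each level of the factor becomes one named element of the returned list, and rbind lines those names up as columns.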

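Here is how I would do it:

First, you need something that stores the ethnicities of each participant. The way I would do this is to build a list of them: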

# Split each response on spaces: one character vector per participant
ethnicities <- strsplit(as.character(df$ethnicity), " ")

For your specific example, this gives:

> ethnicities 
[[1]] 
[1] "ngoni" 

[[2]] 
[1] "bemba" 

[[3]] 
[1] "lozi" "tonga" 

[[4]] 
[1] "bemba" "tonga" "other" 

[[5]] 
[1] "bemba" "tongi"

Then loop over these to fill the ethnicity_ columns you already created in your data.frame df:

for (i in seq_along(ethnicities)) {
    for (eth in ethnicities[[i]]) {
        # Mark row i as having selected this ethnicity
        df[[paste0('ethnicity_', eth)]][i] <- 1
    }
}

The resulting df should be:

> df
  age         ethnicity ethnicity_other ethnicity_luvale ethnicity_ngoni ethnicity_bemba
1  24             ngoni              NA               NA               1              NA
2  28             bemba              NA               NA              NA               1
3  44        lozi tonga              NA               NA              NA              NA
4  55 bemba tonga other               1               NA              NA               1
5  53       bemba tongi              NA               NA              NA               1
  ethnicity_lozi ethnicity_tonga ethnicity_tongi
1             NA              NA              NA
2             NA              NA              NA
3              1               1              NA
4             NA               1              NA
5             NA              NA               1

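There are other ways to do this. You could also pack the two loops together, but I suspect the resulting code would be no more efficient (and it would be harder to read!).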
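
As one possible loop-free variant, the two nested loops can be replaced by a single matrix-index assignment. This is a sketch rather than the approach of the answer above; the names other, choices, row_idx, choice, opts and flags are invented for illustration, the columns are assumed to be character, and a different prefix (eth_) is used only because paste0("ethnicity_", "other") would clash with the existing ethnicity_other column.

```r
df <- data.frame(
  age = c(24, 28, 44, 55, 53),
  ethnicity = c("ngoni", "bemba", "lozi tonga", "bemba tonga other", "bemba tongi"),
  ethnicity_other = c(NA, NA, "luvale", NA, NA),
  stringsAsFactors = FALSE
)

# Merge both columns into one space-separated string per row, treating NA as empty
other <- ifelse(is.na(df$ethnicity_other), "", df$ethnicity_other)
choices <- strsplit(trimws(paste(df$ethnicity, other)), " ")

# Long format: the row number and the chosen option for every selection
row_idx <- rep(seq_along(choices), lengths(choices))
choice <- unlist(choices)
opts <- unique(choice)

# Start from all zeros, then set every (row, option) pair to 1 in one assignment
flags <- matrix(0L, nrow = nrow(df), ncol = length(opts),
                dimnames = list(NULL, paste0("eth_", opts)))
flags[cbind(row_idx, match(choice, opts))] <- 1L

result <- cbind(df["age"], flags)
result
```

Because the matrix starts at 0, unselected options come out as 0 rather than NA without a separate initialization step.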

Does this help?

EDIT:

By the way, if you really want 0 rather than NA in your data.frame, it is as simple as changing the code that initializes the added columns:

> for(elm in z){ 
> df[paste0("ethnicity_",elm)] <- 0 # instead of NA 
> }
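
If the columns have already been filled and still contain NA, the remaining NA values can also be replaced afterwards. A minimal sketch, assuming the indicator columns all start with ethnicity_ and that no free-text column shares that prefix; the tiny data frame and the name flag_cols are illustrative only.

```r
# Tiny illustrative data frame with NA where an option was not selected
df <- data.frame(
  age = c(24, 28),
  ethnicity_ngoni = c(1, NA),
  ethnicity_bemba = c(NA, 1)
)

# Pick the indicator columns by prefix and overwrite their NA cells with 0
flag_cols <- grep("^ethnicity_", names(df), value = TRUE)
df[flag_cols][is.na(df[flag_cols])] <- 0
df
```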

Source

2014-02-19 23:42:08 Jealie

Here is an approach using concat.split.expanded from my "splitstackshape" package:

## Combine your "ethnicity" and "ethnicity_other" column
df$ethnicity <- paste(df$ethnicity,
                      ifelse(is.na(df$ethnicity_other), "",
                             as.character(df$ethnicity_other)))
## Drop the original "ethnicity_other" column
df$ethnicity_other <- NULL

## Split up the new "ethnicity" column
library(splitstackshape)
concat.split.expanded(df, "ethnicity", sep=" ",
                      type="character", fill=0, drop=TRUE)
#   age ethnicity_bemba ethnicity_lozi ethnicity_luvale ethnicity_ngoni
# 1  24               0              0                0               1
# 2  28               1              0                0               0
# 3  44               0              1                1               0
# 4  55               1              0                0               0
# 5  53               1              0                0               0
#   ethnicity_other ethnicity_tonga ethnicity_tongi
# 1               0               0               0
# 2               0               0               0
# 3               0               1               0
# 4               1               1               0
# 5               0               0               1

The fill argument can easily be set to whatever you want. It defaults to NA, but here I have set it to 0 because I think that is what you are looking for.

Source

2014-04-04 16:53:34 A5C1D2H2I1M1N2O1R2T1
