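Concatenate dichotomous columns into a semicolon-separated column

I have a data frame containing the results of a multiple-choice question. Each item is coded 0 (not mentioned) or 1 (mentioned). The columns are named like this: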

F1.2_1, F1.2_2, F1.2_3, F1.2_4, F1.2_5, F1.2_99, etc.

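I want to concatenate these values so that the new column holds the selected items as a semicolon-separated string. So if a row has a 1 in F1.2_1, F1.2_4 and F1.2_5, the new value should be: 1;4;5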

The last number in each dichotomous column name is the item code to use in the string.

Any ideas how to do this with R (and data.table)? Thanks for your help!

Edit:

Here is an example DF with the desired result:

DF <- structure(list(F1.2_1 = c(0L, 1L, 0L, 1L), F1.2_2 = c(1L, 0L, 
0L, 1L), F1.2_3 = c(0L, 1L, 0L, 1L), F1.2_4 = c(0L, 1L, 0L, 0L 
), F1.2_5 = c(0L, 0L, 0L, 0L), F1.2_99 = c(0L, 0L, 1L, 0L), desired_result = structure(c(3L, 
2L, 4L, 1L), .Label = c("1;2;3", "1;3;4", "2", "99"), class = "factor")), .Names = c("F1.2_1", 
"F1.2_2", "F1.2_3", "F1.2_4", "F1.2_5", "F1.2_99", "desired_result" 
), class = "data.frame", row.names = c(NA, -4L)) 




  F1.2_1 F1.2_2 F1.2_3 F1.2_4 F1.2_5 F1.2_99 desired_result
1      0      1      0      0      0       0              2
2      1      0      1      1      0       0          1;3;4
3      0      0      0      0      0       1             99
4      1      1      1      0      0       0          1;2;3
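For comparison, here is a minimal base R sketch of the requested transformation for a single question. It is not taken from either answer and uses apply() rather than data.table; it assumes that all item columns of the question share the prefix F1.2_ and that the item code is the text after the last underscore, as described above.

```r
# Example data from the question (without the desired_result column)
DF <- data.frame(
  F1.2_1  = c(0L, 1L, 0L, 1L),
  F1.2_2  = c(1L, 0L, 0L, 1L),
  F1.2_3  = c(0L, 1L, 0L, 1L),
  F1.2_4  = c(0L, 1L, 0L, 0L),
  F1.2_5  = c(0L, 0L, 0L, 0L),
  F1.2_99 = c(0L, 0L, 1L, 0L)
)

# Item columns of question F1.2 and their codes (text after the last "_")
cols  <- grep("^F1\\.2_", names(DF), value = TRUE)
codes <- sub(".*_", "", cols)

# TRUE where an item was mentioned
sel <- as.matrix(DF[cols]) == 1L

# Per row: keep the codes of the mentioned items, joined with ";"
DF$newCol <- apply(sel, 1, function(s) paste(codes[s], collapse = ";"))
DF$newCol
# returns c("2", "1;3;4", "99", "1;2;3")
```

A row in which nothing was mentioned comes out as an empty string "" rather than NA.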


2017-04-04 Mario

In a comment, the OP asked how to deal with more multiple-choice questions.

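The approach below can handle an arbitrary number of questions and an arbitrary number of choices per question. It uses melt() and dcast() from the data.table package.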

Sample input data

Suppose that, in the extended case, the input data.frame DT contains two questions, one with 6 choices and the other with 4 choices:

DT
#    F1.2_1 F1.2_2 F1.2_3 F1.2_4 F1.2_5 F1.2_99 F2.7_1 F2.7_2 F2.7_3 F2.7_11
# 1:      0      1      0      0      0       0      0      1      1       0
# 2:      1      0      1      1      0       0      1      1      1       1
# 3:      0      0      0      0      0       1      1      0      1       0
# 4:      1      1      1      0      0       0      1      0      1       1

Code

library(data.table) 

# coerce to data.table and add row number for later join 
setDT(DT)[, rn := .I] 

# reshape from wide to long format 
molten <- melt(DT, id.vars = "rn") 

# alternatively, the measure cols can be specified (in case of other id vars) 
# molten <- melt(DT, measure.vars = patterns("^F")) 

# split question id and choice id 
molten[, c("question_id", "choice_id") := tstrsplit(variable, "_")] 

# reshape only selected choices from long to wide format, 
# thereby pasting together the ids of the selected choices for each question 
result <- dcast(molten[value == 1], rn ~ question_id, paste, collapse = ";", 
       fill = NA, value.var = "choice_id") 

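# final join for demonstration only; a left join (all.x = TRUE) also keeps
# respondents without any selection, then the no longer needed row number is removed
merge(DT, result, by = "rn", all.x = TRUE)[, rn := NULL][]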
#    F1.2_1 F1.2_2 F1.2_3 F1.2_4 F1.2_5 F1.2_99 F2.7_1 F2.7_2 F2.7_3 F2.7_11  F1.2     F2.7
# 1:      0      1      0      0      0       0      0      1      1       0     2      2;3
# 2:      1      0      1      1      0       0      1      1      1       1 1;3;4 1;2;3;11
# 3:      0      0      0      0      0       1      1      0      1       0    99      1;3
# 4:      1      1      1      0      0       0      1      0      1       1 1;2;3   1;3;11

For each question, the final result shows which choices were selected in each row.
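If a long layout (one row per respondent and question) is enough, the dcast() step can be replaced by a grouped paste() on the molten table. This is a sketch under that assumption, not part of the original answer; it rebuilds the sample data with data.table() instead of structure(). Respondents who selected nothing for a question simply have no row for that question.

```r
library(data.table)

# Same sample data as in the answer, built directly as a data.table
DT <- data.table(
  F1.2_1 = c(0L, 1L, 0L, 1L), F1.2_2 = c(1L, 0L, 0L, 1L),
  F1.2_3 = c(0L, 1L, 0L, 1L), F1.2_4 = c(0L, 1L, 0L, 0L),
  F1.2_5 = c(0L, 0L, 0L, 0L), F1.2_99 = c(0L, 0L, 1L, 0L),
  F2.7_1 = c(0L, 1L, 1L, 1L), F2.7_2 = c(1L, 1L, 0L, 0L),
  F2.7_3 = c(1L, 1L, 1L, 1L), F2.7_11 = c(0L, 1L, 0L, 1L)
)
DT[, rn := .I]

# Wide to long, then split "F2.7_11" into question "F2.7" and choice "11"
molten <- melt(DT, id.vars = "rn")
molten[, c("question_id", "choice_id") := tstrsplit(variable, "_", fixed = TRUE)]

# One row per respondent and question; choices keep their column order
long <- molten[value == 1,
               .(choices = paste(choice_id, collapse = ";")),
               by = .(rn, question_id)]
setorder(long, rn, question_id)
long
```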

Reproducible data

The sample data can be created with:

DT <- structure(list(F1.2_1 = c(0L, 1L, 0L, 1L), F1.2_2 = c(1L, 0L, 
0L, 1L), F1.2_3 = c(0L, 1L, 0L, 1L), F1.2_4 = c(0L, 1L, 0L, 0L 
), F1.2_5 = c(0L, 0L, 0L, 0L), F1.2_99 = c(0L, 0L, 1L, 0L), F2.7_1 = c(0L, 
1L, 1L, 1L), F2.7_2 = c(1L, 1L, 0L, 0L), F2.7_3 = c(1L, 1L, 1L, 
1L), F2.7_11 = c(0L, 1L, 0L, 1L)), .Names = c("F1.2_1", "F1.2_2", 
"F1.2_3", "F1.2_4", "F1.2_5", "F1.2_99", "F2.7_1", "F2.7_2", 
"F2.7_3", "F2.7_11"), row.names = c(NA, -4L), class = "data.frame")


2017-04-04 17:03:05 Uwe

Oh yes, very nice! – Mario

We can try:

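# drop column 7 (desired_result), multiply each 0/1 column by its item code (the text
# after "_" in the name), then paste each row into one string, e.g. "0;2;0;0;0;0"
j1 <- do.call(paste, c(as.integer(sub(".*_", "", 
       names(DF)[-7]))[col(DF[-7])]*DF[-7], sep=";")) 

# turn the zero entries into ";" and trim leading/trailing semicolons
DF$newCol <- gsub("^;+|;+$", "", gsub(";*0;|0$|^0", ";", j1)) 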
DF$newCol
#[1] "2"     "1;3;4" "99"    "1;2;3"
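The regex clean-up above works on the pasted zeros. As far as I can trace it, two unselected items between selected ones leave an empty field (a pasted row 1;0;0;4 becomes 1;;4), and an item code ending in 0, such as 10, loses that digit. A regex-free variant, sketched here for the single question of the example and not part of the original answer, finds the row and column of every 1 with which(arr.ind = TRUE) and pastes the matching codes per row with tapply(); rows without any selection become NA.

```r
DF <- data.frame(
  F1.2_1  = c(0L, 1L, 0L, 1L),
  F1.2_2  = c(1L, 0L, 0L, 1L),
  F1.2_3  = c(0L, 1L, 0L, 1L),
  F1.2_4  = c(0L, 1L, 0L, 0L),
  F1.2_5  = c(0L, 0L, 0L, 0L),
  F1.2_99 = c(0L, 0L, 1L, 0L)
)

cols  <- grep("^F1\\.2_", names(DF), value = TRUE)
codes <- sub(".*_", "", cols)   # item code = text after the last "_"

# Row/column positions of every 1, returned in column-major order,
# so within each row the codes stay in column order
hits <- which(as.matrix(DF[cols]) == 1L, arr.ind = TRUE)

# Paste the codes per row; the factor levels keep rows without any 1 (as NA)
res <- tapply(codes[hits[, "col"]],
              factor(hits[, "row"], levels = seq_len(nrow(DF))),
              paste, collapse = ";")
DF$newCol <- as.vector(res)
DF$newCol
# returns c("2", "1;3;4", "99", "1;2;3")
```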


2017-04-04 12:00:34 akrun

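Thanks a lot, this works well with the small example. However, my real data has many more columns and several different multiple-choice questions. For example, another one is "F2.7_1", "F2.7_2", "F2.7_3", "F2.7_4", ... and there are more for other questions. Any idea how this could be incorporated? – Mario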

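@Mario In my first approach I used the column numbers, but when you showed that the code is the last part of the column name, I changed it. I'm not sure how you want to incorporate the other questions. – akrun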

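I was thinking of something like this in 'data.table': 'newCol = ifelse(F1.2_1 == 1, 1, ifelse(F1.2_2, 2, ifelse(F1.2_3, 3, ifelse(F1.2_4, 4, ifelse(F1.2_5, 5, ifelse(F1.2_6, 6, ifelse(F1.2_7, 7, ifelse(F1.2_8, 8, ifelse(F1.2_9, 9, ifelse(F1.2_99, 99, NA))))))))))' - so it is hard-coded, and it only produces a single value instead of a ;-separated string. Maybe the values could be pasted together somehow. – Mario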
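Instead of hard-coding a nested ifelse() per question, the columns can be grouped by the part of the name before the last underscore. The helper concat_choices() below is a hypothetical name used for illustration only, not part of either answer; it assumes every item column is named <question>_<numeric code> and adds one ;-separated column per question, shown on the two-question sample data from the first answer.

```r
# Hypothetical helper: adds one ";"-separated column per question,
# named after the question prefix (e.g. "F1.2")
concat_choices <- function(df) {
  # Item columns look like <question>_<code>, e.g. "F2.7_11"
  item_cols <- grep("_[0-9]+$", names(df), value = TRUE)
  question  <- sub("_[0-9]+$", "", item_cols)  # e.g. "F2.7"
  code      <- sub(".*_", "", item_cols)       # e.g. "11"

  for (q in unique(question)) {
    in_q <- question == q
    sel  <- as.matrix(df[item_cols[in_q]]) == 1L
    # Codes of the selected items in column order; "" if none was selected
    df[[q]] <- apply(sel, 1, function(s) paste(code[in_q][s], collapse = ";"))
  }
  df
}

DF <- data.frame(
  F1.2_1 = c(0L, 1L, 0L, 1L), F1.2_2 = c(1L, 0L, 0L, 1L),
  F1.2_3 = c(0L, 1L, 0L, 1L), F1.2_4 = c(0L, 1L, 0L, 0L),
  F1.2_5 = c(0L, 0L, 0L, 0L), F1.2_99 = c(0L, 0L, 1L, 0L),
  F2.7_1 = c(0L, 1L, 1L, 1L), F2.7_2 = c(1L, 1L, 0L, 0L),
  F2.7_3 = c(1L, 1L, 1L, 1L), F2.7_11 = c(0L, 1L, 0L, 1L)
)
res <- concat_choices(DF)
res[c("F1.2", "F2.7")]
```

Because the codes are read from the names rather than stripped out of a pasted string, multi-digit codes such as 11 or 99 need no special handling.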
