Converting multinomial to binomial - thousands of columns

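I have a dataset with 100 columns (named Col_1, Col_2 ... Col_100) whose values look like "A", "B", "C"... There are many distinct characters across the dataset, and I don't know all of them in advance. I want to turn each distinct value into its own column, giving a matrix like this: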

A B C D 
0 1 0 1 
1 1 0 1

This is what I tried:

library(reshape2) 
train <- read.csv("train.csv",head=TRUE,sep=",") 
train 

recast(train, id ~ value, id.var = 1, fun.aggregate = function(x) (length(x) > 0) + 0L)

But I got the following error:

Error in eval(substitute(expr), envir, enclos) : 
    n must be a positive integer 
In addition: Warning messages: 
1: attributes are not identical across measure variables; they will be dropped 
2: In split_indices(.group, .n) : 
    NAs introduced by coercion to integer range

What can I do to get the table I want?

Source

2016-11-16 John_Rodgers

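Maybe this is what you are looking for. The first step collects all possible values. The second step makes every variable aware of the full set of potential values. This lets table() report a count of 0 when a particular value is missing from a variable, so that rbind() can build a properly aligned output.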

# collect all possible values across every column
# (as.character makes this work for both factor and character columns)
allLevels <- sort(unique(unlist(lapply(df, as.character))))
# provide all levels to each variable in the data.frame
dfNew <- data.frame(lapply(df, function(i) factor(i, levels=allLevels)))

# produce the count for each variable
do.call(rbind, lapply(dfNew, table))
    a b c d e g i j 
x 3 2 8 2 0 0 0 0 
y 0 0 2 4 4 1 3 1
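
The table above holds counts, while the question asks for 0/1 values. A minimal sketch of collapsing counts into presence/absence indicators follows; the toy data and the names `counts` and `indicator` are assumptions made for illustration, not part of the original answer.

```r
# Hypothetical toy data; columns may be character or factor
df <- data.frame(x = c("a", "b", "a", "c"),
                 y = c("c", "d", "d", "d"),
                 stringsAsFactors = FALSE)

# Union of all values seen in any column
allLevels <- sort(unique(unlist(lapply(df, as.character))))

# Count each value per column, with 0 for values a column never contains
counts <- do.call(rbind, lapply(df, function(col) {
  table(factor(col, levels = allLevels))
}))

# Collapse counts to presence/absence (1 if the value occurs at all)
indicator <- (counts > 0) + 0L
indicator
```

Adding `0L` to the logical matrix converts TRUE/FALSE to 1/0 while keeping the row and column names.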

Data

set.seed(1234) 
df <- data.frame(x=sample(letters[1:4], 15, replace=TRUE), 
       y=sample(letters[3:10], 15, replace=TRUE))
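
The example matrix in the question could also be read as one row per observation rather than one row per variable. Under that assumption, a base R sketch is to cross-tabulate the row index against the cell values; the toy data and the names `m`, `tab` and `rowIndicator` are hypothetical.

```r
# Hypothetical toy data shaped like the question's Col_1 ... Col_100
df <- data.frame(Col_1 = c("B", "A"),
                 Col_2 = c("D", "B"),
                 Col_3 = c("B", "D"),
                 stringsAsFactors = FALSE)

# Character matrix with one row per observation
m <- as.matrix(df)

# Cross-tabulate row index against cell value: counts per row and value
tab <- table(row = row(m), value = m)

# 1 if the value occurs anywhere in that row, 0 otherwise
rowIndicator <- (unclass(tab) > 0) + 0L
rowIndicator
```

Only values that actually occur in the data get a column, so a value that never appears would have to be added to the result separately.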

Source

2016-11-16 13:26:46 lmo

@Imo Thanks for your answer. I am getting N/A for all the values; is that normal?

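With the example I provided, or with your original dataset? If it is your original dataset, you should at least provide the first 10 lines of 'str(df)', where df is the name of your data.frame. – lmo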

How do I do that? Sorry, I am new to R.
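
For the structure summary requested in the comments, a minimal sketch is shown below, assuming the data.frame is named `df` (the toy data and `colClasses` are hypothetical). It also checks the column classes. One possible cause of all-NA results, not confirmed in this thread, is that the columns are character rather than factor: `levels()` returns NULL for character vectors, and using NULL as the factor levels turns every value into NA. Since R 4.0.0, `data.frame()` and `read.csv()` no longer convert strings to factors by default.

```r
# Hypothetical stand-in for the real dataset
df <- data.frame(x = c("a", "b"),
                 y = c("c", "d"),
                 stringsAsFactors = FALSE)

# Show the structure of at most the first 10 columns
str(df, list.len = 10)

# Class of each column: "character" or "factor"
colClasses <- sapply(df, class)
colClasses

# levels() is NULL for a character column
levels(df$x)
```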
