R - Counting indicators across multiple columns (like Excel SUMPRODUCT)

I have the following data frame in R:

df <- data.frame(id=c('a','b','a','c','b','a'), 
       indicator1=c(1,0,0,0,1,1), 
       indicator2=c(0,0,0,1,0,1), 
       extra1=c(4,5,12,4,3,7), 
       extra2=c('z','z','x','y','x','x')) 

id indicator1 indicator2 extra1 extra2
 a          1          0      4      z
 b          0          0      5      z
 a          0          0     12      x
 c          0          1      4      y
 b          1          0      3      x
 a          1          1      7      x

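I would like to add new columns that hold, for every row, the number of rows across the whole data frame in which that row's id appears with the various indicators equal to 1. For example: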

id indicator1 indicator2 extra1 extra2 countInd1 countInd2 countInd1Ind2
 a          1          0      4      z         2         1             1
 b          0          0      5      z         1         0             0
 a          0          0     12      x         2         1             1
 c          0          1      4      y         0         1             0
 b          1          0      3      x         1         0             0
 a          1          1      7      x         2         1             1

How can I do this?

Source

2013-08-20 user2700691

There are several ways. Here is one with ave and within:

within(df, { 
    ind1ind2 <- ave(as.character(interaction(indicator1, indicator2, drop=TRUE)), 
        id, FUN = function(x) sum(x == "1.1")) 
    ind2 <- ave(indicator2, id, FUN = function(x) sum(x == 1)) 
    ind1 <- ave(indicator1, id, FUN = function(x) sum(x == 1)) 
}) 
#   id indicator1 indicator2 extra1 extra2 ind1 ind2 ind1ind2
# 1  a          1          0      4      z    2    1        1
# 2  b          0          0      5      z    1    0        0
# 3  a          0          0     12      x    2    1        1
# 4  c          0          1      4      y    0    1        0
# 5  b          1          0      3      x    1    0        0
# 6  a          1          1      7      x    2    1        1
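A side note on the ave call for ind1ind2: as far as I can tell, ave writes each group's result back into a copy of its first argument, so a character input gives a character column even though FUN returns a number. A minimal sketch with made-up vectors (the names id, labels, chr_counts and int_counts are illustrative only, not part of the answer):

```r
# Made-up grouping vector and interaction-style labels, for illustration only
id     <- c("a", "b", "a")
labels <- c("1.1", "1.0", "0.0")

# Character input: the per-group counts are coerced back to character
chr_counts <- ave(labels, id, FUN = function(x) sum(x == "1.1"))

# Integer input: the per-group counts stay integer
int_counts <- ave(as.integer(labels == "1.1"), id, FUN = sum)

print(class(chr_counts))
print(class(int_counts))
```

If a numeric column is needed, wrapping the result in as.numeric() or starting from an integer vector as in the second call should be enough.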

Here is an alternative:

A <- setNames(aggregate(cbind(indicator1, indicator2) ~ id, df, 
         function(x) sum(x == 1)), c("id", "ind1", "ind2")) 
B <- setNames(aggregate(interaction(indicator1, indicator2, drop = TRUE) ~ id, 
         df, function(x) sum(x == "1.1")), c("id", "ind1ind2")) 
Reduce(function(x, y) merge(x, y), list(df, A, B)) 
#   id indicator1 indicator2 extra1 extra2 ind1 ind2 ind1ind2
# 1  a          1          0      4      z    2    1        1
# 2  a          0          0     12      x    2    1        1
# 3  a          1          1      7      x    2    1        1
# 4  b          0          0      5      z    1    0        0
# 5  b          1          0      3      x    1    0        0
# 6  c          0          1      4      y    0    1        0

Of course, if your data is large, you will want to explore the "data.table" package. It also takes a little less typing than the within version.

library(data.table) 
DT <- data.table(df) 
DT[, c("ind1", "ind2", "ind1ind2") := 
    list(sum(indicator1 == 1), 
      sum(indicator2 == 1), 
      sum(interaction(indicator1, indicator2, 
          drop = TRUE) == "1.1")), 
    by = "id"] 
DT 
#    id indicator1 indicator2 extra1 extra2 ind1 ind2 ind1ind2
# 1:  a          1          0      4      z    2    1        1
# 2:  b          0          0      5      z    1    0        0
# 3:  a          0          0     12      x    2    1        1
# 4:  c          0          1      4      y    0    1        0
# 5:  b          1          0      3      x    1    0        0
# 6:  a          1          1      7      x    2    1        1

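Instead of sum(interaction(...) == "1.1"), you could also use sum(indicator1 == 1 & indicator2 == 1) if you find that more explicit. I have not benchmarked the two to see which is more efficient; interaction is simply what came to mind first.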
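The two ways of counting rows where both indicators equal 1 can be checked against each other in base R. This is only a sketch on the question's data; tapply and the names by_interaction and by_and are my own choices for the comparison, not part of the answer:

```r
# The question's data, reduced to the columns that matter here
df <- data.frame(id = c("a", "b", "a", "c", "b", "a"),
                 indicator1 = c(1, 0, 0, 0, 1, 1),
                 indicator2 = c(0, 0, 0, 1, 0, 1))

# Per-id count via the interaction() label "1.1"
by_interaction <- with(df, tapply(
  interaction(indicator1, indicator2, drop = TRUE) == "1.1", id, sum))

# Per-id count via an explicit logical AND
by_and <- with(df, tapply(indicator1 == 1 & indicator2 == 1, id, sum))

print(by_interaction)
print(by_and)
```

Both give one count per id (a, b, c), so either can be dropped into the ave, aggregate, or data.table versions.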

Source

2013-08-20 17:07:06 A5C1D2H2I1M1N2O1R2T1

If 'indicator1' and 'indicator2' are indicators (as they appear to be, and as they are named), then DT[, c('ind1', 'ind2', 'ind1ind2') := list(sum(indicator1), sum(indicator2), sum((indicator1 + indicator2) > 1)), by = id] should work (and be more efficient) – mnel

@mnel, that occurred to me, but I did not want to make any assumptions about whether the "indicator1" and "indicator2" columns contain other values. – A5C1D2H2I1M1N2O1R2T1
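To make the trade-off in these two comments concrete, here is a small sketch on hypothetical data in which indicator1 contains a 2, that is, a value other than 0 or 1. The data frame x and all result names are invented for illustration:

```r
# Hypothetical data: the 2 in indicator1 is not a valid 0/1 flag
x <- data.frame(id = c("a", "a", "b"),
                indicator1 = c(1, 2, 1),
                indicator2 = c(1, 0, 0))

# Plain sums add the 2 as if it were two hits
sum_plain <- tapply(x$indicator1, x$id, sum)
# Comparing against 1 counts only the rows that are exactly 1
sum_eq1 <- tapply(x$indicator1 == 1, x$id, sum)

# The (indicator1 + indicator2) > 1 shortcut also flags the row (2, 0)
both_plain <- tapply((x$indicator1 + x$indicator2) > 1, x$id, sum)
# The explicit comparison flags only the row (1, 1)
both_eq1 <- tapply(x$indicator1 == 1 & x$indicator2 == 1, x$id, sum)

print(rbind(sum_plain, sum_eq1, both_plain, both_eq1))
```

For id a the plain versions give 3 and 2, while the == 1 versions give 1 and 1. When the columns are guaranteed to hold only 0 and 1, the two approaches agree and the plain sums are the shorter option.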

Or you can do it this way:

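# Sum an indicator over every row that shares the id of row i
get_freq1 = function(i) {sum(df$indicator1[df$id == df$id[i]])}
get_freq2 = function(i) {sum(df$indicator2[df$id == df$id[i]])}
# Count the rows of that id where both indicators are 1 at the same time
get_freq12 = function(i) {same_id <- df$id == df$id[i]
                          sum(df$indicator1[same_id] == 1 & df$indicator2[same_id] == 1)}

rows = seq_len(nrow(df))
df = data.frame(df, countInd1 = sapply(rows, get_freq1), countInd2 = sapply(rows, get_freq2),
                countInd1Ind2 = sapply(rows, get_freq12))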

You get:

#   id indicator1 indicator2 extra1 extra2 countInd1 countInd2 countInd1Ind2
# 1  a          1          0      4      z         2         1             1
# 2  b          0          0      5      z         1         0             0
# 3  a          0          0     12      x         2         1             1
# 4  c          0          1      4      y         0         1             0
# 5  b          1          0      3      x         1         0             0
# 6  a          1          1      7      x         2         1             1

Source

2013-08-20 17:21:24 Mayou

That said, I think Ananda Mahto's code is very neat! – Mayou
