2013-08-20

R - Counting indicators across multiple columns (like SUMPRODUCT in Excel)

I have the following data frame in R:

df <- data.frame(id=c('a','b','a','c','b','a'), 
       indicator1=c(1,0,0,0,1,1), 
       indicator2=c(0,0,0,1,0,1), 
       extra1=c(4,5,12,4,3,7), 
       extra2=c('z','z','x','y','x','x')) 

id indicator1 indicator2 extra1 extra2 
a   1   0  4  z 
b   0   0  5  z 
a   0   0  12  x 
c   0   1  4  y 
b   1   0  3  x 
a   1   1  7  x 

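I would like to add new columns that count, over all rows in which a given id appears, how many times the various indicators equal 1. For example: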

id indicator1 indicator2 extra1 extra2 countInd1 countInd2 countInd1Ind2 
a   1   0  4  z  2   1   1 
b   0   0  5  z  1   0   0 
a   0   0  12  x  2   1   1 
c   0   1  4  y  0   1   0 
b   1   0  3  x  1   0   0 
a   1   1  7  x  2   1   1 

How can I do this?

Answers

4

There are several ways to do this. Here is one with ave and within:

within(df, { 
    # Do the comparison before ave() so the result stays numeric rather than character
    ind1ind2 <- ave(as.numeric(interaction(indicator1, indicator2, drop=TRUE) == "1.1"), 
        id, FUN = sum) 
    ind2 <- ave(indicator2, id, FUN = function(x) sum(x == 1)) 
    ind1 <- ave(indicator1, id, FUN = function(x) sum(x == 1)) 
}) 
# id indicator1 indicator2 extra1 extra2 ind1 ind2 ind1ind2 
# 1 a   1   0  4  z 2 1  1 
# 2 b   0   0  5  z 1 0  0 
# 3 a   0   0  12  x 2 1  1 
# 4 c   0   1  4  y 0 1  0 
# 5 b   1   0  3  x 1 0  0 
# 6 a   1   1  7  x 2 1  1 

Here is an alternative using aggregate and merge:

A <- setNames(aggregate(cbind(indicator1, indicator2) ~ id, df, 
         function(x) sum(x == 1)), c("id", "ind1", "ind2")) 
B <- setNames(aggregate(interaction(indicator1, indicator2, drop = TRUE) ~ id, 
         df, function(x) sum(x == "1.1")), c("id", "ind1ind2")) 
Reduce(function(x, y) merge(x, y), list(df, A, B)) 
# id indicator1 indicator2 extra1 extra2 ind1 ind2 ind1ind2 
# 1 a   1   0  4  z 2 1  1 
# 2 a   0   0  12  x 2 1  1 
# 3 a   1   1  7  x 2 1  1 
# 4 b   0   0  5  z 1 0  0 
# 5 b   1   0  3  x 1 0  0 
# 6 c   0   1  4  y 0 1  0 

Of course, if your data are large, you will want to explore the "data.table" package. It also takes somewhat less typing than the within version.

library(data.table) 
DT <- data.table(df) 
DT[, c("ind1", "ind2", "ind1ind2") := 
    list(sum(indicator1 == 1), 
      sum(indicator2 == 1), 
      sum(interaction(indicator1, indicator2, 
          drop = TRUE) == "1.1")), 
    by = "id"] 
DT 
# id indicator1 indicator2 extra1 extra2 ind1 ind2 ind1ind2 
# 1: a   1   0  4  z 2 1  1 
# 2: b   0   0  5  z 1 0  0 
# 3: a   0   0  12  x 2 1  1 
# 4: c   0   1  4  y 0 1  0 
# 5: b   1   0  3  x 1 0  0 
# 6: a   1   1  7  x 2 1  1 

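Instead of sum(interaction(...) == "1.1"), you could also use sum(indicator1 == 1 & indicator2 == 1) if you find that more explicit. I haven't benchmarked the two to see which is more efficient; interaction was simply the first thing that came to mind.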
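To make the comparison between `sum(interaction(...) == "1.1")` and `sum(indicator1 == 1 & indicator2 == 1)` concrete, here is a minimal sketch that rebuilds the relevant columns of the example data and checks that both expressions give the same per-id counts. The names `by_interaction` and `by_logical` are introduced here purely for illustration.

```r
# Example data from the question (only the columns needed here)
df <- data.frame(id = c("a", "b", "a", "c", "b", "a"),
                 indicator1 = c(1, 0, 0, 0, 1, 1),
                 indicator2 = c(0, 0, 0, 1, 0, 1))

# Per-id count via interaction(): its levels look like "0.0", "1.0", "0.1", "1.1"
by_interaction <- with(df, tapply(
  as.character(interaction(indicator1, indicator2, drop = TRUE)) == "1.1",
  id, sum))

# Per-id count via an explicit logical AND
by_logical <- with(df, tapply(indicator1 == 1 & indicator2 == 1, id, sum))

# Both approaches agree on this data
stopifnot(all(by_interaction == by_logical))
```

This only confirms that the results match on the example; it says nothing about which form is faster.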


If `indicator1` and `indicator2` really are indicators (as they appear to be, and as their names suggest), `DT[, c('ind1','ind2','ind1ind2') := list(sum(indicator1), sum(indicator2), sum((indicator1 + indicator2) > 1)), by = id]` should work (and be more efficient) – mnel


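@mnel, that is what I had in mind too, but I didn't want to make any assumptions about whether the "indicator1" and "indicator2" columns might contain other values. – A5C1D2H2I1M1N2O1R2T1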
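The two variants in these comments only differ when a column holds something other than 0 and 1. A small sketch with hypothetical vectors `x` and `y` (not part of the original data) shows where the shortcut and the explicit comparison diverge:

```r
# Hypothetical indicator columns; x contains a stray value of 2
x <- c(1, 0, 2, 1)
y <- c(1, 1, 0, 1)

# Single column: summing the values vs. counting entries equal to 1
sum_values <- sum(x)        # 4
count_ones <- sum(x == 1)   # 2

# Both columns: the shortcut counts the (2, 0) row, the explicit test does not
shortcut <- sum((x + y) > 1)        # 3
explicit <- sum(x == 1 & y == 1)    # 2
```

With strictly 0/1 data the two forms coincide, so the shorter version is fine whenever that assumption holds.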

0

Or you could do this:

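# For row i, sum an indicator over all rows that share that row's id
get_freq1 <- function(i) sum(df$indicator1[df$id == df$id[i]])
get_freq2 <- function(i) sum(df$indicator2[df$id == df$id[i]])
# Rows where both indicators are 1 (assumes the columns hold only 0/1)
get_freq12 <- function(i) sum((df$indicator1 * df$indicator2)[df$id == df$id[i]])

rows <- seq_len(nrow(df))
df <- data.frame(df, countInd1 = sapply(rows, get_freq1), countInd2 = sapply(rows, get_freq2),
                 countInd1Ind2 = sapply(rows, get_freq12))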

You get:

# id indicator1 indicator2 extra1 extra2 countInd1 countInd2 countInd1Ind2 
#1 a   1   0  4  z   2   1    1 
#2 b   0   0  5  z   1   0    0 
#3 a   0   0  12  x   2   1    1 
#4 c   0   1  4  y   0   1    0 
#5 b   1   0  3  x   1   0    0 
#6 a   1   1  7  x   2   1    1 

Although I think Ananda Mahto's code is really neat! – Mayou