2015-12-04 44 views
1

Sorry if this question has already been answered, but I could not find what I need... Aggregate function (handling NAs)

Here is my hypothetical database:

x1=c("A", "A", "B", "C", "C", "B") 
x2=c("L1", "L1", "L1", "L1", "L2", "L1") 
x3=c("a", "a", "NA", "b", "j","NA") 
x4=c(17, 17, 13.2, NA, 3, 13.2) 
x5=c(1,24,5,7,6,8) 
db=as.data.frame(cbind(x1, x2, x3, x4, x5)) 

I have tried many different things, but this is basically the idea:

dbF=aggregate(db$x5,by=list(db$x1, db$x2, db$x3,db$x4),FUN=sum) 

The expected output is something like this:

x1e=c("A", "B", "C", "C") 
x2e=c("L1", "L1", "L1", "L2") 
x3e=c("a", "NA", "b", "j")     
x4e=c(17, 13.2, NA, 3) 
x5e=c(25,13,7,6) 
dbExpected=as.data.frame(cbind(x1e, x2e, x3e, x4e, x5e)) 

I really need to keep the NAs in the final output... Any suggestions? Thanks.

+0

Do you really want to group by `NA`? What would that mean? That is, what is a group that you cannot name? – jogo

+0

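Each combination of x1 and x2 is a distinct entity; x3 and x4 are information associated with those entities, but some of it is missing (NA). However, I need each x1-x2 combination to appear once in the output, and x5 is the variable I want to sum by entity. I am not sure that is clear... :s –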

+0

But you are grouping by all 4 variables in your `aggregate()` call. – jogo
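
The point raised in these comments can be sketched in base R: sum x5 per x1-x2 entity only, then attach x3 and x4 afterwards. This is a minimal sketch, not the asker's or any answerer's method. It assumes that x3 and x4 are constant within each x1-x2 entity (true for the sample data), and the names `sums`, `info`, and `result` are hypothetical.

```r
# Sample data from the question, built with data.frame() so x4 and x5 stay numeric.
db <- data.frame(
  x1 = c("A", "A", "B", "C", "C", "B"),
  x2 = c("L1", "L1", "L1", "L1", "L2", "L1"),
  x3 = c("a", "a", "NA", "b", "j", "NA"),
  x4 = c(17, 17, 13.2, NA, 3, 13.2),
  x5 = c(1, 24, 5, 7, 6, 8)
)

# Sum x5 per entity; x1 and x2 contain no NA, so no rows are dropped here.
sums <- aggregate(x5 ~ x1 + x2, data = db, FUN = sum)

# One row of descriptive information per entity (assumes x3/x4 do not vary within an entity).
info <- unique(db[, c("x1", "x2", "x3", "x4")])

# Join the sums back onto the descriptive columns; the NA in x4 is carried along untouched.
result <- merge(info, sums, by = c("x1", "x2"))
result
```

Because the NA only ever appears in a carried-along column and never in a grouping variable, nothing has to be done to protect it.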

Answers

1

You can use dplyr, and some of your function calls are redundant.

# install.packages('dplyr') # only run if not installed 
library(dplyr) 

x1=c("A", "A", "B", "C", "C", "B") 
x2=c("L1", "L1", "L1", "L1", "L2", "L1") 
x3=c("a", "a", "NA", "b", "j","NA") 
x4=c(17, 17, 13.2, NA, 3, 13.2) 
x5=c(1,24,5,7,6,8) 
db=data.frame(x1, x2, x3, x4, x5) 

db %>% 
    group_by(x1, x2, x3, x4) %>% 
    dplyr::summarise(x5e = sum(x5)) 

Source: local data frame [4 x 5] 
Groups: x1, x2, x3 [?] 

      x1     x2     x3    x4   x5e
  (fctr) (fctr) (fctr) (dbl) (dbl)
1      A     L1      a  17.0    25
2      B     L1     NA  13.2    13
3      C     L1      b    NA     7
4      C     L2      j   3.0     6
+0

Works great! Thanks –

3

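A couple of things: when you build your data.frame that way (cbind, then coerce), you create an intermediate character matrix, so when you coerce it to a data.frame everything becomes a factor (unwanted, for the obvious reason that x5 should be numeric). Also, make sure the x4 variable has an NA level (here using `addNA`), so that when you aggregate by it you get what you want.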
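
As a small illustration of the coercion described above, the following sketch compares the two ways of building the data frame. The names `m`, `bad`, and `good` are hypothetical; only three of the question's columns are used to keep it short.

```r
x1 <- c("A", "A", "B", "C", "C", "B")
x4 <- c(17, 17, 13.2, NA, 3, 13.2)
x5 <- c(1, 24, 5, 7, 6, 8)

# cbind() must produce a single-type matrix, so the numbers are converted to strings.
m <- cbind(x1, x4, x5)
is.character(m)

# Coercing that matrix keeps the damage: x5 is no longer numeric.
bad <- as.data.frame(m)
class(bad$x5)  # "factor" before R 4.0.0, "character" from R 4.0.0 on

# data.frame() keeps each column's own type.
good <- data.frame(x1, x4, x5)
is.numeric(good$x5)
sum(good$x5)
```

Either way, `sum` cannot be applied to `bad$x5` without converting it back to numeric first.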

x1=c("A", "A", "B", "C", "C", "B") 
x2=c("L1", "L1", "L1", "L1", "L2", "L1") 
x3=c("a", "a", "NA", "b", "j","NA") 
x4=addNA(factor(c(17, 17, 13.2, NA, 3, 13.2))) 
x5=c(1,24,5,7,6,8) 
db=data.frame(x1, x2, x3, x4, x5) 

dbF=aggregate(x5 ~ x1+x2+x3+x4, data=db, FUN=sum, na.action=na.pass) 
dbF 
#   x1 x2 x3   x4 x5
# 1  C L2  j    3  6
# 2  B L1 NA 13.2 13
# 3  A L1  a   17 25
# 4  C L1  b <NA>  7
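
To see why `addNA` matters here, the sketch below runs the same aggregation with and without it. `aggregate` omits rows that have a missing value in any grouping variable, even with `na.action = na.pass`; turning NA into a factor level makes it an ordinary group. The column `x4f` and the names `dropped` and `kept` are hypothetical, and only x1, x4, and x5 are used for brevity.

```r
db <- data.frame(
  x1 = c("A", "A", "B", "C", "C", "B"),
  x4 = c(17, 17, 13.2, NA, 3, 13.2),
  x5 = c(1, 24, 5, 7, 6, 8)
)

# Grouping on the raw numeric x4: the row whose x4 is NA is left out of the result.
dropped <- aggregate(x5 ~ x1 + x4, data = db, FUN = sum, na.action = na.pass)

# addNA() adds NA as an explicit factor level, so that row forms its own group.
db$x4f <- addNA(factor(db$x4))
kept <- aggregate(x5 ~ x1 + x4f, data = db, FUN = sum, na.action = na.pass)

nrow(dropped)
nrow(kept)
```

Note that x3 in the question holds the string `"NA"`, not a missing value, which is why it needs no such treatment.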