2016-05-22 20 views

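Getting missing values in the aggregate function

I have the following data with a special missing-value pattern (every value of vnum1 is missing where vcat1 == 3):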

> head(mydf)
       vnum1 vcat1
1 -0.1624229     1
2  0.2465567     1
3         NA     3
4  0.7067778     2
5         NA     3
6 -0.2241726     4
> dput(mydf) 
structure(list(vnum1 = c(-0.162422853864248, 0.246556718176803, 
NA, 0.706777793886275, NA, -0.224172615208867, 0.0545850414695318, 
NA, NA, -1.94778020954922, 1.89581259201036, 0.901973743223488, 
-0.31255172156186, -1.67311124367419, 0.491316838004494, NA, 
-0.699315343799762, 0.668020448193884, 1.45492995320554, 1.17747976289091, 
-0.65137204397438, 1.78678696473193, 2.58978935829221, NA, 1.26534157843481, 
0.629748102812663, 0.246596558590885, 0.968707124353133, 0.108668693948881, 
-0.219419917000748, 2.25307417017233, -0.626124211646445, -1.16298694223082, 
-1.23524906047676, -2.34636152907898, NA, 0.408667368960836, 
0.272596114054819, 0.747455245383144, -0.745843219461836, -0.0966351379737077, 
1.44803320811527, -1.5434982335725, -0.782902668540696, -0.448286848257394, 
NA, 0.168327130336994, -0.493721325506037, 0.397253883862878, 
1.57070527855864), vcat1 = structure(c(1L, 1L, 3L, 2L, 3L, 4L, 
4L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 4L, 3L, 4L, 4L, 4L, 1L, 2L, 4L, 
1L, 3L, 2L, 4L, 2L, 1L, 4L, 2L, 2L, 4L, 2L, 1L, 1L, 3L, 1L, 4L, 
4L, 4L, 4L, 2L, 4L, 1L, 4L, 3L, 1L, 4L, 4L, 1L), .Label = c("1", 
"2", "3", "4"), class = "factor")), .Names = c("vnum1", "vcat1" 
), row.names = c(NA, 50L), class = "data.frame") 

If I use tapply, I can clearly see the category with missing values:

> with(mydf, tapply(vnum1, vcat1, mean))
         1          2          3          4
0.09172749 0.48575555         NA 0.09632024

But the aggregate function drops it completely:

> aggregate(vnum1 ~ vcat1, mydf, mean)
  vcat1      vnum1
1     1 0.09172749
2     2 0.48575555
3     4 0.09632024

I would like that category to appear in the aggregate output as well. How can I do that? Thanks.

Use 'na.action = NULL' in 'aggregate()'.

Yes, it works! Thanks. If you post it as an answer, I will accept it. – rnso

Or with dplyr: 'mydf %>% group_by(vcat1) %>% summarize(vnum1 = sum(vnum1))', or with data.table: 'setDT(mydf)[, .(vnum1 = sum(vnum1)), by = vcat1]' – alistaire

Answer

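With the formula method, pass na.action = NULL to keep the NA result. The default, na.action = na.omit, removes rows containing NA before grouping, which is why level 3 disappears.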

aggregate(vnum1 ~ vcat1, mydf, mean, na.action = NULL)
#   vcat1      vnum1
# 1     1 0.09172749
# 2     2 0.48575555
# 3     3         NA
# 4     4 0.09632024
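
To see the effect of na.action in isolation, here is a minimal sketch on a small made-up data frame. The name toy and its values are hypothetical, not part of the question's data. It compares the default with na.action = na.pass and na.action = NULL, both of which should leave the NA rows in place:

```r
# Hypothetical toy data: every vnum1 value is missing where vcat1 == "3".
toy <- data.frame(
  vnum1 = c(1, 3, 5, NA, NA, 10),
  vcat1 = factor(c("1", "1", "2", "3", "3", "4"))
)

# Default (na.action = na.omit): rows with NA are removed before grouping,
# so level "3" never reaches mean().
res_default <- aggregate(vnum1 ~ vcat1, toy, mean)

# na.pass keeps the NA rows; mean() then returns NA for level "3".
res_pass <- aggregate(vnum1 ~ vcat1, toy, mean, na.action = na.pass)

# na.action = NULL, as used in the answer, keeps them as well.
res_null <- aggregate(vnum1 ~ vcat1, toy, mean, na.action = NULL)

print(res_default)
print(res_pass)
print(res_null)
```

Either na.pass or NULL works here; na.pass simply states the intent more explicitly.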

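You can also use the data frame method, which has no na.action argument, so there is nothing to worry about: NA values in vnum1 are passed to the function as they are.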

with(mydf, aggregate(list(vnum1 = vnum1), list(vcat1 = vcat1), mean))
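
As a rough illustration of the data frame method, the sketch below again uses a hypothetical toy data frame (not the question's data). It also shows what happens if na.rm = TRUE is forwarded to mean(): the all-missing group is still reported, but as NaN rather than NA, because the mean of an empty vector is NaN.

```r
# Hypothetical toy data: every vnum1 value is missing where vcat1 == "3".
toy <- data.frame(
  vnum1 = c(1, 3, 5, NA, NA, 10),
  vcat1 = factor(c("1", "1", "2", "3", "3", "4"))
)

# Data frame method: values and grouping variables are given as named lists.
# There is no na.action step, so the NA values reach mean() unchanged.
res_df <- with(toy, aggregate(list(vnum1 = vnum1), list(vcat1 = vcat1), mean))

# Extra arguments are forwarded to FUN. With na.rm = TRUE the group that
# contains only NA values yields NaN.
res_narm <- with(toy, aggregate(list(vnum1 = vnum1), list(vcat1 = vcat1),
                                mean, na.rm = TRUE))

print(res_df)
print(res_narm)
```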