2016-07-17 55 views
0

Computing quantiles for subgroups of a weighted sample in R

I have a weighted sample in data_frame:

ID GROUP1 GROUP2 A  weight 
1 A  1  25  100  
2 B  1  31  120  
3 C  1  21  70  
4 A  2  55  63  
5 C  2  8  80  
6 C  2  41  80  
7 B  1  45  120  
8 A  2  23  63  

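I want to compute the 5th percentile of variable A for each subgroup (each combination of GROUP1 and GROUP2) and assign that value to every row in the subgroup (new column = "demanded_column"). I want something like this, but taking the sample weights into account as well: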

data_frame$demanded_column <- with(data_frame, ave(A, GROUP1, GROUP2, FUN = function(x) quantile(x, probs = 0.05, na.rm = TRUE))) 

Answers

0

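How about this? I use split and Hmisc::wtd.quantile to compute the 5% quantile for each subgroup, then unsplit to broadcast the results back to the original number of rows: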

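# Read the sample table from the clipboard (copy the table from the question first) 
df <- read.table("clipboard", header=TRUE) 

# Split by GROUP1 x GROUP2 (columns 2:3) and compute the weighted 5% quantile per subgroup 
v <- lapply(split(df, df[2:3], drop=TRUE), function(x) { 
    Hmisc::wtd.quantile(x$A, x$weight, probs = 0.05, na.rm = TRUE) 
}) 

# Recycle each subgroup's value back onto the rows of that subgroup 
df$q05 <- unsplit(v, df[2:3], drop = TRUE) 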

And the result:

> df 
  ID GROUP1 GROUP2  A weight q05 
1  1      A      1 25    100  25 
2  2      B      1 31    120  31 
3  3      C      1 21     70  21 
4  4      A      2 55     63  23 
5  5      C      2  8     80   8 
6  6      C      2 41     80   8 
7  7      B      1 45    120  31 
8  8      A      2 23     63  23 
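
If Hmisc is not available, the same split-and-assign idea can be sketched in base R. The helper weighted_quantile below is hypothetical (it is not part of any package) and uses a simple rule: take the smallest value whose cumulative weight share reaches the requested probability. That is an assumption made for illustration, not necessarily the exact rule Hmisc::wtd.quantile applies, although on this sample it gives the same column.

```r
# Sample data from the question, built inline so the sketch is self-contained.
dat <- data.frame(
  ID     = 1:8,
  GROUP1 = c("A", "B", "C", "A", "C", "C", "B", "A"),
  GROUP2 = c(1, 1, 1, 2, 2, 2, 1, 2),
  A      = c(25, 31, 21, 55, 8, 41, 45, 23),
  weight = c(100, 120, 70, 63, 80, 80, 120, 63)
)

# Hypothetical helper: smallest x whose cumulative weight share reaches `prob`.
# A plain inverse weighted CDF; assumed for illustration only.
weighted_quantile <- function(x, w, prob) {
  keep <- !is.na(x) & !is.na(w)
  x <- x[keep]
  w <- w[keep]
  ord <- order(x)
  cum_share <- cumsum(w[ord]) / sum(w)
  x[ord][which(cum_share >= prob)[1]]
}

# Row indices for each GROUP1 x GROUP2 combination that actually occurs.
idx <- split(seq_len(nrow(dat)), dat[c("GROUP1", "GROUP2")], drop = TRUE)

# Assign each subgroup's weighted 5% quantile to all rows of that subgroup.
dat$q05 <- NA_real_
for (rows in idx) {
  dat$q05[rows] <- weighted_quantile(dat$A[rows], dat$weight[rows], 0.05)
}

print(dat)
```

Splitting row indices rather than the data frame itself keeps the value and weight vectors aligned inside each subgroup, which is the part that a plain ave(A, ...) call cannot do on its own.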
0

You can use a combination of dplyr and magrittr:

library(dplyr) ## Importing dplyr will import the %>% operator from magrittr 

dframe <- structure(list(ID = 1:8, GROUP1 = structure(c(1L, 2L, 3L, 1L, 
3L, 3L, 2L, 1L), .Label = c("A", "B", "C"), class = "factor"), 
    GROUP2 = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L), A = c(25L, 31L, 
    21L, 55L, 8L, 41L, 45L, 23L), weight = c(100L, 120L, 70L, 
    63L, 80L, 80L, 120L, 63L)), .Names = c("ID", "GROUP1", "GROUP2", 
"A", "weight"), class = "data.frame", row.names = c(NA, -8L)) 

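# Group by the two subgroup keys only; weight is a per-row sample weight, 
# so it is passed to Hmisc::wtd.quantile() instead of being a grouping key. 
new_dframe <- dframe %>% 
    group_by(GROUP1, GROUP2) %>% 
    mutate(demanded_column = Hmisc::wtd.quantile(A, weights = weight, probs = 0.05)[[1]]) 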

new_dframe 

ID GROUP1 GROUP2  A weight demanded_column 
 1      A      1 25    100              25 
 2      B      1 31    120              31 
 3      C      1 21     70              21 
 4      A      2 55     63              23 
 5      C      2  8     80               8 
 6      C      2 41     80               8 
 7      B      1 45    120              31 
 8      A      2 23     63              23 
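
A small check that may help when adapting the call to base quantile(): the probability argument is named probs. An unknown named argument such as q is silently absorbed by `...`, so the default five quantiles come back. The vector x below is just the two A values of subgroup A.2 from the sample data, used for illustration.

```r
x <- c(55, 23)  # the A values of subgroup A.2 in the sample data

# probs is the documented argument: one interpolated (unweighted) 5% quantile.
q_probs <- unname(quantile(x, probs = 0.05))

# q is not an argument of quantile(); it is absorbed by `...`, so the
# default probs (0%, 25%, 50%, 75%, 100%) are used instead.
q_default <- quantile(x, q = 0.05)

print(q_probs)
print(length(q_default))
print(q_default[[1]] == min(x))
```

With the default probs, taking the first element with [[1]] returns the 0% quantile, which is the subgroup minimum and not a 5% quantile.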

I hope this helps.

+1

If you import 'dplyr', you don't need to import 'magrittr'; the former already loads the pipe. – alistaire

+0

Now that's something that cuts a whole line of code :) – Abdou