2016-02-26

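R: calculating a grouped frequency table with percentages

Given the following data.frame, I want to count the occurrences of each value of VAR and express these occurrences as percentages within the grouping variable GROUP: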

GROUP<-c("G1","G2","G1","G2","G3","G3","G1") 
VAR<-c("A","B","B","A","B","B","A") 
d<-data.frame(GROUP,VAR) 

With table(), I get a nice frequency table that counts the occurrences of every combination of the two variables:

d<-as.data.frame(table(d)) 
  GROUP VAR Freq
1    G1   A    2
2    G2   A    1
3    G3   A    0
4    G1   B    1
5    G2   B    1
6    G3   B    2

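Now I want to calculate the percentage of each value of VAR within each GROUP. So far, I split the data.frame, calculate the percentages for G1, G2 and G3 separately, and then merge the pieces.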

d.G1<-d[d$GROUP=="G1",] 
d.G1$per<-d.G1$Freq/sum(d.G1$Freq) 
d.G1 
  GROUP VAR Freq       per
1    G1   A    2 0.6666667
4    G1   B    1 0.3333333

...

d.merge<-rbind(d.G1,d.G2,d.G3) 
d.merge 
  GROUP VAR Freq       per
1    G1   A    2 0.6666667
4    G1   B    1 0.3333333
2    G2   A    1 0.5000000
5    G2   B    1 0.5000000
3    G3   A    0 0.0000000
6    G3   B    2 1.0000000
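
The manual split, compute and combine steps above can be written once for all groups. The following is only a minimal base R sketch of that idea, not a replacement for the answers; the names `freq`, `pieces` and `merged` are made up for illustration.

```r
# Rebuild the frequency table from the question's data
GROUP <- c("G1","G2","G1","G2","G3","G3","G1")
VAR   <- c("A","B","B","A","B","B","A")
freq  <- as.data.frame(table(data.frame(GROUP, VAR)))  # 'freq' is a hypothetical name

# Split by GROUP, add the within-group share, then stack the pieces again
pieces <- lapply(split(freq, freq$GROUP), function(g) {
  g$per <- g$Freq / sum(g$Freq)
  g
})
merged <- do.call(rbind, pieces)
rownames(merged) <- NULL  # drop the composite row names created by rbind
merged
```

Note that a group whose counts sum to zero would produce NaN here; that case does not occur in the example data.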

Is there a more elegant solution, for example using the reshape2 package?


Why not 'as.data.frame(prop.table(table(d),1))'? – lukeA


I think this is a very elegant solution. I will add it as an answer. –

Answers


With the dplyr package, you can do this (here d is the frequency table with the Freq column):

require(dplyr) 

d <- d %>% group_by(GROUP) %>% mutate(per = Freq/sum(Freq)) 
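
For readers without dplyr, the same per-group normalisation can be sketched in base R with ave(). This is an alternative technique, not part of the answer above, and the name `freq` is made up for illustration.

```r
# Rebuild the frequency table from the question's data
GROUP <- c("G1","G2","G1","G2","G3","G3","G1")
VAR   <- c("A","B","B","A","B","B","A")
freq  <- as.data.frame(table(data.frame(GROUP, VAR)))  # 'freq' is a hypothetical name

# ave() applies FUN to the counts within each level of GROUP and
# returns the results in the original row order
freq$per <- ave(as.numeric(freq$Freq), freq$GROUP,
                FUN = function(x) x / sum(x))
freq
```

Unlike the split-and-merge approach, the rows keep the order of the original frequency table.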

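This answer is taken from @lukeA's comment. I think it is a very elegant solution if you only need the percentages. Note that d must be the original data.frame here, not the frequency table it was overwritten with in the question: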

d<-as.data.frame(prop.table(table(d),1)) 
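
If both the counts and the percentages are wanted, the two tables can be combined. The following is a minimal sketch under the assumption that d is the original data.frame; the names `tab` and `res` are made up for illustration.

```r
# Original data (not the frequency table)
GROUP <- c("G1","G2","G1","G2","G3","G3","G1")
VAR   <- c("A","B","B","A","B","B","A")
d     <- data.frame(GROUP, VAR)

tab <- table(d)              # rows are GROUP, columns are VAR
res <- as.data.frame(tab)    # counts in the Freq column

# margin = 1 divides each row (each GROUP) by its row total;
# both conversions list the cells in the same order
res$per <- as.data.frame(prop.table(tab, 1))$Freq
res
```

The second argument of prop.table() selects the margin: 1 normalises within each GROUP, while 2 would normalise within each value of VAR.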

Using data.table, you can do it as follows:

library(data.table) 
GROUP<-c("G1","G2","G1","G2","G3","G3","G1") 
VAR<-c("A","B","B","A","B","B","A") 
DT <-data.table(GROUP,VAR) 

# Count occurrences of each GROUP/VAR combination
DT1 <- DT[, list(Count=.N), by=.(GROUP, VAR)] 
# dcast and melt to get all combinations of GROUP and VAR 
# as in your output. You can remove this step if all 
# combinations are not required 
DT2 <- dcast(DT1, GROUP ~ VAR, value.var="Count") 
DT3 <- melt(DT2, id.vars="GROUP") 
# Replace NA values with zero; := updates DT3 by reference,
# so the change persists without reassigning
DT3[is.na(value), value := 0L] 
# Create percentage within each GROUP 
DT3[, percent:=value/sum(value, na.rm=TRUE), by=GROUP] 

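I tried to keep the output the same as yours, which is why the dcast and melt steps are needed. They can be omitted if you do not need all combinations.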