2017-02-16

Calculate proportions by group from a data frame

I have word frequencies in a data frame, like this:

df <- data.frame(
    Predictor = c("for","of","as","for","for","as","of","of","as","for"), 
    ToPredict = c("sure","course","much","him","keeps","far","them","this","an","petes"), 
    Freq = c(53,32,21,17,13,5,3,2,2,1)) 

I want to calculate a new column giving, for each row, the share of its Predictor's total Freq that the ToPredict accounts for.

So, in the example above, the values of this new column would be:

df$Props = c(0.631,0.865,0.75,0.202,0.155,0.179,0.081,0.054,0.071,0.012) 
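Put as a formula (a restatement of the definition above, in notation introduced here for illustration): row i's proportion is its Freq divided by the total Freq of all rows sharing its Predictor. The group totals are 84 for `for` (53 + 17 + 13 + 1), 37 for `of` (32 + 3 + 2) and 28 for `as` (21 + 5 + 2), which gives, for the first row:

```latex
\text{Props}_i = \frac{\text{Freq}_i}{\sum_{j:\,\text{Predictor}_j = \text{Predictor}_i} \text{Freq}_j},
\qquad
\text{Props}_1 = \frac{53}{53 + 17 + 13 + 1} = \frac{53}{84} \approx 0.631
```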

Currently, I have a data frame of the per-Predictor sums:

sums <- aggregate(df$Freq, by=list(Category=df$Predictor), FUN=sum) 

and I have tried:

df$Props <- with(df, Freq/sums$x[which(sums$Category == Predictor)]) 

Obviously this doesn't work, but I don't understand what is going wrong. Any help is much appreciated.
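A likely cause: `sums$Category == Predictor` compares the 3-element `Category` vector against the 10-element `Predictor` column position by position, recycling the shorter one (R warns that the longer object length is not a multiple of the shorter). `which()` then returns positions in that recycled comparison, not the matching row of `sums` for each row of `df`, so `sums$x[...]` picks up wrong values or `NA`. A minimal sketch of one way to repair the lookup while keeping the `aggregate()` step, using only base R, swaps `which()` for `match()`, which returns, for each Predictor, the row of `sums` holding its group total. The variable name `idx` is introduced here for illustration.

```r
# Example data from the question
df <- data.frame(
  Predictor = c("for","of","as","for","for","as","of","of","as","for"),
  ToPredict = c("sure","course","much","him","keeps","far","them","this","an","petes"),
  Freq = c(53,32,21,17,13,5,3,2,2,1))

# Group totals, as in the question: one row per distinct Predictor
sums <- aggregate(df$Freq, by = list(Category = df$Predictor), FUN = sum)

# For each row of df, find the row of sums whose Category equals that
# row's Predictor, so the totals line up with df row by row.
# as.character() guards against factor columns (stringsAsFactors in R < 4.0).
idx <- match(as.character(df$Predictor), as.character(sums$Category))
df$Props <- df$Freq / sums$x[idx]

print(round(df$Props, 3))
```

Rounded to three decimals, this should reproduce the Props values listed in the question.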

I have a sneaking suspicion this is a duplicate question, but `with(df, ave(Freq, Predictor, FUN = prop.table))` should do it. – thelatemail

Possible duplicate candidates, though the answers there aren't great: http://stackoverflow.com/questions/15009011/calculate-proportions-within-subsets-of-a-data-frame and http://stackoverflow.com/questions/26885819 – thelatemail

That's quite possible, but I couldn't find an answer by searching. Your solution works, thanks! – davo1979

Answers

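# Total Freq per Predictor, as a named vector (names = Predictor values)
a <- aggregate(df$Freq, by = list(df$Predictor), FUN = sum)
a1 <- a[, 2]
names(a1) <- as.character(a[, 1])
# Look up each row's group total by name; as.character() keeps this a
# name lookup even if Predictor is a factor (as in R < 4.0)
df$Props <- df$Freq / a1[as.character(df$Predictor)]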

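This one works too, and it's more intuitive to me (though I'd imagine it's slower, since it creates an extra vector). However, I can't accept thelatemail's answer, at least not right away, since it was posted as a comment. So this will do. – davo1979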

Per thelatemail's comment:

with(df, ave(Freq, Predictor, FUN=prop.table))
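A short usage sketch, assuming the `df` from the question: `ave()` splits `Freq` by `Predictor`, applies `FUN` to each piece, and writes the results back in the original row order, so the output has one value per row of `df`. `prop.table()` applied to a plain vector returns `x / sum(x)`. `FUN` has to be passed by name, because `ave()` treats unnamed arguments after `x` as grouping variables.

```r
# Example data from the question
df <- data.frame(
  Predictor = c("for","of","as","for","for","as","of","of","as","for"),
  ToPredict = c("sure","course","much","him","keeps","far","them","this","an","petes"),
  Freq = c(53,32,21,17,13,5,3,2,2,1))

# Within each Predictor group, divide Freq by the group's total Freq;
# ave() returns the results aligned with the original rows
df$Props <- with(df, ave(Freq, Predictor, FUN = prop.table))

# Sanity check: the proportions within each Predictor should sum to 1
print(tapply(df$Props, df$Predictor, sum))
print(round(df$Props, 3))
```

Compared with the `aggregate()` approach, this skips building a separate table of sums and a lookup step, since the grouping and the division happen in one call.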