Calculate proportions by group from a data frame

I have word frequencies in a data frame:
df <- data.frame(
Predictor = c("for","of","as","for","for","as","of","of","as","for"),
ToPredict = c("sure","course","much","him","keeps","far","them","this","an","petes"),
Freq = c(53,32,21,17,13,5,3,2,2,1))
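I want to compute a new column giving the proportion of each Predictor's total frequency that each ToPredict accounts for.
So, in the example above, the values of this new column would be: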
df$Props = c(0.631,0.865,0.75,0.202,0.155,0.179,0.081,0.054,0.071,0.012)
So far, I have a data frame of the per-Predictor sums:
sums <- aggregate(df$Freq, by=list(Category=df$Predictor), FUN=sum)
and I have tried:
df$Props <- with(df, Freq/sums$x[which(sums$Category == Predictor)])
Clearly, this doesn't work, but I don't understand what is going wrong. Any help is much appreciated.
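One likely reason the attempt fails: `sums$Category == Predictor` compares a 3-element vector with a 10-element one element by element (recycling the shorter one), so it never looks up each row's group total. A minimal sketch of a lookup-based alternative, assuming the `df` and `sums` objects from the question and using base R's `match()` (the `idx` name is only for illustration):

```r
df <- data.frame(
  Predictor = c("for","of","as","for","for","as","of","of","as","for"),
  ToPredict = c("sure","course","much","him","keeps","far","them","this","an","petes"),
  Freq = c(53,32,21,17,13,5,3,2,2,1))

# Per-Predictor totals, exactly as computed in the question
sums <- aggregate(df$Freq, by = list(Category = df$Predictor), FUN = sum)

# For each row of df, find the position of its Predictor in sums$Category
idx <- match(df$Predictor, sums$Category)

# Divide each frequency by the total of its own group
df$Props <- df$Freq / sums$x[idx]
round(df$Props, 3)
```

Rounded to three decimals, this should reproduce the expected `Props` values listed above.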
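I have a sneaking suspicion this is a duplicate question, but `with(df, ave(Freq, Predictor, FUN = prop.table))` should do it. – thelatemail
Possible duplicate candidates, though the answers aren't great: http://stackoverflow.com/questions/15009011/calculate-proportions-within-subsets-of-a-data-frame and http://stackoverflow.com/questions/26885819 – thelatemail
That's quite possible. However, I couldn't find those answers when searching. Your solution works, thanks! – davo1979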
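For completeness, here is the `ave()` one-liner from thelatemail's comment as a self-contained sketch, using the data from the question. `ave()` applies `FUN` within each level of `Predictor` and returns a vector aligned with the original rows; `prop.table()` on a numeric vector divides each element by the vector's sum.

```r
df <- data.frame(
  Predictor = c("for","of","as","for","for","as","of","of","as","for"),
  ToPredict = c("sure","course","much","him","keeps","far","them","this","an","petes"),
  Freq = c(53,32,21,17,13,5,3,2,2,1))

# Within each Predictor group, divide Freq by the group's total
df$Props <- with(df, ave(Freq, Predictor, FUN = prop.table))

# Sanity check: the proportions within each group should sum to 1
group_totals <- tapply(df$Props, df$Predictor, sum)
group_totals
```

This avoids building the intermediate `sums` data frame altogether, which is probably why it was suggested as the shorter route.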