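I want to understand how to do a "group by" and "count". I have read several articles and have not found what I am looking for; if an answer has already been posted, I would appreciate a link. Is there an equivalent of SELECT ... COUNT(*) ... GROUP BY ...?
For example, I am looking for outliers in my data; I want to know which places received the most "bad" ratings: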
place = rep(c('AL','AK','AZ','AR','CA','CO','CT','DE','FL','GA','HI'), times=4)
measure = rep(c('meas1','meas2','meas3','meas4'), each=11)
set.seed(200)
rating = sample(c('good','bad'), size = 44, prob = c(2,1), replace = TRUE)
df = data.frame(place, measure, rating)
> df
place measure rating
1 AL meas1 good
2 AK meas1 good
3 AZ meas1 good
4 AR meas1 bad
5 CA meas1 bad
6 CO meas1 bad
7 CT meas1 bad
8 DE meas1 good
9 FL meas1 good
10 GA meas1 good
....(etc).....
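I would like to know how to do this with the tidyverse. The following approach using sqldf gives me what I want: it tells me which places received the most "bad" ratings, sorted by their "badness":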
library(sqldf)
sqldf("SELECT place, rating, COUNT(*) AS Count FROM df GROUP BY place, rating ORDER BY rating, Count DESC")
place rating Count
1 CA bad 3
2 AK bad 2
3 AR bad 1
4 CO bad 1
5 CT bad 1
6 DE bad 1
7 FL bad 1
8 GA bad 1
9 AL good 4
10 AZ good 4
11 HI good 4
....(etc)....
Is there a way to get a similar result in the tidyverse?
Try 'df %>% count(place, rating) %>% arrange(rating, desc(n))' –
Can you explain that? It certainly does what I was hoping for. – cumin
Try '?count', '?arrange' and '?desc'. Reading the manual might help you learn a thing or two –
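Since the comment above was asked for an explanation, here is a minimal sketch of what the suggested pipeline appears to do, written as a hedged illustration rather than a definitive answer. It rebuilds the example data from the question and compares the short count() form with a spelled-out group_by() plus summarise() form. The names short_form and long_form are made up for this illustration, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Rebuild the example data from the question
place   <- rep(c('AL','AK','AZ','AR','CA','CO','CT','DE','FL','GA','HI'), times = 4)
measure <- rep(c('meas1','meas2','meas3','meas4'), each = 11)
set.seed(200)
rating  <- sample(c('good','bad'), size = 44, prob = c(2, 1), replace = TRUE)
df <- data.frame(place, measure, rating)

# Short form from the comment: count() groups by the given columns and
# stores the number of rows per group in a column named n, much like
# SELECT place, rating, COUNT(*) ... GROUP BY place, rating
short_form <- df %>%
  count(place, rating) %>%
  arrange(rating, desc(n))   # like ORDER BY rating, n DESC

# Spelled-out form (hypothetical name long_form): the same grouping and
# counting done step by step
long_form <- df %>%
  group_by(place, rating) %>%                # GROUP BY place, rating
  summarise(n = n(), .groups = "drop") %>%   # COUNT(*) AS n
  arrange(rating, desc(n))                   # ORDER BY rating, n DESC

print(short_form)
```

Both forms should give one row per place and rating combination. The exact counts depend on the R version, because the behaviour of sample() under a given seed changed in R 3.6.0, so the numbers may differ from the sqldf output shown in the question.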