Frequency counts with variable bin widths and factors in R

I have a fairly large dataset (more than 10,000 rows); a small sample of it is here:
df_temp <- structure(list(Feret = c(0.017, 0.016, 2.12, 0.016, 0.02, 0.023,
0.017, 0.021, 0.02, 0.016, 0.027, 0.052, 0.061, 0.033, 0.041,
0.017, 6.561, 7.123, 0.027, 0.018, 0.024, 4.099, 0.022, 0.025,
0.037, 0.037, 0.018, 0.039, 0.027, 0.053, 0.016, 0.107, 0.52,
0.041, 0.038, 0.039, 0.03, 0.071, 0.022, 0.118, 0.032, 0.018,
0.027, 0.035, 8.113, 0.078, 4.089, 0.035, 0.057, 6.905, 2.5,
0.282, 0.045, 0.039, 0.071, 0.037, 0.029, 0.027, 0.016, 0.02,
0.026, 0.025, 0.026, 0.016, 0.016, 0.021), sample.type = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L), .Label = c("flower", "leaf"), class = "factor"), leaf.side = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), .Label = c("lower", "upper"), class = "factor"), canopy = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L), .Label = c("bottom", "top"), class = "factor"), treatment = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L), .Label = c("blue", "green", "grey", "white", "yel-green"
), class = "factor")), .Names = c("Feret", "sample.type", "leaf.side",
"canopy", "treatment"), row.names = c(500000L, 500001L, 500002L,
500003L, 500004L, 500005L, 500006L, 500007L, 500008L, 500009L,
500010L, 800000L, 800001L, 800002L, 800003L, 800004L, 800005L,
800006L, 800007L, 800008L, 800009L, 800010L, 1000L, 1001L, 1002L,
1003L, 1004L, 1005L, 1006L, 1007L, 1008L, 1009L, 1010L, 10000L,
10001L, 10002L, 10003L, 10004L, 10005L, 10006L, 10007L, 10008L,
10009L, 10010L, 100000L, 100001L, 100002L, 100003L, 100004L,
100005L, 100006L, 100007L, 100008L, 100009L, 100010L, 1160000L,
1160001L, 1160002L, 1160003L, 1160004L, 1160005L, 1160006L, 1160007L,
1160008L, 1160009L, 1160010L), class = "data.frame")
I have been trying to build frequency counts of `Feret` using the following variable-width bins:
bins <- c(0.01,0.03,0.1,0.3,1,3,10)
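To see how these breaks divide the values, here is a small sketch that applies base R's `cut()` to a few Feret values copied from the sample. Passing `include.lowest = TRUE` is an added choice, not from the question: it keeps a value of exactly 0.01 in the first bin, which is what `hist()` does by default.

```r
# Variable-width breaks from the question: six right-closed bins
bins <- c(0.01, 0.03, 0.1, 0.3, 1, 3, 10)

# Human-readable labels such as "0.01 - 0.03"
ranges <- paste(head(bins, -1), bins[-1], sep = " - ")

# A handful of Feret values taken from the sample data
x <- c(0.017, 0.03, 0.052, 2.12, 6.561)

# Each value lands in (lower, upper]; 0.03 therefore goes into the first bin
binned <- cut(x, breaks = bins, labels = ranges, include.lowest = TRUE)
print(data.frame(Feret = x, bin = binned))
```

Because the intervals are closed on the right, a value sitting exactly on a break (such as 0.03) is counted in the lower bin.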
and then used:
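# plot = FALSE returns the counts without drawing a histogram
freq <- hist(df_temp$Feret, breaks = bins, plot = FALSE)
# One label per bin, e.g. "0.01 - 0.03"
ranges <- paste(head(bins, -1), bins[-1], sep = " - ")
freq$counts
df5 <- data.frame(ranges = ranges, frequency = freq$counts)
df5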
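One likely source of slowness inside a loop is that `hist()` draws a plot on every call unless `plot = FALSE` is set. The minimal sketch below uses the first 22 Feret values of the sample as a stand-in for `df_temp$Feret` and checks that `table(cut(...))` produces the same counts without any graphics.

```r
bins <- c(0.01, 0.03, 0.1, 0.3, 1, 3, 10)
ranges <- paste(head(bins, -1), bins[-1], sep = " - ")

# First 22 Feret values from the sample (stand-in for df_temp$Feret)
feret <- c(0.017, 0.016, 2.12, 0.016, 0.02, 0.023, 0.017, 0.021, 0.02, 0.016,
           0.027, 0.052, 0.061, 0.033, 0.041, 0.017, 6.561, 7.123, 0.027, 0.018,
           0.024, 4.099)

# Counts only, no histogram drawn
freq <- hist(feret, breaks = bins, plot = FALSE)

# Same binning with cut(); include.lowest = TRUE mirrors hist()'s default
tab <- table(cut(feret, breaks = bins, labels = ranges, include.lowest = TRUE))

df5 <- data.frame(ranges = ranges, frequency = freq$counts)
print(df5)
```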
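What I really need to do, though, is split the data frame by the various factors (`sample.type`, `leaf.side`, `canopy`, `treatment`) and extract the frequency counts for each subset. I can do this by creating each subset manually, but I would like a better approach. I tried using a loop to create the subsets and then applying `hist()` to each one, but it takes a long time. Is there a better way using dplyr or the apply family? I would prefer to just get the results in a table and plot them later as needed.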
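Since the question mentions the apply family, here is a minimal sketch of that route: `split()` the Feret values by the grouping factors, then `sapply()` the bin counting over the pieces. So that it runs on its own, it uses only the first 22 sample rows and only `sample.type`; `group_cols`, `groups` and `counts` are illustrative names, not from the question.

```r
bins <- c(0.01, 0.03, 0.1, 0.3, 1, 3, 10)
ranges <- paste(head(bins, -1), bins[-1], sep = " - ")

# Stand-in for df_temp: first 22 rows of the sample, Feret and sample.type only
df_temp <- data.frame(
  Feret = c(0.017, 0.016, 2.12, 0.016, 0.02, 0.023, 0.017, 0.021, 0.02, 0.016,
            0.027, 0.052, 0.061, 0.033, 0.041, 0.017, 6.561, 7.123, 0.027, 0.018,
            0.024, 4.099),
  sample.type = factor(rep(c("leaf", "flower"), each = 11),
                       levels = c("flower", "leaf"))
)

# With the full data: c("sample.type", "leaf.side", "canopy", "treatment")
group_cols <- "sample.type"

# One numeric vector per observed combination of the grouping factors
groups <- split(df_temp$Feret, df_temp[group_cols], drop = TRUE)

# Bin counts for every subset at once: rows are bins, columns are groups
counts <- sapply(groups, function(x) hist(x, breaks = bins, plot = FALSE)$counts)
rownames(counts) <- ranges
print(counts)
```

With several grouping columns, `split()` names each subset by joining that combination's factor levels with ".", and `drop = TRUE` skips combinations that never occur. Note that `hist()` stops with an error if any value lies outside the range of `bins`.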
Maybe something like `DF %>% mutate(Feret = cut(Feret, breaks = bins)) %>% count_(., names(.))`?
`table(cut(DF$Feret, bins))` – SabDeM
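Both comments rest on the same idea: turn `Feret` into a binned factor with `cut()`, then count every combination of columns. Below is a hedged base-R sketch of that long-format result (one row per bin and factor combination, similar in shape to what `count()` returns). It again uses only the first 22 sample rows and only `sample.type`; `long` and the column name `n` are illustrative choices.

```r
bins <- c(0.01, 0.03, 0.1, 0.3, 1, 3, 10)
ranges <- paste(head(bins, -1), bins[-1], sep = " - ")

# Stand-in for DF: first 22 rows of the sample, Feret and sample.type only
DF <- data.frame(
  Feret = c(0.017, 0.016, 2.12, 0.016, 0.02, 0.023, 0.017, 0.021, 0.02, 0.016,
            0.027, 0.052, 0.061, 0.033, 0.041, 0.017, 6.561, 7.123, 0.027, 0.018,
            0.024, 4.099),
  sample.type = factor(rep(c("leaf", "flower"), each = 11),
                       levels = c("flower", "leaf"))
)

# Replace the numeric Feret values with their bin labels
DF$Feret <- cut(DF$Feret, breaks = bins, labels = ranges, include.lowest = TRUE)

# table() on a data frame of factors cross-tabulates all of its columns;
# as.data.frame() turns that into one row per combination plus a count column
long <- as.data.frame(table(DF), responseName = "n")

# Keep only the combinations that actually occur
long <- long[long$n > 0, ]
print(long)
```

Unlike dplyr's `count()`, `as.data.frame(table(...))` also produces the zero-count combinations before the filtering step. Keeping them can be handy when every bin should appear for every group in a plot.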