2015-03-31

Split data into groups in R

My data frame looks like this:

plant distance 
one 0 
one 1 
one 2 
one 3 
one 4 
one 5 
one 6 
one 7 
one 8 
one 9 
one 9.9 
two 0 
two 1 
two 2 
two 3 
two 4 
two 5 
two 6 
two 7 
two 8 
two 9 
two 9.5 
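To make the examples reproducible, the sample data above can be recreated as a data frame. This is a minimal sketch: the name my_data is taken from the answer below (the question's own code calls the data frame data), and plant is stored as a character column.

```r
# Sample data from the question: two plants, distances 0..9 plus 9.9 or 9.5
my_data <- data.frame(
  plant = rep(c("one", "two"), each = 11),
  distance = c(0:9, 9.9, 0:9, 9.5)
)
str(my_data)
```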

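For each level of plant, I want to split distance into groups of a fixed interval (e.g., interval = 3) and compute the percentage of rows in each group. Finally, I want to plot the percentage of each group for each level, similar to this: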

[Image: example of the desired stacked percentage bar chart]

My code:

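library(ggplot2)
library(dplyr)
library(scales)   # provides percent(), used in scale_y_continuous()

dat <- data %>%
    # bin distance into intervals of width 3 starting at 0; labels = FALSE returns integer bin codes
    mutate(group = factor(cut(distance, seq(0, max(distance), 3), labels = FALSE))) %>%
    group_by(plant, group) %>%
    summarise(percentage = n()) %>%                    # row count per plant and group
    mutate(percentage = percentage/sum(percentage))    # still grouped by plant, so shares are per plant
p <- ggplot(dat, aes(x = plant, y = percentage, fill = group)) +
    geom_bar(stat = "identity", position = "stack") +
    scale_y_continuous(labels = percent)
p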

But my plot looks like this, with group 4 missing:

[Image: the resulting plot]

I also found that dat is wrong: group 4 is NA.

[Image: screenshot of dat]

A possible reason is that group 4 is shorter than interval = 3. So my question is: how can I fix this? Thanks in advance!

Answer


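I have solved the problem. The reason is that cut(distance, seq(0, max(distance), 3), F) does not cover the minimum and maximum values. seq(0, max(distance), 3) stops at 9, so 9.9 and 9.5 fall outside the last break; and because cut() uses right-closed intervals (a, b] by default, 0 is excluded as well.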
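A quick base R check on the sample distances shows both effects. This sketch needs no packages; the variable names breaks, old and fixed are purely illustrative. It prints the breaks, lists the values that cut() turns into NA, and then shows that extending the breaks one interval past the maximum, together with include.lowest = TRUE, puts every value into a group.

```r
# Distances from the question (both plants combined)
distance <- c(0:9, 9.9, 0:9, 9.5)

breaks <- seq(0, max(distance), 3)
print(breaks)                     # the last break is 9, below 9.9

old <- cut(distance, breaks, labels = FALSE)
print(distance[is.na(old)])       # the two 0s plus 9.9 and 9.5 get no group

# Breaks running one interval past the maximum, with the lowest value included
fixed <- cut(distance, seq(0, max(distance) + 3, 3), include.lowest = TRUE)
print(table(fixed))               # four groups, no NA
```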

Here is my solution:

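library(dplyr)
interval <- 3
dat <- my_data %>%
    # breaks run from the minimum to one interval past the maximum, so no value is left out
    mutate(group = cut(distance, seq(from = min(distance), to = max(distance) + interval, by = interval),
                       include.lowest = TRUE)) %>%
    count(plant, group) %>%            # rows per plant and group, in column n
    group_by(plant) %>%
    mutate(percentage = n/sum(n))      # share of each group within its plant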
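The answer stops at the data step. A sketch of the full pipeline, including the stacked percentage plot from the question, might look like the following. It assumes dplyr, ggplot2 and scales are installed; interval is a hypothetical variable name for the bin width, and the breaks are computed from the data range so that they always reach past the largest distance.

```r
library(dplyr)
library(ggplot2)
library(scales)   # percent() for the y-axis labels

# Sample data from the question
my_data <- data.frame(
  plant = rep(c("one", "two"), each = 11),
  distance = c(0:9, 9.9, 0:9, 9.5)
)
interval <- 3   # bin width

dat <- my_data %>%
  # breaks start at the minimum and end one interval past the maximum
  mutate(group = cut(distance,
                     breaks = seq(min(distance), max(distance) + interval, by = interval),
                     include.lowest = TRUE)) %>%
  count(plant, group) %>%              # rows per plant and group
  group_by(plant) %>%
  mutate(percentage = n / sum(n)) %>%  # shares within each plant sum to 1
  ungroup()

# geom_col() is equivalent to geom_bar(stat = "identity"); bars stack by default
p <- ggplot(dat, aes(x = plant, y = percentage, fill = group)) +
  geom_col(position = "stack") +
  scale_y_continuous(labels = percent)
print(p)
```

With these breaks, the last group (9,12] holds the 9.9 and 9.5 rows, so each plant now shows four stacked segments instead of three plus NA.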