2015-04-29 63 views
1

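Generating histogram and density plot from binned data

I have binned some data and currently have a data frame with two columns, one specifying the bin range and the other the frequency, like this: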

> head(data) 
     binRange Frequency 
1 (0,0.025]  88 
2 (0.025,0.05]  72 
3 (0.05,0.075]  92 
4 (0.075,0.1]  38 
5 (0.1,0.125]  20 
6 (0.125,0.15]  16 

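I would like to plot a histogram and a density plot from this, but I can't seem to find a way to do so without having to generate new bins and so on. Using the solution here, I tried the following: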

p <- ggplot(data, aes(x= binRange, y=Frequency)) + geom_histogram(stat="identity") 

But it crashes. Does anyone know how to deal with this?

Thanks

+0

Take a look at this [post](http://stackoverflow.com/questions/18219704/histogram-of-binned-data-frame-in-r). –

+0

Thank you, I just updated my post. I tried that with my data, so I ran `p <- ggplot(data, aes(x = binRange, y = Frequency)) + geom_histogram(stat = "identity")` but it just crashes – user2062207

+0

What error message do you get? –

Answer

3

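The problem is that ggplot doesn't understand the data in the form you give it, so you need to reshape it, for example like this (I'm not a regex master, so there are surely better ways to do it):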

df <- read.table(header = TRUE, text = " 
       binRange Frequency 
1 (0,0.025]  88 
2 (0.025,0.05]  72 
3 (0.05,0.075]  92 
4 (0.075,0.1]  38 
5 (0.1,0.125]  20 
6 (0.125,0.15]  16") 

library(stringr) 
library(splitstackshape) 
library(ggplot2) 
# extract the numbers, e.g. "(0,0.025]" becomes "0,0.025"
df$binRange <- str_extract(df$binRange, "[0-9].*[0-9]+") 

# split on the comma into two columns: 
# binRange_1 for the start-point and binRange_2 for the end-point 
df <- cSplit(df, "binRange") 

# plot it; you don't actually need the second column 
# (width is a constant bar width, so it is set outside aes()) 
ggplot(df, aes(x = binRange_1, y = Frequency)) + 
    geom_bar(stat = "identity", width = 0.025) 
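
Since there may well be tidier ways to pull the numbers out, here is one possible base R sketch that avoids the two extra packages. The column names `lower`, `upper` and `mid` are my own choices, and it assumes the bounds are non-negative plain decimals (no minus signs or scientific notation in the labels):

```r
# Same six rows as above
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
       binRange Frequency
1 (0,0.025]  88
2 (0.025,0.05]  72
3 (0.05,0.075]  92
4 (0.075,0.1]  38
5 (0.1,0.125]  20
6 (0.125,0.15]  16")

# Drop everything except digits, dots and commas, then split on the comma
bounds <- strsplit(gsub("[^0-9.,]", "", df$binRange), ",", fixed = TRUE)

# First piece is the start-point, second piece is the end-point
df$lower <- as.numeric(sapply(bounds, `[`, 1))
df$upper <- as.numeric(sapply(bounds, `[`, 2))

# Bin midpoint, handy if you want each bar centred inside its bin
df$mid <- (df$lower + df$upper) / 2
```

Using `mid` instead of the start-point as `x` in the plot should place each bar between its two bin edges rather than centred on the left edge.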

Alternatively, if you don't want the bins to be interpreted numerically, you can simply do the following:

df <- read.table(header = TRUE, text = " 
       binRange Frequency 
1 (0,0.025]  88 
2 (0.025,0.05]  72 
3 (0.05,0.075]  92 
4 (0.075,0.1]  38 
5 (0.1,0.125]  20 
6 (0.125,0.15]  16") 

library(ggplot2) 
ggplot(df, aes(x = binRange, y = Frequency)) + geom_bar(stat = "identity") 
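
One thing to watch with the categorical version: a character column is placed on the x axis in sorted order, which may not match the bin order once the labels get more varied (for example a bin starting at 10 next to one starting at 2). A small sketch, assuming the rows of the data frame are already in bin order, is to turn the column into a factor whose levels follow the row order:

```r
# Same six rows as above, built directly for brevity
df <- data.frame(
  binRange = c("(0,0.025]", "(0.025,0.05]", "(0.05,0.075]",
               "(0.075,0.1]", "(0.1,0.125]", "(0.125,0.15]"),
  Frequency = c(88, 72, 92, 38, 20, 16),
  stringsAsFactors = FALSE
)

# Fix the level order to the order in which the bins appear in the data
df$binRange <- factor(df$binRange, levels = df$binRange)
```

The same `ggplot()` call as above then draws the bars in the order of the rows.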

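You won't be able to draw a density plot with your data, because it is not continuous but categorical, which is why I prefer the second way of displaying it.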
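
If a density-scaled histogram would be enough, the counts can be rescaled so that the bar areas add up to 1. This is only a sketch: it is a rescaled histogram with a line through the bin midpoints, not a kernel density estimate, and it normalises over just the six rows shown in `head(data)`. The column names `lower`, `upper`, `width`, `mid` and `density` are my own:

```r
library(ggplot2)

# Bin edges and counts for the six rows shown in the question
df <- data.frame(
  lower = (0:5) * 0.025,
  upper = (1:6) * 0.025,
  Frequency = c(88, 72, 92, 38, 20, 16)
)

df$width <- df$upper - df$lower
df$mid <- (df$lower + df$upper) / 2

# density = count / (total count * bin width), so sum(density * width) is 1
df$density <- df$Frequency / (sum(df$Frequency) * df$width)

p <- ggplot(df, aes(x = mid, y = density)) +
  geom_bar(stat = "identity", width = 0.025) +  # density-scaled bars
  geom_line()                                   # outline through the bin midpoints
print(p)
```

With the full data set you would compute `density` over all bins rather than only these six.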

+0

Thanks, that worked brilliantly! – user2062207