2017-06-04 203 views

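R/ggplot: cumulative sum in a histogram

I have a dataset containing user IDs and the number of objects each user created. I plotted a histogram with ggplot, and now I am trying to overlay the cumulative sum of the x values as a line. The goal is to see how much each bin contributes to the total. I tried the following: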

ggplot(data=userStats,aes(x=Num_Tours)) + geom_histogram(binwidth = 0.2)+ 
    scale_x_log10(name = 'Number of planned tours',breaks=c(1,5,10,50,100,200))+ 
    geom_line(aes(x=Num_Tours, y=cumsum(Num_Tours)/sum(Num_Tours)*3500),color="red")+ 
    scale_y_continuous(name = 'Number of users', sec.axis = sec_axis(~./3500, name = "Cumulative percentage of routes [%]"))

This does not work because the line does not use any bins, so I tried this plot instead:

ggplot(data=userStats,aes(x=Num_Tours)) + geom_histogram(binwidth = 0.2)+ 
    scale_x_log10(name = 'Number of planned tours',breaks=c(1,5,10,50,100,200))+ 
    stat_bin(aes(y=cumsum(..count..)),binwidth = 0.2, geom="line",color="red")+ 
    scale_y_continuous(name = 'Number of users', sec.axis = sec_axis(~./3500, name = "Cumulative percentage of routes [%]"))

This results in: Result 1

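Here the cumsum of the counts is used. What I want is the cumsum of count * value for each bin. It should then be normalized so that it can be shown in the same plot. What I am aiming for is something like this: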

Example

I would appreciate any input! Thanks

Edit: As test data, this should work:

userID <- c(1:100) 
Num_Tours <- sample(1:100,100) 
userStats <- data.frame(userID,Num_Tours) 
userStats$cumulative <- cumsum(userStats$Num_Tours/sum(userStats$Num_Tours)) 

Example data, please. – mtoto

Answers


Here is an illustrative example that may help you.

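set.seed(111)
userID <- c(1:100)
Num_Tours <- sample(1:100, 100, replace = TRUE)
userStats <- data.frame(userID, Num_Tours)

# Sort by Num_Tours so the cumulative share increases along the x-axis
userStats$Num_Tours <- sort(userStats$Num_Tours)
# Running fraction of all tours contributed by users with up to this many tours
userStats$cumulative <- cumsum(userStats$Num_Tours/sum(userStats$Num_Tours))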

library(ggplot2) 
# Fix manually the maximum value of y-axis 
ymax <- 40 
ggplot(data=userStats,aes(x=Num_Tours)) + 
    geom_histogram(binwidth = 0.2, col="white")+ 
    scale_x_log10(name = 'Number of planned tours',breaks=c(1,5,10,50,100,200))+ 
    geom_line(aes(x=Num_Tours,y=cumulative*ymax), col="red", lwd=1)+ 
    scale_y_continuous(name = 'Number of users', sec.axis = sec_axis(~./ymax, 
    name = "Cumulative percentage of routes [%]")) 

[Image: resulting histogram with the cumulative line in red]
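
The question asks for the cumsum of count * value per bin, whereas the line above is computed per user. A minimal sketch of a per-bin version follows, under some assumptions: the bins are computed by hand as 0.2-wide intervals on the log10 scale, and `boundary = 0` with `closed = "left"` are passed to `geom_histogram()` so its bins should line up with them. The names `bin_id`, `bins`, `x_right` and `cum_share` are my own and only illustrative, and `ymax` is taken from the largest hand-computed bin instead of being fixed manually.

```r
library(ggplot2)

set.seed(111)
userStats <- data.frame(userID = 1:100,
                        Num_Tours = sample(1:100, 100, replace = TRUE))

# Assumption: bins are 0.2 wide on the log10 scale, starting at 0
binwidth <- 0.2
bin_id <- floor(log10(userStats$Num_Tours) / binwidth)

# Per bin: number of users, and total tours (the sum of count * value in that bin)
users_per_bin <- tapply(userStats$Num_Tours, bin_id, length)
tours_per_bin <- tapply(userStats$Num_Tours, bin_id, sum)

bins <- data.frame(
  bin_id = as.numeric(names(tours_per_bin)),
  users  = as.vector(users_per_bin),
  tours  = as.vector(tours_per_bin)
)

# Cumulative share of all tours, placed at the right edge of each bin
bins$x_right   <- 10^((bins$bin_id + 1) * binwidth)
bins$cum_share <- cumsum(bins$tours) / sum(bins$tours)

# Scale the share so that 100 % sits at the height of the tallest bar
ymax <- max(bins$users)

p <- ggplot(userStats, aes(x = Num_Tours)) +
  geom_histogram(binwidth = binwidth, boundary = 0, closed = "left", col = "white") +
  geom_line(data = bins, aes(x = x_right, y = cum_share * ymax), col = "red") +
  scale_x_log10(name = "Number of planned tours",
                breaks = c(1, 5, 10, 50, 100, 200)) +
  scale_y_continuous(name = "Number of users",
                     sec.axis = sec_axis(~ . / ymax * 100,
                                         name = "Cumulative share of tours [%]"))
print(p)
```

Placing each point at the right edge of its bin is a design choice: the value there reads as "share of all tours from users with at most this many tours". Using the bin centre instead would only shift the line horizontally.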


Thank you very much! That did the trick. Much appreciated. – Chris