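What is the best way to avoid the for loops used below? They iterate over the "states", using each one to subset the data for plotting and to build the plot title and file name. Perhaps an apply function combined with a user-defined function would fit, but I'm not sure what that would look like, or whether it would also let me avoid the for loops.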
# Attach packages
library(ggplot2)
library(dplyr)
library(gridExtra)
# Create data
set.seed(123)
states <- c("s1","s2","s3")
d <- data.frame(Year = sample(2010:2016, 1000, replace = TRUE),
                Month = sample(1:12, 1000, replace = TRUE),
                DepartureState = sample(states, 1000, replace = TRUE),
                DestinationState = sample(states, 1000, replace = TRUE),
                Price = sample(5000:8000, 1000, replace = TRUE),
                Cost = sample(2000:3000, 1000, replace = TRUE))
# Apply grouping
dg <- d %>%
  group_by(Year, Month, DepState = DepartureState, DestState = DestinationState) %>%
  summarise(sumPrice = sum(Price), sumCost = sum(Cost), diff = sumPrice - sumCost, vol = n())
# Add date column
dg$date <- as.POSIXct(paste(dg$Year, dg$Month, "01", sep = "-"))
# Do things, e.g. subset and plot, for each combination of DepState-DestState pairs
for (depState in states) {
  for (destState in states) {
    dgcut <- dg[dg$DepState == depState & dg$DestState == destState, ]
    description <- paste0(depState, " to ", destState)
    plotname <- paste0(depState, "_", destState)
    #png(filename = paste0(plotname, ".png"))
    p1 <- ggplot(dgcut, aes(x = date, y = sumPrice)) + geom_line()
    p2 <- ggplot(dgcut, aes(x = date, y = sumCost)) + geom_line()
    p3 <- ggplot(dgcut, aes(x = date, y = diff)) + geom_line()
    p4 <- ggplot(dgcut, aes(x = date, y = vol)) + geom_line()
    grid.arrange(p1, p2, p3, p4, ncol = 1, top = description) # from gridExtra
    #dev.off()
  }
}
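Note that I am only trying to avoid the for loops because I understand they are considered bad practice (or suboptimal) in a vectorised language like R. If that is not actually the case, please let me know!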
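For illustration, here is a minimal sketch of how the two loops could be replaced by nested `lapply` calls. It is not necessarily the approach anyone here would pick: `plot_pair` is a hypothetical helper name, and the small `dg` built at the top is only a stand-in with the same column names as the summarised data in the question. The sketch also uses `arrangeGrob` instead of `grid.arrange`, so the panels are built and returned rather than drawn straight away.

```r
library(ggplot2)
library(gridExtra)

# Stand-in for the summarised data frame `dg` (same column names, made-up values)
set.seed(123)
states <- c("s1", "s2", "s3")
dg <- expand.grid(DepState = states, DestState = states,
                  date = seq(as.Date("2010-01-01"), by = "month", length.out = 12),
                  stringsAsFactors = FALSE)
dg$sumPrice <- sample(5000:8000, nrow(dg), replace = TRUE)
dg$sumCost  <- sample(2000:3000, nrow(dg), replace = TRUE)
dg$diff     <- dg$sumPrice - dg$sumCost
dg$vol      <- sample(1:10, nrow(dg), replace = TRUE)

# Hypothetical helper: subset one DepState-DestState pair and stack its four panels
plot_pair <- function(depState, destState, data = dg) {
  dgcut <- data[data$DepState == depState & data$DestState == destState, ]
  description <- paste0(depState, " to ", destState)
  p1 <- ggplot(dgcut, aes(x = date, y = sumPrice)) + geom_line()
  p2 <- ggplot(dgcut, aes(x = date, y = sumCost)) + geom_line()
  p3 <- ggplot(dgcut, aes(x = date, y = diff)) + geom_line()
  p4 <- ggplot(dgcut, aes(x = date, y = vol)) + geom_line()
  # arrangeGrob builds the layout without drawing it
  arrangeGrob(p1, p2, p3, p4, ncol = 1, top = description)
}

# Outer lapply over departure states, inner lapply over destination states
plots <- lapply(setNames(states, states), function(dep) {
  lapply(setNames(states, states), function(dest) plot_pair(dep, dest))
})
```

A single page can then be drawn with `grid::grid.newpage(); grid::grid.draw(plots[["s1"]][["s2"]])`. Note that this mainly tidies the code by moving the loop body into a function; `lapply` still visits each pair one at a time, so it should not be expected to be faster than the for loops.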
Cool, I hadn't thought of nesting the "apply" calls like this. Thanks @shreyasgm! – conor
Using 'l_ply' (package plyr) instead of 'lapply' would speed things up further. –
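As a rough sketch of what the `l_ply` suggestion could look like: `l_ply` calls a function on each element and discards the results, which suits calls made only for their side effects, such as drawing or saving a plot. The `seen` vector and the anonymous function below are placeholders for the real plotting code, and whether this is actually faster than `lapply` is not something this sketch measures.

```r
states <- c("s1", "s2", "s3")

# One row per DepState-DestState pair
pairs <- expand.grid(dep = states, dest = states, stringsAsFactors = FALSE)

# Placeholder side effect: record each plot name instead of drawing a plot
seen <- character(0)
plyr::l_ply(seq_len(nrow(pairs)), function(i) {
  plotname <- paste0(pairs$dep[i], "_", pairs$dest[i])
  seen <<- c(seen, plotname)
})
```

Calling it as `plyr::l_ply` rather than attaching plyr with `library(plyr)` is deliberate: attaching plyr after dplyr masks functions such as `summarise`, which would interfere with the grouping step in the question.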