2016-05-24

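Avoiding for loops in R when subsetting and plotting

What is the best way to avoid the for loops used below? They iterate over the "states", using each one to subset the data and to build the plot titles and plot names. Perhaps an apply function combined with a user-defined function would be suitable, but I am not sure what that would look like, and it would also avoid the for loops.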

# Attach packages 
library(ggplot2) 
library(dplyr) 
library(gridExtra) 

# Create data 
set.seed(123) 
states <- c("s1","s2","s3") 
d <- data.frame(Year = sample(2010:2016, 1000, replace = TRUE),
      Month = sample(1:12, 1000, replace = TRUE),
      DepartureState = sample(states, 1000, replace = TRUE),
      DestinationState = sample(states, 1000, replace = TRUE),
      Price = sample(5000:8000, 1000, replace = TRUE),
      Cost = sample(2000:3000, 1000, replace = TRUE))

# Apply grouping 
dg <- d %>% 
    group_by(Year, Month, DepState = DepartureState, DestState = DestinationState) %>% 
    summarise(sumPrice = sum(Price), sumCost = sum(Cost), diff = sumPrice-sumCost, vol = n()) 

# Add date column 
dg$date <- as.POSIXct(paste(dg$Year, dg$Month, "01", sep = "-"))  

# Do things, e.g. subset and plot, for each combination of DepState-DestState pairs 
for (depState in states) {
    for (destState in states) {
        dgcut <- dg[dg$DepState == depState & dg$DestState == destState, ]

        description <- paste0(depState, " to ", destState)
        plotname <- paste0(depState, "_", destState)
        #png(filename=paste0(plotname,".png"))
        p1 <- ggplot(dgcut, aes(x=date, y=sumPrice)) + geom_line()
        p2 <- ggplot(dgcut, aes(x=date, y=sumCost)) + geom_line()
        p3 <- ggplot(dgcut, aes(x=date, y=diff)) + geom_line()
        p4 <- ggplot(dgcut, aes(x=date, y=vol)) + geom_line()
        grid.arrange(p1, p2, p3, p4, ncol=1, top = description) # from gridExtra
        #dev.off()
    }
}

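Note that I am only trying to avoid the for loops because I understand they are considered bad practice (or suboptimal) in a vectorized language like R. If that is not actually the case, please let me know!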

Answers

Chained lapply calls instead of the for loops?

doThings <- function(depState,destState) { 
    dgcut <- dg[dg$DepState == depState & dg$DestState == destState, ] 

    description <- paste0(depState," to ", destState) 
    plotname <- paste0(depState,"_",destState) 
    #png(filename=paste0(plotname,".png")) 
    p1 <- ggplot(dgcut, aes(x=date, y=sumPrice)) + geom_line() 
    p2 <- ggplot(dgcut, aes(x=date, y=sumCost)) + geom_line() 
    p3 <- ggplot(dgcut, aes(x=date, y=diff)) + geom_line() 
    p4 <- ggplot(dgcut, aes(x=date, y=vol)) + geom_line() 
    grid.arrange(p1,p2,p3,p4,ncol=1,top = description) # from gridExtra 
    #dev.off() 
} 

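# Outer lapply walks departure states, inner lapply walks destination states,
# matching the order of the original nested for loops. invisible() keeps the
# nested list of return values from being printed at the console.
invisible(lapply(states, function(dep) lapply(states, function(dest) doThings(dep, dest))))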
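
A possible single-pass variant of the nested lapply above, offered only as a sketch and not part of the original answer: split the grouped data once by state pair, then map a worker function over the pieces. The names `dg_demo`, `describe_pair`, `pieces` and `summary_table` are hypothetical, and the toy data frame only stands in for `dg` so that the snippet runs with base R alone.

```r
# Toy stand-in for the grouped data `dg`; the values are made up for illustration
dg_demo <- data.frame(
  DepState  = c("s1", "s1", "s2", "s2", "s3"),
  DestState = c("s1", "s2", "s1", "s2", "s3"),
  sumPrice  = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

# Split once into one data frame per DepState/DestState pair that occurs in the data;
# the list names take the form "<DepState>_<DestState>"
pieces <- split(dg_demo, list(dg_demo$DepState, dg_demo$DestState),
                drop = TRUE, sep = "_")

# Hypothetical per-pair worker: in the real script the ggplot() and
# grid.arrange() calls would go where the comment is
describe_pair <- function(piece, plotname) {
  description <- paste0(piece$DepState[1], " to ", piece$DestState[1])
  # plotting code for `piece` would go here, using `description` as the title
  data.frame(plotname = plotname, description = description,
             rows = nrow(piece), stringsAsFactors = FALSE)
}

# Map() passes each piece together with its name to the worker
results <- Map(describe_pair, pieces, names(pieces))
summary_table <- do.call(rbind, results)
```

One difference from the nested loops: with `drop = TRUE`, `split()` only yields pairs that actually occur in the data, whereas the loops visit every combination, including ones whose subset is empty.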
Very cool, I had not thought of nesting "apply" calls like this. Thanks @shreyasgm! – conor

Using 'l_ply' (package plyr) instead of 'lapply' would speed up the process further. –
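
A rough sketch of what the 'l_ply' suggestion could look like. `plyr::l_ply` applies a function to each element and discards the return values, which suits functions called only for their side effects, such as drawing plots. Whether it is noticeably faster here has not been measured. `record_pair` is a hypothetical stand-in for `doThings` that records each pair instead of plotting, and the base R fallback is an assumption added so the snippet still runs without plyr installed.

```r
states <- c("s1", "s2", "s3")

# Hypothetical stand-in for doThings(): records the pair instead of plotting it
seen <- character(0)
record_pair <- function(depState, destState) {
  seen <<- c(seen, paste0(depState, "_", destState))
}

# All pairs in one table; the first column varies fastest, so putting destState
# first reproduces the order of the original nested loops
pairs <- expand.grid(destState = states, depState = states,
                     stringsAsFactors = FALSE)

if (requireNamespace("plyr", quietly = TRUE)) {
  # l_ply calls the function once per row index and throws the results away
  plyr::l_ply(seq_len(nrow(pairs)), function(i) {
    record_pair(pairs$depState[i], pairs$destState[i])
  })
} else {
  # Base R fallback when plyr is not installed
  for (i in seq_len(nrow(pairs))) {
    record_pair(pairs$depState[i], pairs$destState[i])
  }
}
```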
