2017-08-04 16 views

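Copying multiple data frame rows down with sapply or lapply instead of a loop in R

I need to iterate over warehouse item data and repeat each item's data for a specific set of months. In my real-world application I am working through 500k rows of data, and my function takes 5 minutes to run, which is not practical.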

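I need a way to do this with something like a dplyr or apply-family function, preferably sapply or anything else that can output a data frame. Here is some sample data to show the concept: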

library(lubridate) 

# Item Data Frame 
item.df <- data.frame(Item = c("A1","A2","A3","A4","A5"), 
                      Gross_Profit = c(15,20,8,18,29), 
                      Launch_Date = c("2001-04-01","2001-04-05","2003-11-03",
                                      "2015-02-11","2017-06-15")) 

# Months Data Frame 
five.months <- seq(ymd(paste(year(today()),month(today()),1))-months(5), 
        ymd(paste(year(today()),month(today()),1))-months(1), 
        by = "month") 
five.months.df <- data.frame(Month_Floor = five.months) 

# Function to copy Item Data for each Month 
repeat.item <- function(char.item, frame.months) { 
  df.item <- NULL 

  for (i in seq_len(nrow(char.item))) { 
    # Repeat this item's identifier and launch date once per month 
    Item <- rep(char.item[i, 1], nrow(frame.months)) 
    Launch_Date <- rep(char.item[i, 3], nrow(frame.months)) 
    df.col <- cbind(frame.months, Item, Launch_Date) 
    # Append this item's block of rows to the accumulated result 
    df.item <- rbind(df.item, df.col) 
  } 

  return(df.item) 
} 
# Result 
copied.df <- repeat.item(item.df,five.months.df) 

Here are the inputs and the result:

> item.df 
Item Gross_Profit Launch_Date 
1 A1   15 2001-04-01 
2 A2   20 2001-04-05 
3 A3   8 2003-11-03 
4 A4   18 2015-02-11 
5 A5   29 2017-06-15 

> five.months.df 
Month_Floor 
1 2017-03-01 
2 2017-04-01 
3 2017-05-01 
4 2017-06-01 
5 2017-07-01 

> copied.df 
Month_Floor Item Launch_Date 
1 2017-03-01 A1 2001-04-01 
2 2017-04-01 A1 2001-04-01 
3 2017-05-01 A1 2001-04-01 
4 2017-06-01 A1 2001-04-01 
5 2017-07-01 A1 2001-04-01 
6 2017-03-01 A2 2001-04-05 
7 2017-04-01 A2 2001-04-05 
8 2017-05-01 A2 2001-04-05 
9 2017-06-01 A2 2001-04-05 
10 2017-07-01 A2 2001-04-05 
11 2017-03-01 A3 2003-11-03 
12 2017-04-01 A3 2003-11-03 
13 2017-05-01 A3 2003-11-03 
14 2017-06-01 A3 2003-11-03 
15 2017-07-01 A3 2003-11-03 
16 2017-03-01 A4 2015-02-11 
17 2017-04-01 A4 2015-02-11 
18 2017-05-01 A4 2015-02-11 
19 2017-06-01 A4 2015-02-11 
20 2017-07-01 A4 2015-02-11 
21 2017-03-01 A5 2017-06-15 
22 2017-04-01 A5 2017-06-15 
23 2017-05-01 A5 2017-06-15 
24 2017-06-01 A5 2017-06-15 
25 2017-07-01 A5 2017-06-15 

Answer


I think you can use the built-in merge function:

copied.df = merge(five.months.df, item.df, by=NULL); 

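It performs a cross join between the two data frames. If you don't need all the columns (as your example suggests), you can subset before the cross join (this should improve performance):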

copied.df = merge(five.months.df, subset(item.df, select=c("Item", "Launch_Date")), by=NULL); 
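As a quick sanity check, the sketch below compares the loop from the question with the cross join on a small stand-in data set. It is only an illustration: the data is built with base R (no lubridate, fixed months, three items), the helper `sort.rows` is a hypothetical name introduced here, and the comparison sorts both results first so that it does not rely on the row order either approach produces.

```r
# Small stand-ins for the question's data (base R only; values are illustrative).
item.df <- data.frame(
  Item         = c("A1", "A2", "A3"),
  Gross_Profit = c(15, 20, 8),
  Launch_Date  = c("2001-04-01", "2001-04-05", "2003-11-03"),
  stringsAsFactors = FALSE
)
five.months.df <- data.frame(
  Month_Floor = seq(as.Date("2017-03-01"), by = "month", length.out = 5)
)

# Loop version from the question, kept only as a reference for comparison.
repeat.item <- function(char.item, frame.months) {
  df.item <- NULL
  for (i in seq_len(nrow(char.item))) {
    df.col <- cbind(frame.months,
                    Item        = rep(char.item[i, 1], nrow(frame.months)),
                    Launch_Date = rep(char.item[i, 3], nrow(frame.months)))
    df.item <- rbind(df.item, df.col)
  }
  df.item
}

looped <- repeat.item(item.df, five.months.df)

# Cross join: by = NULL asks merge() for the Cartesian product of the two frames.
crossed <- merge(five.months.df,
                 subset(item.df, select = c("Item", "Launch_Date")),
                 by = NULL)

# Hypothetical helper: put rows in a fixed order so the comparison is order-independent.
sort.rows <- function(df) df[order(as.character(df$Item), df$Month_Floor), ]
looped.sorted  <- sort.rows(looped)
crossed.sorted <- sort.rows(crossed)

same <- nrow(looped.sorted) == nrow(crossed.sorted) &&
  all(looped.sorted$Month_Floor == crossed.sorted$Month_Floor) &&
  all(as.character(looped.sorted$Item) == as.character(crossed.sorted$Item)) &&
  all(as.character(looped.sorted$Launch_Date) ==
        as.character(crossed.sorted$Launch_Date))

print(same)
```

The cross join builds all item-by-month rows in a single call instead of growing a data frame inside a loop, which is presumably where the speed-up on the 500k-row data comes from.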

Worked like a charm @Bruno Zemengo, with the subset applied beforehand. Much easier than trying to run parallel processing... – Sescopeland
