2013-03-08 55 views
0

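Slow data.frame filling

I have length(Dates_List) days for which I have information on length(ISIN_Table$ID) items. For each day (the loop over j), I create a data frame of zeros with one row per item (length(ISIN_Table$ID) rows) and 4 columns.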

Each item is one row in every data frame, but how the row is filled depends on the date.

#create list that will hold matrices 
df.list<-vector("list", length(Dates_List)) 
for (j in 1:(length(Dates_List))){ 
    df.list[[j]] <- data.frame(matrix(0, nrow = length(ISIN_Table$ID),ncol=4)) 
} 

#Loop over number of days 
for (j in 1:(length(Dates_List))){ 
    date<-Dates_List[j] 
    #create empty dataframe 
    df.list[[j]] <- data.frame(matrix(0, nrow=length(ISIN_Table$ID), ncol=4)) 

    #loop over every item 
    for (i in 1:(length(ISIN_Table$ID))){
        #check whether item is known at date
        if (nrow(data.raw[data.raw$ID==i & data.raw$Date==date,]) < 1){
            ID<-i
            df.list[[j]][i,1]<-date
            df.list[[j]][i,2]<-ID  #fill up the row
        } else {
            #fill up the row
            df.list[[j]][i,]<-c(
                as.character(data.raw[data.raw$ID==i & data.raw$Date==date,"Date"]),
                (data.raw[data.raw$ID==i & data.raw$Date==date,"ID"]),
                (data.raw[data.raw$ID==i & data.raw$Date==date,"Bid.Price"]),
                (data.raw[data.raw$ID==i & data.raw$Date==date,"Ask.Price"]))
        }
    }
} 

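This code gives me exactly the output I want, but it is incredibly slow. I would appreciate any advice on how to speed it up; the current version is not workable.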

UPDATE:

# create dummy data: 

Dates_List<-c("2007-01-02", "2007-01-03") 
ISIN_Table<-data.frame(c(1,2,3)) 
colnames(ISIN_Table)<-"ID" 
ID<-rep(1:2, len=2, each=1) 
Date<-c("2007-01-02","2007-01-02","2007-01-03", "2007-01-03") 
Bid.Price<-rep(100,4) 
Ask.Price<-rep(100,4) 
data.raw<-data.frame(ID, Date, Bid.Price, Ask.Price) 

Asking for df.list[[1]] returns:

          X1 X2  X3  X4
1 2007-01-02  1 100 100
2 2007-01-02  2 100 100
3 2007-01-02  3   0   0
+0

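for loops in R are slow. You could try the 'apply' family of functions. Also, without reproducible data it is hard to answer a question like this. – 2013-03-08 14:46:01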

+0

It looks like you just want to split data.raw by date, and wherever a particular 'ID' is missing for a particular date, you are filling with 0 – 2013-03-08 14:52:52

+6

'for' loops are not slow. Creating and subsetting data frames is slow. – Roland 2013-03-08 14:53:22
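
To illustrate the last comment, here is a minimal sketch that still iterates over dates but subsets `data.raw` only once per date and fills whole columns with `match()` instead of subsetting per item. It is not taken from the thread. It assumes at most one row per (ID, Date) pair, looks items up by `ISIN_Table$ID` rather than by row index, and names the columns instead of using X1..X4.

```r
# Dummy data equivalent to the question's UPDATE section
Dates_List <- c("2007-01-02", "2007-01-03")
ISIN_Table <- data.frame(ID = c(1, 2, 3))
data.raw <- data.frame(
  ID = c(1, 2, 1, 2),
  Date = c("2007-01-02", "2007-01-02", "2007-01-03", "2007-01-03"),
  Bid.Price = rep(100, 4),
  Ask.Price = rep(100, 4),
  stringsAsFactors = FALSE
)

df.list <- lapply(Dates_List, function(date) {
  # Subset once per date instead of several times per item
  day <- data.raw[data.raw$Date == date, ]
  # Position of each ID within that day's rows (NA if the ID is absent)
  idx <- match(ISIN_Table$ID, day$ID)
  bid <- day$Bid.Price[idx]
  ask <- day$Ask.Price[idx]
  # Items unknown at this date get 0, as in the question's output
  bid[is.na(bid)] <- 0
  ask[is.na(ask)] <- 0
  data.frame(Date = date, ID = ISIN_Table$ID, Bid.Price = bid, Ask.Price = ask)
})
```

Each element of `df.list` is built in one step, so no data frame is modified row by row.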

Answers

1

UPDATE: As per @Arun's suggestion, you can add the missing rows before splitting and avoid the mapply step entirely.

Dates_List <- c("2007-01-02", "2007-01-03") 
ISIN_Table <- data.frame(c(1, 2, 3)) 
colnames(ISIN_Table) <- "ID" 
ID <- rep(1:2, len = 2, each = 1) 
Date <- c("2007-01-02", "2007-01-02", "2007-01-03", "2007-01-03") 
Bid.Price <- rep(100, 4) 
Ask.Price <- rep(100, 4) 
data.raw <- data.frame(ID, Date, Bid.Price, Ask.Price) 

temp <- expand.grid(Dates_List, ISIN_Table$ID) 
names(temp) <- c("Date", "ID") 

data.raw <- merge(temp, data.raw, all.x = TRUE) 
data.raw[is.na(data.raw)] <- 0 
data.raw 
##         Date ID Bid.Price Ask.Price
## 1 2007-01-02  1       100       100
## 2 2007-01-02  2       100       100
## 3 2007-01-02  3         0         0
## 4 2007-01-03  1       100       100
## 5 2007-01-03  2       100       100
## 6 2007-01-03  3         0         0


splitdata <- split(data.raw, data.raw$Date) 

splitdata 
## $`2007-01-02`
##         Date ID Bid.Price Ask.Price
## 1 2007-01-02  1       100       100
## 2 2007-01-02  2       100       100
## 3 2007-01-02  3         0         0
##
## $`2007-01-03`
##         Date ID Bid.Price Ask.Price
## 4 2007-01-03  1       100       100
## 5 2007-01-03  2       100       100
## 6 2007-01-03  3         0         0
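
The steps above can be wrapped into a small helper. This is only a sketch: the function name `fill_by_date` and its arguments are hypothetical, and it assumes at most one row per (Date, ID) pair. Unlike the code above, it keeps `Date` as character via `stringsAsFactors = FALSE` and fills only the two price columns with 0.

```r
# Hypothetical helper (not part of the original answer): returns one data
# frame per date containing every ID, with 0 for missing prices.
fill_by_date <- function(raw, dates, ids) {
  # Every (Date, ID) combination that should appear in the result
  grid <- expand.grid(Date = dates, ID = ids, stringsAsFactors = FALSE)
  # Keep Date as character so both merge keys have the same type
  raw$Date <- as.character(raw$Date)
  # Left join: combinations without data get NA prices
  full <- merge(grid, raw, by = c("Date", "ID"), all.x = TRUE)
  # Replace missing prices by 0, as in the question's output
  price_cols <- c("Bid.Price", "Ask.Price")
  full[price_cols] <- lapply(full[price_cols],
                             function(x) replace(x, is.na(x), 0))
  # One data frame per date
  split(full, full$Date)
}

# Same dummy data as above
raw <- data.frame(
  ID = c(1, 2, 1, 2),
  Date = c("2007-01-02", "2007-01-02", "2007-01-03", "2007-01-03"),
  Bid.Price = rep(100, 4),
  Ask.Price = rep(100, 4)
)
result <- fill_by_date(raw, c("2007-01-02", "2007-01-03"), c(1, 2, 3))
```

`result[["2007-01-02"]]` then plays the role of `df.list[[1]]` from the question, with named columns instead of X1 to X4.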

OLD ANSWER

You can use split to split the data by date, and then use mapply with merge so that you also get rows for IDs that have no data on a given date.

Dates_List <- c("2007-01-02", "2007-01-03") 
ISIN_Table <- data.frame(c(1, 2, 3)) 
colnames(ISIN_Table) <- "ID" 
ID <- rep(1:2, len = 2, each = 1) 
Date <- c("2007-01-02", "2007-01-02", "2007-01-03", "2007-01-03") 
Bid.Price <- rep(100, 4) 
Ask.Price <- rep(100, 4) 
data.raw <- data.frame(ID, Date, Bid.Price, Ask.Price) 

splitdata <- split(data.raw, data.raw$Date) 

mapply(FUN = function(x, date) merge(x,
                                     data.frame(ID = ISIN_Table$ID,
                                                Date = rep(date, length(ISIN_Table$ID))),
                                     all.y = TRUE),
       splitdata, t(names(splitdata)), SIMPLIFY = FALSE)

## $`2007-01-02`
##   ID       Date Bid.Price Ask.Price
## 1  1 2007-01-02       100       100
## 2  2 2007-01-02       100       100
## 3  3 2007-01-02        NA        NA
##
## $`2007-01-03`
##   ID       Date Bid.Price Ask.Price
## 1  1 2007-01-03       100       100
## 2  2 2007-01-03       100       100
## 3  3 2007-01-03        NA        NA
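
Note that this older approach leaves NA where the question's output has 0. A sketch of replacing those values across the whole list with `lapply`, assuming the result of the `mapply` call has been stored in a variable (here hypothetically named `merged`, rebuilt by hand so the snippet runs on its own):

```r
# Stand-in for the list returned by the mapply() call above
merged <- list(
  `2007-01-02` = data.frame(ID = 1:3, Date = "2007-01-02",
                            Bid.Price = c(100, 100, NA),
                            Ask.Price = c(100, 100, NA)),
  `2007-01-03` = data.frame(ID = 1:3, Date = "2007-01-03",
                            Bid.Price = c(100, 100, NA),
                            Ask.Price = c(100, 100, NA))
)

# Replace every NA with 0 in each data frame of the list
merged_zero <- lapply(merged, function(df) {
  df[is.na(df)] <- 0
  df
})
```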
+0

(+1) Very nice use of 'expand.grid' and 'merge'! – Arun 2013-03-08 17:10:46