2015-08-21 71 views
0

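Dynamically update/merge two data.frames in R

I have not found a solution online, partly because it is hard to phrase the right question. I have two data.frames, x and y, and want to merge them to obtain z: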

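The tricky part is that z compares the Date values of x and y and uses the most recent observation to update A, B, C and D. Hence the "dynamic" update/merge.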

x=data.frame(c("2000-01-01","2000-06-01","2001-01-01"),c("100","100","100"),c("200","200","200")) 
colnames(x)=c("Date","A","B") 

y=data.frame(c("2000-01-05","2000-04-09"),c("10","0"),c("0","35")) 
colnames(y)=c("Date","C","D") 

z=data.frame(c("2000-01-01","2000-01-05","2000-04-09","2000-06-01","2001-01-01"),c("100","100","100","100","100"),c("200","200","200","200","200"),c("0","10","10","10","10"),c("0","0","35","35","35"))
colnames(z)=c("Date","A","B","C","D") 

x$Date = as.Date(x$Date) 
y$Date = as.Date(y$Date) 

Question: how can I arrive at z with efficient code?

To illustrate:

> x
        Date   A   B
1 2000-01-01 100 200
2 2000-06-01 100 200
3 2001-01-01 100 200
> y
        Date  C  D
1 2000-01-05 10  0
2 2000-04-09  0 35
> z
        Date   A   B  C  D
1 2000-01-01 100 200  0  0
2 2000-01-05 100 200 10  0
3 2000-04-09 100 200 10 35
4 2000-06-01 100 200 10 35
5 2001-01-01 100 200 10 35

Edit: Thanks for the answers below. The solution appears to be a simple full join followed by a nested loop (I worked out the second step myself):

x$Date = as.Date(x$Date) 
y$Date = as.Date(y$Date) 

tt=merge(x,y,by='Date',all=TRUE) 

# carry the previous row's value forward wherever a cell is NA
for (i in 2:ncol(tt)){
    for (j in 2:nrow(tt)){
        if (is.na(tt[j,i]) && !is.na(tt[j-1,i])){
            tt[j,i]=tt[j-1,i]
        }
    }
}
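
As a rough sketch of how the nested loop above could be vectorized in base R: the helper name `fill_forward` is hypothetical (it is not from the original post), and the example assumes numeric value columns with NA marking "no new observation", which differs from the string columns used in the question.

```r
# Hypothetical helper: carry the last non-NA value forward.
# Leading NAs stay NA. Intended for plain atomic vectors, not factors.
fill_forward <- function(v) {
  idx <- cumsum(!is.na(v))        # position of the most recent non-NA value so far
  c(NA, v[!is.na(v)])[idx + 1]    # idx == 0 (leading NA) maps to the NA placeholder
}

# Assumed numeric version of the example data, with NA instead of "0" in y
x <- data.frame(Date = as.Date(c("2000-01-01", "2000-06-01", "2001-01-01")),
                A = 100, B = 200)
y <- data.frame(Date = as.Date(c("2000-01-05", "2000-04-09")),
                C = c(10, NA), D = c(NA, 35))

tt <- merge(x, y, by = "Date", all = TRUE)   # full join on Date
tt <- tt[order(tt$Date), ]                   # make the row order explicit
tt[-1] <- lapply(tt[-1], fill_forward)       # fill every column except Date
tt
```

The first row keeps NA in C and D because nothing has been observed for them yet; replacing those with 0 would be a separate last step.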

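EDIT2: The further solutions others posted below appear to be more efficient.

Just for completeness: my solution then works if the 0s in y are replaced by NA, i.e. if y is defined as

y=data.frame(c("2000-01-05","2000-04-09"),c("10",NA),c(NA,"35"))
colnames(y)=c("Date","C","D")

and the remaining NAs in z are then replaced as a last step.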

I learned something from my first edit; I am not editing the original question above, to avoid confusion.

Thank you very much for your help!


You may need to convert the 'Date' column to class 'Date', i.e. 'x$Date <- as.Date(x$Date)', and likewise for y. 'df1 <- merge(x, y, by='Date', all=TRUE); df2 <- df1[order(df1$Date),]; library(zoo); df2[2:3] <- lapply(df2[2:3], na.locf)' – akrun


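Thanks for the quick reply. Very close, but it still does not work, because element z[3,4] is still NA instead of 10. Changing df2[2:3] to df2[2:5] does not work either. Any ideas? – fuji2015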


'df2[4:5][is.na(df2[4:5])] <- 0' will set them to 0 – akrun

Answers

3

One possible solution is to combine data.table with the na.locf function from the zoo package:

# loading the needed packages 
library(data.table) 
library(zoo) 

# converting x & y to datatables 
setDT(x) 
setDT(y) 

# merge x & y into z 
z <- merge(x, y, by="Date", all=TRUE) # this works in base R as well 

# fill the NA's with the last observation 
cols <- c("A","B","C","D") # in this specific case, you can also use: LETTERS[1:4] 
z[, (cols) := lapply(.SD, na.locf, rule = 1, na.rm=FALSE), .SDcols= cols] 

This gives:

> z
         Date   A   B  C  D
1: 2000-01-01 100 200 NA NA
2: 2000-01-05 100 200 10  0
3: 2000-04-09 100 200  0 35
4: 2000-06-01 100 200  0 35
5: 2001-01-01 100 200  0 35
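
The NA values in the first row remain because there is no earlier observation to carry forward. The small sketch below (the vector `v` is made up for illustration) shows what `na.rm = FALSE` changes in `na.locf`: with the default `na.rm = TRUE` the leading NA is dropped, so the result is shorter than the input and would no longer line up with the rows of the table.

```r
library(zoo)

# Made-up vector with a leading NA, similar to column C after the merge
v <- c(NA, 10, NA, NA, 35)

na.locf(v)                 # default na.rm = TRUE drops the leading NA: 10 10 10 35
na.locf(v, na.rm = FALSE)  # keeps the original length:              NA 10 10 10 35
```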

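The same result can also be achieved without data.table, using base R's merge together with zoo's na.locf, as @Tensibai mentioned in the comments (for some reason this did not work on my system at first):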

z <- merge(x, y, by="Date", all=TRUE) 
z <- na.locf(z, na.rm = FALSE) # na.rm = FALSE so the first row (leading NAs in C and D) is not dropped

To get exactly the desired output, you need a few extra steps (the first steps are omitted because they are the same as above):

# merge x & y into z 
z <- merge(x, y, by="Date", all=TRUE) # this works in base R as well 

# replace the zero with NA 
z[z==0] <- NA 

# fill the NA's with the last observation 
cols <- LETTERS[1:4] 
z[, (cols) := lapply(.SD, na.locf, rule = 1, na.rm=FALSE), .SDcols= cols] 

# replace the remaining NA's with zero's 
z[is.na(z)] <- 0 

This gives:

> z
         Date   A   B  C  D
1: 2000-01-01 100 200  0  0
2: 2000-01-05 100 200 10  0
3: 2000-04-09 100 200 10 35
4: 2000-06-01 100 200 10 35
5: 2001-01-01 100 200 10 35

In base R (plus zoo for na.locf) you would do:

z <- merge(x, y, by="Date", all=TRUE) 
z[z==0] <- NA 
z <- na.locf(z, na.rm = FALSE) # na.rm = FALSE so the first row (leading NAs in C and D) is not dropped
z[is.na(z)] <- 0 

to get the same result.
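
For reuse, the four steps could be wrapped in a small function. This is only a sketch under stated assumptions: the name `merge_locf` is hypothetical, the value columns are assumed to be numeric rather than the strings used in the question, and 0 is assumed to mean "no new observation". It works column by column, so the Date key is never compared with 0.

```r
library(zoo)

# Hypothetical wrapper around the steps above (not from the original answer).
# Assumes numeric value columns in which 0 means "no new observation".
merge_locf <- function(x, y, by = "Date") {
  z <- merge(x, y, by = by, all = TRUE)   # full join on the key column
  vals <- setdiff(names(z), by)           # every column except the key
  z[vals] <- lapply(z[vals], function(v) {
    v[!is.na(v) & v == 0] <- NA           # treat 0 as missing
    v <- na.locf(v, na.rm = FALSE)        # carry the last observation forward
    v[is.na(v)] <- 0                      # nothing observed yet: back to 0
    v
  })
  z
}

# Assumed numeric version of the example data
x <- data.frame(Date = as.Date(c("2000-01-01", "2000-06-01", "2001-01-01")),
                A = 100, B = 200)
y <- data.frame(Date = as.Date(c("2000-01-05", "2000-04-09")),
                C = c(10, 0), D = c(0, 35))

merge_locf(x, y)
```

One limitation of this convention is that a genuine observed value of 0 cannot be distinguished from "no update"; using NA in y for missing updates avoids that ambiguity.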


I think you can shorten this to 'z <- merge(setDT(x), setDT(y), by = "Date", all = TRUE); cols <- LETTERS[1:4]; z[, (cols) := lapply(.SD, na.locf, rule = 1, na.rm = FALSE), .SDcols = cols]'. I am also not sure whether 'rule = 1' is needed here –


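Don't update the answer to match his desired output when he changes it in an edit. I think it would be better to update the desired output in the question. –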

0

Another approach, using dplyr and a few helper functions:

library(lubridate) 
library(dplyr) 

# dataset 
x=data.frame(c("2000-01-01","2000-06-01","2001-01-01"), 
      c("100","100","100"), 
      c("200","200","200"), stringsAsFactors = F) 
colnames(x)=c("Date","A","B") 

y=data.frame(c("2000-01-05","2000-04-09"), 
      c("10","0"), 
      c("0","35"), stringsAsFactors = F) 
colnames(y)=c("Date","C","D") 

# update date columns 
x$Date = ymd(x$Date) 
y$Date = ymd(y$Date) 

# function that replaces NAs with 0s 
ff = function(x){x[is.na(x)]=0 
       return(as.numeric(x))} 

# function that updates zero elements with the previous ones 
ff2 = function(x){ 

    for (i in 2:length(x)){x[i] = ifelse(x[i]==0, x[i-1], x[i])} 

    return(x) 

} 

# create the full dataset 
xy = 
    x %>% 
    full_join(y, by="Date") %>% 
    arrange(Date) 

xy 

#         Date    A    B    C    D
# 1 2000-01-01  100  200 <NA> <NA>
# 2 2000-01-05 <NA> <NA>   10    0
# 3 2000-04-09 <NA> <NA>    0   35
# 4 2000-06-01  100  200 <NA> <NA>
# 5 2001-01-01  100  200 <NA> <NA>


    xy %>% 
    group_by(Date) %>% 
    mutate_each(funs(ff)) %>% 
    ungroup %>% 
    select(-Date) %>% 
    mutate_each(funs(ff2)) %>% 
    bind_cols(data.frame(Date=xy$Date)) %>% 
    select(Date,A,B,C,D) 

#         Date   A   B  C  D
# 1 2000-01-01 100 200  0  0
# 2 2000-01-05 100 200 10  0
# 3 2000-04-09 100 200 10 35
# 4 2000-06-01 100 200 10 35
# 5 2001-01-01 100 200 10 35