2017-02-28 20 views
0

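Merge two data frames if the timestamp of x is within a time interval of y

I can't find a solution to this problem. I have two data frames, DF1 and DF2. I want to merge the columns of DF2 into DF1 if the timestamp in DF1 falls within a time interval specified in DF2. Here is an example of the two data frames: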

DF1 <- structure(list(Airspeed = c(582L, 478L, 524L), Outbound.Track = c(119L, 78L,134L), Rem.Ground.Dist = c(369L, 119L, 196L), Timestamp=structure(c(1451636817.52577, 1451638203.76569, 1451637753.43511),class = c("POSIXct", "POSIXt"), tzone = "")), .Names =c("Airspeed", "Outbound.Track","Rem.Ground.Dist", "Timestamp"), row.names =c(1L, 12L, 7L), class = c("data.table", "data.frame")) 

DF2 <- structure(list(Temperature = c(-18.5, -60, -35), Wind_Direction = c("324", "335", "313"), Wind_Speed = c("032", "041", "056"), onebef =structure(c(1451629620, 1451634660, 1451637000), class = c("POSIXct", "POSIXt"), tzone = ""), oneaft = structure(c(1451636820, 1451641860, 1451644200), class =c("POSIXct", "POSIXt"))), .Names = c("Temperature", "Wind_Direction", "Wind_Speed","onebef", "oneaft"), row.names = c(1358L, 1654L, 2068L), class = "data.frame") 

head(DF1) 
head(DF2) 

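I want to merge DF1 with DF2. If there is a match (the Timestamp of a DF1 row falls within any of the DF2 intervals, i.e. between onebef and oneaft), the values of DF2 (Wind_Speed, Wind_Direction, Temperature) should be added to DF1.

I am facing two problems:

  1. How do I do the matching/merging efficiently? My data frames are very large (7000 rows in DF1 and in DF2).

  2. How do I make sure that a row of DF1 is duplicated when it has multiple matches?

I look forward to your help! Thanks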

Answers

2

You can use sqldf:

library(sqldf) 
df<-sqldf('select d1.*,d2.* 
      from DF1 d1 
      left join DF2 d2 
      on d1.Timestamp >= d2.onebef 
       AND d1.Timestamp <= d2.oneaft 
      ') 
df 

Or '... on d1.Timestamp between d2.onebef and d2.oneaft' –

0

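This works well for the example, but you may struggle with real data, because it first creates a very large dataset (every row of DF1 combined with every row of DF2) and only then keeps the relevant rows.

Try it and see how it works.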

library(dplyr) 

DF1 <- structure(list(Airspeed = c(582L, 478L, 524L), Outbound.Track = c(119L, 78L,134L), Rem.Ground.Dist = c(369L, 119L, 196L), Timestamp=structure(c(1451636817.52577, 1451638203.76569, 1451637753.43511),class = c("POSIXct", "POSIXt"), tzone = "")), .Names =c("Airspeed", "Outbound.Track","Rem.Ground.Dist", "Timestamp"), row.names =c(1L, 12L, 7L), class = c("data.table", "data.frame")) 

DF2 <- structure(list(Temperature = c(-18.5, -60, -35), Wind_Direction = c("324", "335", "313"), Wind_Speed = c("032", "041", "056"), onebef =structure(c(1451629620, 1451634660, 1451637000), class = c("POSIXct", "POSIXt"), tzone = ""), oneaft = structure(c(1451636820, 1451641860, 1451644200), class =c("POSIXct", "POSIXt"))), .Names = c("Temperature", "Wind_Direction", "Wind_Speed","onebef", "oneaft"), row.names = c(1358L, 1654L, 2068L), class = "data.frame") 


merge(DF1, DF2) %>%         # no shared column names, so every row of DF1 is combined with every row of DF2 
    filter(onebef <= Timestamp & Timestamp <= oneaft) # keep rows where Timestamp lies within the interval 


# Airspeed Outbound.Track Rem.Ground.Dist   Timestamp Temperature Wind_Direction Wind_Speed    onebef    oneaft 
# 1  582   119    369 2016-01-01 08:26:57  -18.5   324  032 2016-01-01 06:27:00 2016-01-01 08:27:00 
# 2  582   119    369 2016-01-01 08:26:57  -60.0   335  041 2016-01-01 07:51:00 2016-01-01 09:51:00 
# 3  478    78    119 2016-01-01 08:50:03  -60.0   335  041 2016-01-01 07:51:00 2016-01-01 09:51:00 
# 4  524   134    196 2016-01-01 08:42:33  -60.0   335  041 2016-01-01 07:51:00 2016-01-01 09:51:00 
# 5  478    78    119 2016-01-01 08:50:03  -35.0   313  056 2016-01-01 08:30:00 2016-01-01 10:30:00 
# 6  524   134    196 2016-01-01 08:42:33  -35.0   313  056 2016-01-01 08:30:00 2016-01-01 10:30:00 
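The cross join above materialises every DF1 x DF2 pair before filtering. One possible alternative, which is not part of the original answer, is a non-equi join from the data.table package, which matches on the interval condition directly. The sketch below is only an illustration under assumptions: it rebuilds a reduced version of the example data (only Airspeed and Temperature are kept) and assumes UTC as the time zone.

```r
library(data.table)

# Reduced stand-in for the example data (assumption: UTC time zone).
DF1 <- data.table(
  Airspeed  = c(582L, 478L, 524L),
  Timestamp = as.POSIXct(c(1451636817.52577, 1451638203.76569, 1451637753.43511),
                         origin = "1970-01-01", tz = "UTC")
)
DF2 <- data.table(
  Temperature = c(-18.5, -60, -35),
  onebef = as.POSIXct(c(1451629620, 1451634660, 1451637000),
                      origin = "1970-01-01", tz = "UTC"),
  oneaft = as.POSIXct(c(1451636820, 1451641860, 1451644200),
                      origin = "1970-01-01", tz = "UTC")
)

# Non-equi join: for every DF1 row, look up the DF2 rows with
# onebef <= Timestamp <= oneaft. The x. and i. prefixes pick the original
# DF2 and DF1 columns, because the join columns would otherwise be
# overwritten with the Timestamp values.
res <- DF2[DF1,
           on = .(onebef <= Timestamp, oneaft >= Timestamp),
           .(Airspeed, Timestamp = i.Timestamp, Temperature,
             onebef = x.onebef, oneaft = x.oneaft),
           allow.cartesian = TRUE]
res
```

A DF1 row with several matching intervals appears once per match. With the default nomatch setting, a DF1 row without any match is kept with NA in the DF2 columns, which mirrors the left join in the sqldf answer rather than the filter() result.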

Unfortunately I get an error about differing numbers of rows. Is there any way to fix this error? – Anna2803


At which step do you get that error? Is it on the real dataset or on the example? – AntoniosK


On the real dataset. The example works perfectly. – Anna2803

1

You can use merge() with the all = TRUE option to combine all rows of DF1 with all rows of DF2. Then you can check your condition:

x <- merge(DF1, DF2, all = TRUE) 

x[x$Timestamp >= x$onebef & x$Timestamp <= x$oneaft,] 

    Airspeed Outbound.Track Rem.Ground.Dist   Timestamp Temperature Wind_Direction Wind_Speed    onebef 
1  582   119    369 2016-01-01 09:26:57  -18.5   324  032 2016-01-01 07:27:00 
4  582   119    369 2016-01-01 09:26:57  -60.0   335  041 2016-01-01 08:51:00 
5  478    78    119 2016-01-01 09:50:03  -60.0   335  041 2016-01-01 08:51:00 
6  524   134    196 2016-01-01 09:42:33  -60.0   335  041 2016-01-01 08:51:00 
8  478    78    119 2016-01-01 09:50:03  -35.0   313  056 2016-01-01 09:30:00 
9  524   134    196 2016-01-01 09:42:33  -35.0   313  056 2016-01-01 09:30:00 
      oneaft 
1 2016-01-01 09:27:00 
4 2016-01-01 10:51:00 
5 2016-01-01 10:51:00 
6 2016-01-01 10:51:00 
8 2016-01-01 11:30:00 
9 2016-01-01 11:30:00
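
If building the full DF1 x DF2 combination is too heavy for the real data, the same condition can be checked row by row in base R, so that only the matching pairs are ever stored. This is a rough sketch and not taken from any of the answers: it uses a reduced version of the example data (Airspeed and Temperature only, whole-second timestamps) and assumes UTC. It still compares every timestamp with every interval, but it never allocates the combined table.

```r
# Reduced stand-in for the example data (assumption: UTC time zone).
DF1 <- data.frame(
  Airspeed  = c(582L, 478L, 524L),
  Timestamp = as.POSIXct(c(1451636817, 1451638203, 1451637753),
                         origin = "1970-01-01", tz = "UTC")
)
DF2 <- data.frame(
  Temperature = c(-18.5, -60, -35),
  onebef = as.POSIXct(c(1451629620, 1451634660, 1451637000),
                      origin = "1970-01-01", tz = "UTC"),
  oneaft = as.POSIXct(c(1451636820, 1451641860, 1451644200),
                      origin = "1970-01-01", tz = "UTC")
)

# For each DF1 row, collect the indices of the DF2 rows whose interval
# contains that row's Timestamp.
hits <- lapply(seq_len(nrow(DF1)), function(i) {
  ts <- DF1$Timestamp[i]
  which(DF2$onebef <= ts & ts <= DF2$oneaft)
})

i1 <- rep(seq_len(nrow(DF1)), lengths(hits))  # DF1 index, repeated once per match
i2 <- unlist(hits)                            # index of the matching DF2 row

# Bind the matching rows side by side; DF1 rows without a match are dropped.
res <- cbind(DF1[i1, , drop = FALSE], DF2[i2, , drop = FALSE])
rownames(res) <- NULL
res
```

As in the merge() approach, a DF1 row that falls into several intervals is repeated once per matching interval.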