Map two data frames in R on the condition that the time in one data frame is earlier than the time in the other

I want to merge two datasets by user ID. My problem is that I have to filter out the rows of one dataset that occurred after the matching row in the other. A simple example:

# Dataset 1 (dts1) 

    User ID  date Hour  transactions  
1  5 25/07/2016 02:32  4   
2  6 24/07/2016 02:42  2  
3  8 25/07/2016 02:52  3   
4  9 24/07/2016 03:02  4   
5 11 25/07/2016 03:12  1   
6 13 26/07/2016 03:22  3

and

# Dataset 2 (dts2) 

    User ID date Hour  Events  
1  5 25/07/2016 02:31  8   
2  5 26/07/2016 02:42  6  
3  5 24/07/2016 07:52  9   
4 14 24/07/2016 03:02  5   
5  5 25/07/2016 09:12  10   
6  4 26/07/2016 03:22  4

I want to map only those rows from dataset 2 that happened before the matching row in dataset 1. So ideally my output would look like:

#output 
    User ID Events Events transactions  
1  5   8  9   4

Source

2016-08-03 MFR

What is wrong with `?merge`? The part about the dataset "that occurred after the other" is not clear. – akrun

Does this mean that each user ID in the output will be a vector of arbitrary length (because some user IDs have more matches in dataset 2 than others)? –

@akrun I have dates in both datasets, and I need to compare those dates. – MFR

Given the data dts1 and dts2, and assuming date and Hour are character:

> dts1 
    UserID  date Hour transactions 
1  5 25/07/2016 02:32   4 
2  6 24/07/2016 02:42   2 
3  8 25/07/2016 02:52   3 
4  9 24/07/2016 03:02   4 
5  11 25/07/2016 03:12   1 
6  13 26/07/2016 03:22   3 
> dts2 
    UserID  date Hour Events 
1  5 25/07/2016 02:31  8 
2  5 26/07/2016 02:42  6 
3  5 24/07/2016 07:52  9 
4  14 24/07/2016 03:02  5 
5  5 25/07/2016 09:12  10 
6  4 26/07/2016 03:22  4

The basic idea is to make the times in the two data frames comparable. First we convert the date/Hour columns of dts2 to a POSIX class:

dts2$time <- strptime(paste(dts2$date, dts2$Hour), format="%d/%m/%Y %H:%M")
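Note that strptime returns a POSIXlt object, which is stored internally as a list of date components. A minimal sketch of the same conversion with as.POSIXct follows, using a few values copied from dts2; the time zone "UTC" is an assumption, since the question does not say which zone the timestamps are in.

```r
# Sample values copied from dts2 (user 5); tz = "UTC" is an assumed time zone
date <- c("25/07/2016", "26/07/2016", "24/07/2016")
hour <- c("02:31", "02:42", "07:52")

# as.POSIXct stores each timestamp as a single number of seconds,
# which is convenient for a data frame column
time2 <- as.POSIXct(paste(date, hour), format = "%d/%m/%Y %H:%M", tz = "UTC")

# Reference time taken from the user 5 row of dts1
time1 <- as.POSIXct("25/07/2016 02:32", format = "%d/%m/%Y %H:%M", tz = "UTC")

# TRUE where the dts2 timestamp is strictly earlier than the dts1 timestamp
earlier <- time2 < time1
print(earlier)
```

Fixing the time zone explicitly keeps the comparison independent of the locale of the machine running the code.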

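Then we use apply to iterate over the rows of dts1 and, for each one, find the rows of dts2 that have the same UserID and a time earlier than the time in dataset 1: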

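dts1$Events <- apply(dts1[,c("UserID","date","Hour")], MARGIN=1, function(x) { 
    # x is a character vector holding UserID, date and Hour for one row of dts1
    time1 <- strptime(paste(x[2], x[3]), format="%d/%m/%Y %H:%M") 
    # rows of dts2 with the same user and a strictly earlier time
    rows <- which(dts2$UserID==as.numeric(x[1]) & dts2$time<time1) 
    if (length(rows)>0) {    
     dts2$Events[rows] 
    } else { 
     NA 
    } 
})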

Result:

> dts1 
    UserID  date Hour transactions Events 
1  5 25/07/2016 02:32   4 8, 9 
2  6 24/07/2016 02:42   2  NA 
3  8 25/07/2016 02:52   3  NA 
4  9 24/07/2016 03:02   4  NA 
5  11 25/07/2016 03:12   1  NA 
6  13 26/07/2016 03:22   3  NA
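Because the matches differ in length from row to row, apply returns a list here, so Events ends up as a list column. If a plain character column is preferred (for example before exporting), each element can be collapsed to a string. This is only a sketch; the `events` list below is typed in by hand from the first three rows of the result rather than taken from the real dts1.

```r
# Hand-typed stand-in for the first three entries of dts1$Events
events <- list(c(8, 9), NA, NA)

# Collapse each element to one string; keep NA where nothing matched
events_chr <- vapply(
  events,
  function(e) if (all(is.na(e))) NA_character_ else toString(e),
  character(1)
)
print(events_chr)
```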

Source

2016-08-03 03:53:44

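Thank you very much. This is brilliant and saved my day. Exactly what I was looking for. I upvoted your answer, but it seems I don't have enough reputation for it to count. – MFR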

An alternative using dplyr and lubridate:

# install.packages("dplyr") 
# install.packages("lubridate") 

library(dplyr) 
library(lubridate) 

# join the two data.frames by User_ID 
left_join(dts1, dts2, by="User_ID") %>% 

# apply the filtering condition. dts1 must be after dts2 
    filter(dmy_hm(paste(date.x, Hour.x)) > 
     dmy_hm(paste(date.y, Hour.y))) %>% 

# Collapse the Events by user and transaction 
    group_by(User_ID, transactions) %>% summarise(Events = toString(Events))
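For readers without dplyr, the same join, filter and collapse steps can be sketched in base R. The data frames below are rebuilt by hand from the tables in the question, with the ID column named User_ID to match the pipeline above; the helper `parse_time` and the "UTC" time zone are illustrative assumptions, not part of the original answers.

```r
# Data rebuilt by hand from the question; the column name User_ID is assumed
dts1 <- data.frame(
  User_ID = c(5, 6, 8, 9, 11, 13),
  date = c("25/07/2016", "24/07/2016", "25/07/2016",
           "24/07/2016", "25/07/2016", "26/07/2016"),
  Hour = c("02:32", "02:42", "02:52", "03:02", "03:12", "03:22"),
  transactions = c(4, 2, 3, 4, 1, 3),
  stringsAsFactors = FALSE
)
dts2 <- data.frame(
  User_ID = c(5, 5, 5, 14, 5, 4),
  date = c("25/07/2016", "26/07/2016", "24/07/2016",
           "24/07/2016", "25/07/2016", "26/07/2016"),
  Hour = c("02:31", "02:42", "07:52", "03:02", "09:12", "03:22"),
  Events = c(8, 6, 9, 5, 10, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: combine date and hour strings into POSIXct (UTC assumed)
parse_time <- function(d, h) {
  as.POSIXct(paste(d, h), format = "%d/%m/%Y %H:%M", tz = "UTC")
}

# Inner join on User_ID; the shared columns become date.x/Hour.x and date.y/Hour.y
joined <- merge(dts1, dts2, by = "User_ID")

# Keep only dts2 rows that are strictly earlier than the dts1 row
keep <- parse_time(joined$date.y, joined$Hour.y) <
  parse_time(joined$date.x, joined$Hour.x)
earlier <- joined[keep, ]

# Collapse the remaining Events per user and transaction into one string
result <- aggregate(Events ~ User_ID + transactions, data = earlier, FUN = toString)
print(result)
```

As with the pipeline above, users with no earlier event in dts2 drop out of the result, since the time comparison removes every row without a match.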

Source

2016-08-03 05:24:50

Thanks heaps @dimitris_ps. Your solution also works for me. – MFR

I was wondering whether you have time to look at this question? This one is slightly different: http://stackoverflow.com/questions/38779478/map-two-data-frames-in-a-certain-condition – MFR
