2016-01-23 30 views
5

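Remove the day before and the day after a match

I can remove the matching rows between two data frames, df1 and df2, using some code kindly provided by @Eric Fail: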

df1[!(apply(df1[1:2], 1, toString) %in% apply(df2[1:2], 1, toString)), ]

or with dplyr, via @steveb's solution:

df1 %>% filter(! ((date == df2$date) & (ticker == df2$ticker)))
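One caveat with the == form: it compares the two columns position by position, so it only catches a match that happens to sit on the same row number in both data frames. A possible position-independent alternative, sketched here on the assumption that dplyr is installed and that ticker and date together identify a row, is anti_join():

```r
library(dplyr)

df1 <- data.frame(ticker = c("MSFT", "MSFT", "MSFT", "MSFT"),
                  date = c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ticker = c("AAPL", "GOOG", "MSFT", "FB"),
                  date = c("2016-01-01", "2016-01-01", "2016-01-02", "2016-01-03"),
                  stringsAsFactors = FALSE)

# Keep the rows of df1 that have no (ticker, date) partner in df2,
# regardless of which row of df2 the partner sits on.
df3 <- anti_join(df1, df2, by = c("ticker", "date"))
df3
```

On this data the MSFT 2016-01-02 row is dropped even though it is row 2 of df1 and row 3 of df2.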

However, I realized that I need to remove not only the shared rows, like this:

df1 <- data.frame(ticker = c("MSFT", "MSFT", "MSFT", "MSFT"), 
date = c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04"), stringsAsFactors=F) 
df1 

    ticker  date 
1 MSFT 2016-01-01 
2 MSFT 2016-01-02 
3 MSFT 2016-01-03 
4 MSFT 2016-01-04 

df2 <- data.frame(ticker = c("AAPL", "GOOG", "MSFT", "FB"), 
date = c("2016-01-01", "2016-01-01", "2016-01-02", "2016-01-03"), stringsAsFactors=F) 
df2 

    ticker  date 
1 AAPL 2016-01-01 
2 GOOG 2016-01-01 
3 MSFT 2016-01-02 
4  FB 2016-01-03 

df3 

    ticker  date 
1 MSFT 2016-01-01 
2 MSFT 2016-01-03 
3 MSFT 2016-01-04 

but also the rows for the day before and the day after each match. So my final df would be:

ticker  date 
1 MSFT 2016-01-04 

Note that MSFT 2016-01-02 is the match, so that row needs to be removed, together with the day before and the day after it: MSFT 2016-01-01 and MSFT 2016-01-03.

An example with two matches on consecutive days:

df1 <- data.frame(ticker = c("MSFT", "MSFT", "MSFT", "MSFT"), 
        date = as.Date(c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04")), 
        stringsAsFactors=F) 
df2 <- data.frame(ticker = c("AAPL", "GOOG", "MSFT", "MSFT"), 
        date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-01","2016-01-02")), 
        stringsAsFactors=F) 

Target output:

ticker  date 
4 MSFT 2016-01-04 

Answers

4

You can convert the strings to dates so that you can add and subtract days:

df1 <- data.frame(ticker = c("MSFT", "MSFT", "MSFT", "MSFT"), 
        date = as.Date(c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04")), 
        stringsAsFactors=F) 
df2 <- data.frame(ticker = c("AAPL", "GOOG", "MSFT", "FB"), 
        date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-02", "2016-01-03")), 
        stringsAsFactors=F) 


(m <- df2[(df2$date %in% df1$date) & (df2$ticker %in% df1$ticker), ]) 
# ticker  date 
# 3 MSFT 2016-01-02 

df1[!(df1$date %in% (m$date + c(-1,0,1))), ] 

# ticker  date 
# 4 MSFT 2016-01-04 
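A small sketch of the date arithmetic this relies on, and of where the single-match form stops working. The names one and two are illustrative only and are not part of the answer above.

```r
# A single Date plus three offsets gives a three-day window.
one <- as.Date("2016-01-02")
window_one <- one + c(-1, 0, 1)
window_one

# With two matched dates, `two + c(-1, 0, 1)` would recycle a length-2
# vector against a length-3 vector, which warns and mixes up the offsets.
# Repeating each date three times first gives every date its own window.
two <- as.Date(c("2016-01-01", "2016-01-02"))
window_two <- rep(two, each = 3) + c(-1, 0, 1)
window_two
```

Both results keep the Date class, so they can be compared directly with a Date column.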

Edit - for multiple matches, just sapply over each date:

df1 <- data.frame(ticker = c("MSFT", "MSFT", "MSFT", "MSFT"), 
        date = as.Date(c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04")), 
        stringsAsFactors=F) 
df2 <- data.frame(ticker = c("AAPL", "GOOG", "MSFT", "MSFT"), 
        date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-01","2016-01-02")), 
        stringsAsFactors=F) 

(m <- df2[(df2$date %in% df1$date) & (df2$ticker %in% df1$ticker), ]) 
# ticker  date 
# 3 MSFT 2016-01-01 
# 4 MSFT 2016-01-02 

df1[!(df1$date %in% (sapply(m$date, function(x) x + c(-1,0,1)))), ] 
# ticker  date 
# 4 MSFT 2016-01-04 
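Two details of the code above may matter on larger data: the two %in% tests check date and ticker independently rather than as a pair, and sapply() drops the Date class and returns plain numbers. A hedged sketch of a variant that matches on the (ticker, date) pair and stays in the Date class throughout follows. The names blocked, key and result are illustrative and not part of the original answer.

```r
df1 <- data.frame(ticker = c("MSFT", "MSFT", "MSFT", "MSFT"),
                  date = as.Date(c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04")),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ticker = c("AAPL", "GOOG", "MSFT", "MSFT"),
                  date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-01", "2016-01-02")),
                  stringsAsFactors = FALSE)

# Rows present in both data frames, matched on the (ticker, date) pair.
m <- merge(df1, df2, by = c("ticker", "date"))

# Expand every match into three rows: day before, the day itself, day after.
blocked <- data.frame(ticker = rep(m$ticker, each = 3),
                      date = rep(m$date, each = 3) + c(-1, 0, 1),
                      stringsAsFactors = FALSE)

# Compare on a "ticker date" string key, so a match for one ticker
# never removes rows that belong to another ticker.
key <- function(d) paste(d$ticker, format(d$date))
result <- df1[!(key(df1) %in% key(blocked)), ]
result
```

On the two-match example this leaves only the MSFT 2016-01-04 row, and df1 comes back unchanged when there are no matches at all.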
+0

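Extremely elegant. It works wonderfully, except when there are two or more matches, in which case I get 'Warning message: In unclass(e1) + unclass(e2) : longer object length is not a multiple of shorter object length' and only the last match is removed. I tried writing a for loop that runs only when there is more than one match, but I imagine there is a better way. I have added an example with two matches to my original question. – RyGuy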

+1

@RyGuy try 'df1[!(df1$date %in% (sapply(m$date, function(x) x + c(-1,0,1)))), ]' – rawr

+0

Brilliant! Thank you. – RyGuy