2016-08-29 190 views
6

Match and replace a data frame column based on multiple conditions

Cheers, I have two data frames, structured as follows.

DF1: 
Airlines   HeadQ  Date   Cost_Index 
American   PHX  07-31-2016  220 
American   ATL  08-31-2016  150 
American   ATL  10-31-2016  150 
Delta    ATL  10-31-2016  180 
American   ATL  08-31-2017  200 

The second data frame, DF2, has the following structure:

DF2: 
Airlines   HeadQ  Date   
American   ATL  09-30-2016 
Delta    ATL  03-31-2017 

Given data frames DF1 and DF2, I want to change DF1 into the data frame below.

DF1: 
Airlines   HeadQ  Date   Cost_Index 
American   PHX  07-31-2016  220 
American   ATL  08-31-2016  0 
American   ATL  10-31-2016  150 
Delta    ATL  10-31-2016  180 
American   ATL  08-31-2017  200 

The condition is: look up each DF1 row's Airlines and HeadQ in DF2; if DF1$Date < DF2$Date, set Cost_Index to 0, otherwise keep the existing Cost_Index.

I tried the following, without success:

DF1$Cost_Index <- ifelse(DF1$Airlines == DF2$Airlines & DF1$HeadQ == DF2$HeadQ 
     & DF1$Date < DF2$Date, 0, DF1$Cost_Index) 


Warning: 
1: In DF1$Airlines == DF2$Airlines : longer object length is not a multiple of shorter object length 
2: In <=.default(DF1$Date, DF2$Date) : longer object length is not a multiple of shorter object length 

DF1: 
Airlines   HeadQ  Date   Cost_Index 
American   PHX  07-31-2016  220 
American   ATL  08-31-2016  0 
American   ATL  10-31-2016  0 
Delta    ATL  10-31-2016  0 
American   ATL  08-31-2017  200 

Can anyone point me in the right direction?

Note:

str(DF1$Date): Date, format: "2016-10-31" 
str(DF2$Date): Date, format: "2016-08-31" 
+0

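When a question involves problems with dates, it is really best to give us your original data so we know what you are working with. Can you provide that? Or at least 'str(DF1)'. –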

+0

Just added the structure of the data frames: @Cyrus Mohammadian –

+0

Do you get an error? If not, what does your code above produce? –

Answers

14

Using the conditional (non-equi) join feature of data.table (available since v1.9.8), I would do it as follows:

require(data.table) # v1.9.8+ 
# convert to data.tables, and Date column to Date class. 
setDT(df1)[, Date := as.Date(Date, format = "%m-%d-%Y")] 
setDT(df2)[, Date := as.Date(Date, format = "%m-%d-%Y")] 

df1[df2, on = .(Airlines, HeadQ, Date < Date), # find matching rows based on condition 
     Cost_Index := 0L]      # update column with 0 for those rows 

df1 
# Airlines HeadQ  Date Cost_Index 
# 1: American PHX 2016-07-31  220 
# 2: American ATL 2016-08-31   0 
# 3: American ATL 2016-10-31  150 
# 4: Delta ATL 2016-10-31  180 
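
For readers without data.table, the same lookup-then-compare idea can be sketched in base R. This is only an illustration, not the method used above: the helper names idx, cutoff and hit are made up for the example, the data frames are rebuilt from the tables in the question, and it assumes DF2 holds at most one row per Airlines/HeadQ pair.

```r
# Rebuild the example data from the question (dates parsed from MM-DD-YYYY).
DF1 <- data.frame(
  Airlines   = c("American", "American", "American", "Delta", "American"),
  HeadQ      = c("PHX", "ATL", "ATL", "ATL", "ATL"),
  Date       = as.Date(c("07-31-2016", "08-31-2016", "10-31-2016",
                         "10-31-2016", "08-31-2017"), format = "%m-%d-%Y"),
  Cost_Index = c(220, 150, 150, 180, 200),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  Airlines = c("American", "Delta"),
  HeadQ    = c("ATL", "ATL"),
  Date     = as.Date(c("09-30-2016", "03-31-2017"), format = "%m-%d-%Y"),
  stringsAsFactors = FALSE
)

# For each DF1 row, find the DF2 row with the same Airlines/HeadQ key.
# match() returns NA where DF1 has no counterpart in DF2 (the PHX row here).
# Assumption: DF2 has at most one row per key; match() only sees the first.
idx    <- match(paste(DF1$Airlines, DF1$HeadQ),
                paste(DF2$Airlines, DF2$HeadQ))
cutoff <- DF2$Date[idx]

# Zero the cost only where a cutoff exists and the DF1 date precedes it.
hit <- !is.na(cutoff) & DF1$Date < cutoff
DF1$Cost_Index[hit] <- 0

DF1$Cost_Index
```

Applied literally to the sample data, the rule zeroes the American/ATL row dated 2016-08-31 and also the Delta row, because 2016-10-31 is earlier than Delta's DF2 date of 2017-03-31. The desired table in the question leaves the Delta row at 180, so either the rule or the sample data would need adjusting to reproduce it exactly.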
+0

Why DF1$Cost_Index2 <- ifelse(DF1$Airlines == DF2$Airlines & DF1$HeadQ == DF1$HeadQ & DF1$Date

+1

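I'm not the biggest fan of 'ifelse', but just run, for example, 'DF1$Airlines == DF2$Airlines' and see what it gives... Hint: recycling. You can't simply compare two vectors of unequal length element by element here. For each row in DF2, you have to get all the matching rows in DF1. – Arun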
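
The recycling hint can be seen with a small sketch that uses only the Airlines values from the question's tables. The vector names a, b and cmp are made up for the illustration, and the warning is suppressed only to keep the output short.

```r
a <- c("American", "American", "American", "Delta", "American")  # DF1$Airlines
b <- c("American", "Delta")                                      # DF2$Airlines

# `==` compares position by position. The shorter vector b is recycled to
# American, Delta, American, Delta, American before the comparison, so each
# DF1 row is checked against an arbitrary DF2 row, not against its match.
# R also warns that 5 is not a multiple of 2; suppressed here for brevity.
cmp <- suppressWarnings(a == b)
cmp
# [1]  TRUE FALSE  TRUE  TRUE  TRUE
```

The second element is FALSE even though "American" does occur in DF2: the comparison is positional, not a lookup, which is why a join or a match() step is needed first.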

+0

Ah! OK, I see. In that case, how about this approach: DF1$Cost_Index[DF1$Airlines == DF2$Airlines & DF1$HeadQ == DF2$HeadQ & DF1$Date