2016-07-16 347 views

Filtering a data frame on multiple conditions

I need to update a spreadsheet of issues that has 1,000 rows.

I have two datasets:

DF

CompanyID1       TMC1
ABC company      QBT
BCD company      G W TMC
jb hi fi         QBT
ABC company      GW TMC
FB Company       AMEX
LL company       AMEX
j k              QBT
k. l company     TP oil
1 to 1 lts       TP oil
2 in 1 pty ltd.  AMEX

DF2

DRA  CompanyID2       TMC2    Status
11   2 in 1 pty ltd.  AMEX    sent
12   1 to 1 lts       TP oil  produce
13   BCD company      ACE     sent
14   k. l company     TP oil  sent
15   jb hi fi         QBT     produce
16   ABC company      QBT     sent
17   j k              QBT     sent
18   FB Company       AMEX    sent
19   facebook pty     QBT     sent
20   2 in 1 pty ltd.  AMEX    produce

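What I am trying to achieve: first look up each df2$CompanyID2 value in df$CompanyID1. If there is a match, and its df$TMC1 also matches df2$TMC2, then check the status. If df2$status == 'sent', create a new column and return the df2$DRA value; if df2$status == 'produce', df$new should contain 'delete'.

For example, "ABC company" from df2$CompanyID2 exists in df1$CompanyID1. The df$TMC1 for ABC company matches df2$TMC2, and df2$status == 'sent'. Therefore df$new <- 16.

I would greatly appreciate your help. This would save a lot of time that I could use for other productive work. Thanks.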

dput(DF1)

structure(list(Company.ID1 = structure(c(3L, 4L, 7L, 3L, 5L, 
9L, 6L, 8L, 1L, 2L), .Label = c("1 to 1 lts", "2 in 1 pty ltd.", 
"ABC company", "BCD company", "FB Company", "j k ", "jb hi fi", 
"k. l company", "LL company"), class = "factor"), TMC1 = structure(c(4L, 
2L, 4L, 3L, 1L, 1L, 4L, 5L, 5L, 1L), .Label = c("AMEX", "G W TMC", 
"GW TMC", "QBT", "TP oil"), class = "factor")), .Names = c("Company.ID1", 
"TMC1"), class = "data.frame", row.names = c(NA, -10L)) 

dput(DF2)

structure(list(DRA = 11:20, Company.ID2 = structure(c(2L, 1L, 
4L, 9L, 8L, 3L, 7L, 6L, 5L, 2L), .Label = c("1 to 1 lts", "2 in 1 pty ltd.", 
"ABC company", "BCD company", "facebook pty", "FB Company", "j k ", 
"jb hi fi", "k. l company"), class = "factor"), TMC2 = structure(c(2L, 
4L, 1L, 4L, 3L, 3L, 3L, 2L, 3L, 2L), .Label = c("ACE", "AMEX", 
"QBT", "TP oil"), class = "factor"), Status = structure(c(2L, 
1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L), .Label = c("produce", "sent" 
), class = "factor")), .Names = c("DRA", "Company.ID2", "TMC2", 
"Status"), class = "data.frame", row.names = c(NA, -10L)) 

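# My attempt: it only compares row i of df1 with row i of df2
df1$new <- NA_character_
for (i in seq_len(nrow(df1))) {
  if (as.character(df1$Company.ID1[i]) == as.character(df2$Company.ID2[i]) &&
      as.character(df1$TMC1[i]) == as.character(df2$TMC2[i]) &&
      df2$Status[i] == "sent") {
    df1$new[i] <- "sent"
  } else {
    df1$new[i] <- "delete"
  }
}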

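However, more than one company in df1$Company.ID1 may have the same name as a company in df2$Company.ID2, and the matching entries can be in different rows.

My expected output follows these steps:

  1. The name of company X in df1$Company.ID1 matches a name in df2$Company.ID2
  2. If it matches, check whether company X's df1$TMC1 matches df2$TMC2
  3. If 1 & 2 are true, check whether company X's status is df2$Status == 'sent'
  4. If that is TRUE, create a new column df1$new, take the DRA number from df2$DRA, and store it for company X

Thanks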

Answers

1

Here is a merge-and-flag approach:

#Merge data on ID and TMC columns 
m <- merge(df2, df, by.x=c("CompanyID2", "TMC2"), 
     by.y=c("CompanyID1", "TMC1")) 

#If "sent" use DRA, if not "delete" 
m$Output <- ifelse(m$Status == "sent", as.character(m$DRA), "delete") 

#Remove unnecessary columns 
m[-(3:4)] 
# CompanyID2 TMC2 Output 
# 1  ABC QBT  16 
# 2  BCD ACE  13 
# 3   jb QBT delete 
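
Note that the dput() output in the question names the columns Company.ID1 and Company.ID2, not CompanyID1 and CompanyID2. The following is a minimal sketch of the same merge idea with those names. It assumes the columns are plain character vectors rather than factors, and it adds all.x = TRUE (my choice, not part of the code above) so that df1 rows with no match in df2 are kept with NA instead of being dropped.

```r
# Data rebuilt from the dput() output in the question, as character columns.
# "j k " keeps its trailing space, exactly as in the dput() output.
df1 <- data.frame(
  Company.ID1 = c("ABC company", "BCD company", "jb hi fi", "ABC company",
                  "FB Company", "LL company", "j k ", "k. l company",
                  "1 to 1 lts", "2 in 1 pty ltd."),
  TMC1 = c("QBT", "G W TMC", "QBT", "GW TMC", "AMEX", "AMEX", "QBT",
           "TP oil", "TP oil", "AMEX"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  DRA = 11:20,
  Company.ID2 = c("2 in 1 pty ltd.", "1 to 1 lts", "BCD company",
                  "k. l company", "jb hi fi", "ABC company", "j k ",
                  "FB Company", "facebook pty", "2 in 1 pty ltd."),
  TMC2 = c("AMEX", "TP oil", "ACE", "TP oil", "QBT", "QBT", "QBT",
           "AMEX", "QBT", "AMEX"),
  Status = c("sent", "produce", "sent", "sent", "produce", "sent", "sent",
             "sent", "sent", "produce"),
  stringsAsFactors = FALSE
)

# Left join: every df1 row is kept; unmatched rows get NA in DRA and Status
m <- merge(df1, df2,
           by.x = c("Company.ID1", "TMC1"),
           by.y = c("Company.ID2", "TMC2"),
           all.x = TRUE)

# "sent" -> DRA number, "produce" -> "delete", no match -> NA
m$new <- ifelse(m$Status == "sent", as.character(m$DRA), "delete")

m[c("Company.ID1", "TMC1", "new")]
```

A company that matches several df2 rows (here "2 in 1 pty ltd.", which has both a sent and a produce row) appears once per match in m.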

@pierre lafortune Thanks – Chemjong

1

We can use dplyr:

library(dplyr) 
inner_join(df2, df1, by = c("CompanyID2" = "CompanyID1", "TMC2" = "TMC1")) %>% 
     mutate(Output = ifelse(Status == "sent", DRA, "delete")) 
1

Another option, using sqldf:

library(sqldf) 
res <- sqldf("select df2.CompanyID2,df2.TMC2, df2.Status, df2.DRA as output 
       from df1 
       join df2 on df1.CompanyID1=df2.CompanyID2 and df1.TMC1=df2.TMC2") 

res[res$Status=="produce",]$output <- "delete" 

     # CompanyID2 TMC2 Status output 
# 1  ABC company QBT sent  16 
# 2  jb hi fi QBT produce delete 
# 3  FB Company AMEX sent  18 
# 4   j k  QBT sent  17 
# 5 k. l company TP oil sent  14 
# 6  1 to 1 lts TP oil produce delete 
# 7 2 in 1 pty ltd. AMEX sent  11 
# 8 2 in 1 pty ltd. AMEX produce delete 
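
In the output above, "2 in 1 pty ltd." appears twice, once as sent and once as produce. If a single value per df1 row is wanted, one possible rule is to prefer the sent row and fall back to "delete" only when no sent row exists. That rule is an assumption on my part, since the question does not say how to resolve such ties. A minimal base R sketch on a small subset of the question's data, using the column names from the dput() output:

```r
# Small subset of the question's data (character columns)
df1 <- data.frame(
  Company.ID1 = c("ABC company", "jb hi fi", "2 in 1 pty ltd."),
  TMC1 = c("QBT", "QBT", "AMEX"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  DRA = c(16L, 15L, 11L, 20L),
  Company.ID2 = c("ABC company", "jb hi fi",
                  "2 in 1 pty ltd.", "2 in 1 pty ltd."),
  TMC2 = c("QBT", "QBT", "AMEX", "AMEX"),
  Status = c("sent", "produce", "sent", "produce"),
  stringsAsFactors = FALSE
)

# Composite keys, so the lookup does not depend on row position
key1 <- paste(df1$Company.ID1, df1$TMC1, sep = "|")
key2 <- paste(df2$Company.ID2, df2$TMC2, sep = "|")

sent <- df2$Status == "sent"

# match() gives the first 'sent' row with the same key, or NA if there is none
hit <- match(key1, key2[sent])
df1$new <- as.character(df2$DRA[sent][hit])

# Assumed tie rule: present in df2 but never 'sent' -> "delete"
df1$new[is.na(hit) & key1 %in% key2] <- "delete"

df1
```

Rows of df1 with no match at all in df2 keep NA in the new column.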

Or this variation of the last line: res[res$Status == "produce", "output"] <- "delete"