2016-11-28

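Combining dplyr joins and set operations into a custom function

I have two simple data frames, shown below. I want to use dplyr and the tidyverse to find the categories in the "Task2" column of the second data frame (Df2) that do not appear in the "Task" column of the first data frame (Df). I would like to use dplyr's "setdiff" function for this. I also want to keep the corresponding times from the "Time" column of the second data frame (Df2).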

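The final result should therefore contain two rows: one for client "Chris" with "Iron shirt" and a total time of 30, and one for client "Eric" with "Buy groceries" and a time of 8.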

I also want to drop the Date column.

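One way I thought of doing this is to use dplyr's "setdiff" function to isolate the two rows (I realise the Task and Task2 column names would have to be changed so that they match), and then bring the total times back in with a join function.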
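
A rough sketch of that setdiff-then-join idea, for illustration only. It assumes a small stand-in subset of the data (not the full frames below), character rather than factor columns, and dplyr 1.0.0 or later for the .groups argument; the names only_in_df2 and result are made up here.

```r
library(dplyr)

# Small stand-in data (a hypothetical subset of the frames in this question).
Df <- data.frame(Task = c("Feed cat", "Make dinner", "Do homework"),
                 stringsAsFactors = FALSE)
Df2 <- data.frame(
  Client = c("Chris", "Chris", "Chris", "Eric", "Eric"),
  Task2  = c("Feed cat", "Iron shirt", "Iron shirt", "Do homework", "Buy groceries"),
  Time   = c(20, 20, 10, 12, 8),
  stringsAsFactors = FALSE
)

# Step 1: rename Task2 so both sides have the same single column,
# then keep the tasks that occur in Df2 but not in Df.
only_in_df2 <- dplyr::setdiff(
  Df2 %>% select(Task = Task2) %>% distinct(),
  Df %>% select(Task) %>% distinct()
)

# Step 2: join back to Df2 to recover the times, then total them
# per client and task.
result <- Df2 %>%
  inner_join(only_in_df2, by = c("Task2" = "Task")) %>%
  group_by(Client, Task2) %>%
  summarise(Time = sum(Time), .groups = "drop")

print(result)
```

This takes two steps because setdiff compares whole rows, so the Time column has to be left out of the comparison and joined back afterwards.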

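Finally, I would like this to be a custom function, because I will have to repeat this task many times. I would like something like "Difference(Df1, Df2)", so that I can pass in two data frames and get the result.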

I hope this is not asking too much! I am new to custom functions, especially ones that contain dplyr and pipes.

I hope someone can help!

CaseWorker<-c("John","John","Kim") 

Client<-c("Chris","Chris","Eric") 

Task<-c("Feed cat","Make dinner","Do homework") 

Date<-c("10/27/2016","09/22/2016","10/11/2016") 

Df<-data.frame(CaseWorker,Client,Date,Task) 

The second data frame:

CaseWorker<-c("John","John","John","John","John","John","John","John","John", 
      "John","Kim","Kim","Kim") 

Client<-c("Chris","Chris","Chris","Chris","Chris","Chris","Chris","Chris","Chris","Chris","Eric","Eric","Eric") 

Date<-c("11/10/2016","10/10/2016","11/13/2016","09/18/2016","11/11/2016","09/19/2016","08/08/2016","10/10/2016","08/05/2016","11/12/2016","09/09/2016","11/11/2016","09/10/2016") 

Task2<-c("Feed cat","Feed cat","Feed cat","Feed cat","Feed cat","Make dinner","Make dinner","Make dinner","Iron shirt","Iron shirt","Do homework", 
"Do homework","Buy groceries") 

Time<-c(20,34,11,10,5,6,55,30,20,10,12,10,8) 

Df2<-data.frame(CaseWorker,Client,Date,Task2,Time) 

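Answer

We can use anti_join, which keeps only the rows of Df2 whose Task2 value has no match in the Task column of Df. Grouping and summing then gives the total time per case worker, client, and task, and drops the Date column along the way.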

library(dplyr) 
anti_join(Df2, Df, by = c("Task2"="Task")) %>% 
     group_by(CaseWorker,Client, Task2) %>% 
     summarise(Time = sum(Time)) 
#   CaseWorker Client         Task2  Time
#       <fctr> <fctr>        <fctr> <dbl>
# 1       John  Chris    Iron shirt    30
# 2        Kim   Eric Buy groceries     8

If we need to turn this into a function:

DiffGoals <- function(dat1, dat2) { 
      anti_join(dat1, dat2, by = c("Task2" = "Task")) %>% 
        group_by(CaseWorker, Client, Task2) %>% 
        summarise(Time = sum(Time)) 
} 

DiffGoals(Df2, Df) 
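
DiffGoals above hard-codes the column names. If the same operation has to be repeated on frames with different column names, one possible generalisation is sketched below. The function name DiffGoalsBy and its parameters are hypothetical, not part of dplyr, and the sketch assumes dplyr 1.0.0 or later (for across() and .groups).

```r
library(dplyr)

# Hypothetical generalisation of DiffGoals: column names are passed as strings.
DiffGoalsBy <- function(dat1, dat2,
                        task1 = "Task2",   # task column in dat1
                        task2 = "Task",    # task column in dat2
                        groups = c("CaseWorker", "Client"),
                        time = "Time") {
  dat1 %>%
    # setNames(task2, task1) builds the named vector c(Task2 = "Task")
    anti_join(dat2, by = setNames(task2, task1)) %>%
    group_by(across(all_of(c(groups, task1)))) %>%
    summarise(Time = sum(.data[[time]]), .groups = "drop")
}

# The data from the question, written compactly with rep().
Df <- data.frame(
  CaseWorker = c("John", "John", "Kim"),
  Client     = c("Chris", "Chris", "Eric"),
  Date       = c("10/27/2016", "09/22/2016", "10/11/2016"),
  Task       = c("Feed cat", "Make dinner", "Do homework"),
  stringsAsFactors = FALSE
)
Df2 <- data.frame(
  CaseWorker = c(rep("John", 10), rep("Kim", 3)),
  Client     = c(rep("Chris", 10), rep("Eric", 3)),
  Task2      = c(rep("Feed cat", 5), rep("Make dinner", 3), rep("Iron shirt", 2),
                 rep("Do homework", 2), "Buy groceries"),
  Time       = c(20, 34, 11, 10, 5, 6, 55, 30, 20, 10, 12, 10, 8),
  stringsAsFactors = FALSE
)

res <- DiffGoalsBy(Df2, Df)
print(res)
```

Note that, like the original, this matches on the task name alone, so a task counts as present in Df no matter which client it belongs to there.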

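Thank you! The solution is much simpler than I imagined, and it works well. Since it is only three lines, I don't think a custom function is necessary, but out of curiosity I would still like to know how to write one. Perhaps it could be called "DiffGo(Df1, Df2)"? Is that something easy to do? – Mike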
