2016-11-14
0

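Using dplyr's "group_by" to create groups, then using stringr to find differences between groups

Using the example below, I want to group the data frame by CaseWorker, then by Client, and then determine for each client group whether the list of tasks in "Task" is the same as the list in "Task2".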

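I would be happy with a simple TRUE or FALSE. Even better would be to extract every task that is in "Task2" but not in "Task" and show it in a new column or data frame.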

So basically I need to make sure that "Task" and "Task2" contain the same entries for each client.

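If possible, I would like to stick with dplyr and stringr, or at least stay within the tidyverse. I suspect there is an elegant way to do this with "group_by" and "str_detect" or some other stringr function.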

CaseWorker<-c("John","John","John","John","John","John","Melanie","Melanie","Melanie","Melanie","Melanie","Melanie") 
Client<-c("Chris","Chris","Chris","Tom","Tom","Tom","Valerie","Valerie","Valerie","Tim","Tim","Tim") 
Task<-c("Feed cat","Make dinner","Iron shirt","Make dinner","Do homework","Make lunch","Make dinner","Feed cat","Buy groceries","Do homework","Iron shirt","Make lunch") 
Task2<-c("Feed cat","Make dinner","Iron shirt","Make dinner","Do homework","Feed cat","Make dinner","Feed cat","Iron shirt","Do homework","Iron shirt","Make lunch") 
Df<-data.frame(CaseWorker,Client,Task,Task2) 

Answers

2

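See if this is what you are after.

First, check whether Task matches Task2 on each row. If it does not, return Task2 as a new variable. I stored the result in a new data frame, df2.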

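library(dplyr)

# as.character() keeps the comparison safe if the columns were created as factors
df2 <- Df %>% 
    mutate(match = as.character(Task) == as.character(Task2), 
      non_match = ifelse(!match, as.character(Task2), "")) 
df2 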

#    CaseWorker  Client          Task       Task2 match  non_match
# 1        John   Chris      Feed cat    Feed cat  TRUE
# 2        John   Chris   Make dinner Make dinner  TRUE
# 3        John   Chris    Iron shirt  Iron shirt  TRUE
# 4        John     Tom   Make dinner Make dinner  TRUE
# 5        John     Tom   Do homework Do homework  TRUE
# 6        John     Tom    Make lunch    Feed cat FALSE   Feed cat
# 7     Melanie Valerie   Make dinner Make dinner  TRUE
# 8     Melanie Valerie      Feed cat    Feed cat  TRUE
# 9     Melanie Valerie Buy groceries  Iron shirt FALSE Iron shirt
# 10    Melanie     Tim   Do homework Do homework  TRUE
# 11    Melanie     Tim    Iron shirt  Iron shirt  TRUE
# 12    Melanie     Tim    Make lunch  Make lunch  TRUE

Then summarise the result to see whether all entries match for each CaseWorker/Client pair.

df2 %>% 
    group_by(CaseWorker, Client) %>% 
    summarise(n = n(), 
      matches = sum(match), 
      all_match = n == matches) 

#   CaseWorker  Client     n matches all_match
#        <chr>   <chr> <int>   <int>     <lgl>
# 1       John   Chris     3       3      TRUE
# 2       John     Tom     3       2     FALSE
# 3    Melanie     Tim     3       3      TRUE
# 4    Melanie Valerie     3       2     FALSE

If you need the all_match variable in your original dataset, you can of course merge this summary back into your data frame.
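Merging the group-level flag back could look roughly like the sketch below. It assumes dplyr 1.0 or later (for the `.groups` argument) and rebuilds `Df` with `stringsAsFactors = FALSE` so that it runs on its own. The names `group_summary` and `Df_flagged` are made up for illustration and do not appear in the original answer.

```r
library(dplyr)

# Example data from the question, with character (not factor) columns
CaseWorker <- c("John","John","John","John","John","John","Melanie","Melanie","Melanie","Melanie","Melanie","Melanie")
Client <- c("Chris","Chris","Chris","Tom","Tom","Tom","Valerie","Valerie","Valerie","Tim","Tim","Tim")
Task <- c("Feed cat","Make dinner","Iron shirt","Make dinner","Do homework","Make lunch","Make dinner","Feed cat","Buy groceries","Do homework","Iron shirt","Make lunch")
Task2 <- c("Feed cat","Make dinner","Iron shirt","Make dinner","Do homework","Feed cat","Make dinner","Feed cat","Iron shirt","Do homework","Iron shirt","Make lunch")
Df <- data.frame(CaseWorker, Client, Task, Task2, stringsAsFactors = FALSE)

# One row per CaseWorker/Client pair: TRUE only if every row in the pair matches
group_summary <- Df %>%
  group_by(CaseWorker, Client) %>%
  summarise(all_match = all(Task == Task2), .groups = "drop")

# Attach the group-level flag to every original row (hypothetical name)
Df_flagged <- Df %>%
  left_join(group_summary, by = c("CaseWorker", "Client"))
```

`left_join` keeps the rows of `Df` in their original order, so each row simply gains the flag of the pair it belongs to.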

+0

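Thanks for your answer! I have posted a "Part 2" of this question, a more complex but similar problem, in case you are interested in answering that one too. It is posted under the same question title, but with "Part 2" at the beginning. – Mike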

1

You can do this simply with dplyr, making use of %in%:

Df %>% 
    group_by(CaseWorker,Client) %>% 
    mutate(Check = Task %in% Task2) 
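The same group-wise set comparison can also be collapsed to one row per client, which is closer to the TRUE/FALSE plus "tasks only in Task2" output the question asks for. This is a minimal sketch, not part of the original answer: it assumes dplyr 1.0 or later, uses base R's `setequal` and `setdiff`, and the column names `same_tasks` and `only_in_task2` are invented here.

```r
library(dplyr)

# Example data from the question, with character (not factor) columns
CaseWorker <- c("John","John","John","John","John","John","Melanie","Melanie","Melanie","Melanie","Melanie","Melanie")
Client <- c("Chris","Chris","Chris","Tom","Tom","Tom","Valerie","Valerie","Valerie","Tim","Tim","Tim")
Task <- c("Feed cat","Make dinner","Iron shirt","Make dinner","Do homework","Make lunch","Make dinner","Feed cat","Buy groceries","Do homework","Iron shirt","Make lunch")
Task2 <- c("Feed cat","Make dinner","Iron shirt","Make dinner","Do homework","Feed cat","Make dinner","Feed cat","Iron shirt","Do homework","Iron shirt","Make lunch")
Df <- data.frame(CaseWorker, Client, Task, Task2, stringsAsFactors = FALSE)

result <- Df %>%
  group_by(CaseWorker, Client) %>%
  summarise(
    # TRUE when both columns hold the same set of tasks, in any order
    same_tasks = base::setequal(Task, Task2),
    # Tasks present in Task2 but absent from Task, collapsed to one string
    only_in_task2 = paste(base::setdiff(Task2, Task), collapse = ", "),
    .groups = "drop"
  )
result
```

Unlike a row-by-row `==`, this treats each client's tasks as a set, so tasks listed in a different order in the two columns still count as the same.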

This relies on exact, case-sensitive matches. If that is a concern, you can use the following:

Df %>% 
    group_by(CaseWorker,Client) %>% 
    rowwise() %>% 
    mutate(Check = grepl(Task, Task2, ignore.case = TRUE)) 

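Note that you have to call rowwise before mutate, because grepl takes a single pattern rather than a vector of patterns (unlike most R functions, which are vectorized). As a result, this version compares Task with Task2 on the same row rather than across the whole group.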
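If the goal is a case-insensitive check that still looks across the whole client group, one option is to lower-case both columns before using `%in%`. The sketch below is an alternative technique, not the answer's own code. The upper-cased `"FEED CAT"` entry is a hypothetical change to the example data, made only to show the effect, and `checked` is an illustrative name.

```r
library(dplyr)

# Example data from the question, with character (not factor) columns
CaseWorker <- c("John","John","John","John","John","John","Melanie","Melanie","Melanie","Melanie","Melanie","Melanie")
Client <- c("Chris","Chris","Chris","Tom","Tom","Tom","Valerie","Valerie","Valerie","Tim","Tim","Tim")
Task <- c("Feed cat","Make dinner","Iron shirt","Make dinner","Do homework","Make lunch","Make dinner","Feed cat","Buy groceries","Do homework","Iron shirt","Make lunch")
Task2 <- c("Feed cat","Make dinner","Iron shirt","Make dinner","Do homework","Feed cat","Make dinner","Feed cat","Iron shirt","Do homework","Iron shirt","Make lunch")
Df <- data.frame(CaseWorker, Client, Task, Task2, stringsAsFactors = FALSE)

# Hypothetical change for illustration: same task, different capitalisation
Df$Task2[1] <- "FEED CAT"

checked <- Df %>%
  group_by(CaseWorker, Client) %>%
  # Compare lower-cased values so capitalisation differences are ignored
  mutate(Check = tolower(Task) %in% tolower(Task2)) %>%
  ungroup()

# Rows whose Task has no counterpart in the client's Task2 list
which(!checked$Check)
```

Because `tolower` and `%in%` are both vectorized, no `rowwise` step is needed here.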

0

If you want to use the stringr package, the following may also work for you.

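library(dplyr)
library(stringr)

Df %>% 
    group_by(CaseWorker,Client) %>% 
    # str_detect() compares row by row and treats Task2 as a regular expression
    mutate(Check=str_detect(as.character(Task),as.character(Task2)))

0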

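I may just be misunderstanding the question, but I think you may be overcomplicating things if all you want is the records where Task does not match Task2.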

> Df[which(Df$Task != Df$Task2),] 

=== ========== ======= ============= ========== 
\ CaseWorker Client Task   Task2  
=== ========== ======= ============= ========== 
6 John  Tom  Make lunch  Feed cat 
9 Melanie  Valerie Buy groceries Iron shirt 
=== ========== ======= ============= ========== 
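For readers who prefer to stay within dplyr, as the question requests, the same row-wise filter can be sketched with `filter`. This assumes the columns are character vectors (hence `stringsAsFactors = FALSE`); the name `mismatches` is illustrative only.

```r
library(dplyr)

# Example data from the question, with character (not factor) columns
CaseWorker <- c("John","John","John","John","John","John","Melanie","Melanie","Melanie","Melanie","Melanie","Melanie")
Client <- c("Chris","Chris","Chris","Tom","Tom","Tom","Valerie","Valerie","Valerie","Tim","Tim","Tim")
Task <- c("Feed cat","Make dinner","Iron shirt","Make dinner","Do homework","Make lunch","Make dinner","Feed cat","Buy groceries","Do homework","Iron shirt","Make lunch")
Task2 <- c("Feed cat","Make dinner","Iron shirt","Make dinner","Do homework","Feed cat","Make dinner","Feed cat","Iron shirt","Do homework","Iron shirt","Make lunch")
Df <- data.frame(CaseWorker, Client, Task, Task2, stringsAsFactors = FALSE)

# Keep only the rows where the two task columns disagree
mismatches <- Df %>%
  filter(Task != Task2)
mismatches
```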