2017-05-03

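R bank statement grouping

I am analyzing my bank statement by grouping purchases by retailer name, so that I can then analyze the resulting data frame with dplyr functions. My approach uses a custom function, but I would like to know whether there is a more efficient way. For example, is there a package that can join data frames using complex matching logic between their columns?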

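library(dplyr)    # %>%, rowwise(), mutate()
library(stringr)  # str_to_title()

# Return the first retailer whose name occurs in the purchase text,
# ignoring case. Uses the global vector RetailerNames defined below.
FindRetailer <- function(Purchase) {
    P <- toupper(Purchase)
    for (z in seq_along(RetailerNames)) {
        Retailer <- toupper(RetailerNames[z])
        # fixed = TRUE: treat the name as literal text, not a regex
        if (grepl(Retailer, P, fixed = TRUE)) {
            return(str_to_title(Retailer))
        }
    }
    return("Donno")  # no retailer matched
}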

Statement <- data.frame(
    Purchase = c("abc Aldi xyz","a Kmart bcd","a STARBUCKS ghju","abcd MacD efg"), 
    Amount = c(235,23,789,45)) 

RetailerNames<- c("Aldi","Kmart","Starbucks","MacD") 

# what I need (same purchases as Statement, plus a Retailer column)
Result <- data.frame(
    Purchase = c("abc Aldi xyz","a Kmart bcd","a STARBUCKS ghju","abcd MacD efg"), 
    Amount = c(235,23,789,45), 
    Retailer = c("Aldi","Kmart","Starbucks","Macd")) 

# this works using custom function 
NewStatment<-Statement %>% 
    rowwise() %>% 
    mutate(Retailer=FindRetailer(Purchase)) 

# is this possible: join dataframes using complex string matching? 
# this doesn't work yet 
TestMethod<-Statement %>% 
    left_join(RetailerNames,by="Statement.Purchase %in% RetailerNames") 

Answer


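library(tidyverse) 

Statement <- data.frame(
    Purchase = c("abc Aldi xyz","a Kmart bcd","a STARBUCKS ghju","abcd MacD efg"), 
    Amount = c(235,23,789,45)) 

RetailerNames <- c("Aldi","Kmart","Starbucks","MacD") 

# Collapse the retailer names into one alternation pattern,
# "Aldi|Kmart|Starbucks|MacD", and extract the first match from each purchase.
# str_c() is used here in place of glue's collapse(), which newer glue
# releases no longer provide.
RetailerPattern <- regex(str_c(RetailerNames, collapse = "|"), ignore_case = TRUE)

Statement %>% 
    mutate(Retailer = str_extract(Purchase, RetailerPattern)) 
#>           Purchase Amount  Retailer
#> 1     abc Aldi xyz    235      Aldi
#> 2      a Kmart bcd     23     Kmart
#> 3 a STARBUCKS ghju    789 STARBUCKS
#> 4    abcd MacD efg     45      MacD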
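
Note that the extracted value is the text as it appears in the purchase (STARBUCKS in row 3), not the spelling used in RetailerNames. If the canonical spelling is wanted, one option is to look up which name matched. The following is only a minimal base R sketch under that assumption; `first_match` is a hypothetical helper name, and each retailer name is treated as a regular expression.

```r
Statement <- data.frame(
    Purchase = c("abc Aldi xyz", "a Kmart bcd", "a STARBUCKS ghju", "abcd MacD efg"),
    Amount = c(235, 23, 789, 45),
    stringsAsFactors = FALSE)

RetailerNames <- c("Aldi", "Kmart", "Starbucks", "MacD")

# Hypothetical helper: return the first retailer name found in one purchase
# string (case-insensitive), or NA when none of the names occurs in it.
first_match <- function(purchase, retailers) {
    found <- vapply(retailers, grepl, logical(1),
                    x = purchase, ignore.case = TRUE)
    hit <- which(found)
    if (length(hit) == 0) NA_character_ else retailers[hit[1]]
}

# Apply the helper to every purchase; the result uses the RetailerNames spelling
Statement$Retailer <- vapply(Statement$Purchase, first_match, character(1),
                             retailers = RetailerNames, USE.NAMES = FALSE)

print(Statement)
```

Unmatched purchases get NA here rather than a placeholder string, which keeps them easy to filter with `is.na()`.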

If you want to go the left_join route, try:

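library(fuzzyjoin) 

# the lookup table must be a data frame; each Retailer value is used as a regex
RetailerNames <- tibble(Retailer = c("Aldi","Kmart","Starbucks","MacD")) 

Statement %>% 
    # ignore_case = TRUE so that "STARBUCKS" also matches "Starbucks"
    regex_left_join(RetailerNames, by = c(Purchase = "Retailer"), ignore_case = TRUE) 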
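
To make the idea behind a regex join concrete, here is a rough base R sketch of the same logic: pair every purchase with every retailer, keep the pairs where the retailer pattern occurs in the purchase, then add back the purchases that matched nothing. This illustrates the concept only and is not how fuzzyjoin is implemented; `Retailers`, `row_id`, `pairs`, and `joined` are names made up for the example.

```r
Statement <- data.frame(
    Purchase = c("abc Aldi xyz", "a Kmart bcd", "a STARBUCKS ghju", "abcd MacD efg"),
    Amount = c(235, 23, 789, 45),
    stringsAsFactors = FALSE)

Retailers <- data.frame(
    Retailer = c("Aldi", "Kmart", "Starbucks", "MacD"),
    stringsAsFactors = FALSE)

# Illustrative row id so unmatched purchases can be identified afterwards
Statement$row_id <- seq_len(nrow(Statement))

# Cross join: with by = NULL, merge() returns the Cartesian product
pairs <- merge(Statement, Retailers, by = NULL)

# Keep the pairs whose retailer pattern occurs in the purchase text
keep <- unname(mapply(grepl, pairs$Retailer, pairs$Purchase,
                      MoreArgs = list(ignore.case = TRUE)))
matched <- pairs[keep, ]

# Left-join behaviour: purchases without any match are kept, with Retailer = NA
unmatched <- Statement[!Statement$row_id %in% matched$row_id, , drop = FALSE]
if (nrow(unmatched) > 0) {
    unmatched$Retailer <- NA_character_
    matched <- rbind(matched, unmatched)
}

# Restore the original statement order
joined <- matched[order(matched$row_id), ]
rownames(joined) <- NULL
print(joined)
```

The cross join grows as rows times retailers, so for a large statement the collapsed-regex approach is likely to be cheaper.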

Thanks, I thought there would be a simple solution. I will take a look at `fuzzyjoin` too. – Zeus

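I edited the solution, because my original one only worked by a lucky coincidence. My current solution collapses the vector of retailer names into a single regex string. – yeedle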

Thanks for the correction and for the fuzzy matching approach. – Zeus