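Grouping bank statements in R

I am analysing my bank statement by grouping purchases by retailer name, so that I can then analyse the resulting data frame with dplyr functions. My approach uses a custom function, but I would like to know whether there is a more efficient way. For example, is there a package that can join data frames using complex matching logic between data frame columns?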
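library(dplyr)
library(stringr)

# Return the first retailer in RetailerNames whose name occurs in Purchase
# (case-insensitive), or "Donno" if none matches.
FindRetailer <- function(Purchase) {
  P <- toupper(Purchase)
  for (z in seq_along(RetailerNames)) {
    Retailer <- toupper(RetailerNames[z])
    # fixed = TRUE matches the name literally rather than as a regex
    if (grepl(Retailer, P, fixed = TRUE)) {
      return(str_to_title(Retailer))
    }
  }
  return("Donno")
}
# debug(FindRetailer)  # uncomment to step through the function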
Statement <- data.frame(
  Purchase = c("abc Aldi xyz", "a Kmart bcd", "a STARBUCKS ghju", "abcd MacD efg"),
  Amount = c(235, 23, 789, 45))
RetailerNames <- c("Aldi", "Kmart", "Starbucks", "MacD")
# what I need
Result <- data.frame(
  Purchase = c("abc Aldi xyz", "a KMART bcd", "a STARBUCKS mmm", "abcd MACD efg"),
  Amount = c(235, 23, 789, 45),
  Retailer = c("Aldi", "Kmart", "Starbucks", "Macd"))
# this works using a custom function
NewStatement <- Statement %>%
  rowwise() %>%
  mutate(Retailer = FindRetailer(Purchase)) %>%
  ungroup()
# is this possible: join data frames using complex string matching?
# this doesn't work yet (left_join() needs a data frame on the right-hand
# side and column names in `by`, so it is left commented out)
# TestMethod <- Statement %>%
#   left_join(RetailerNames, by = "Statement.Purchase %in% RetailerNames")
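Thanks, I thought there would be a simple solution. I'll take a look at `fuzzyjoin` as well. – Zeus
I edited the solution, because my original one only worked by a lucky coincidence. My current solution involves collapsing the vector of retailer names into a regex string. – yeedle
Thanks for the correction and the fuzzy-logic approach. – Zeus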
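
For readers following the thread, here is a rough sketch of the idea yeedle describes: collapse the retailer names into a single regex alternation and extract the first hit from each purchase. This is not necessarily the code from the answer. It uses base R only, and the helper `to_title` is a hypothetical stand-in for `stringr::str_to_title`. It assumes the retailer names contain no regex metacharacters, and it returns the leftmost match in the purchase text, which may differ from the loop in the question when a purchase mentions more than one retailer.

```r
# Sample data copied from the question
Statement <- data.frame(
  Purchase = c("abc Aldi xyz", "a Kmart bcd", "a STARBUCKS ghju", "abcd MacD efg"),
  Amount = c(235, 23, 789, 45),
  stringsAsFactors = FALSE
)
RetailerNames <- c("Aldi", "Kmart", "Starbucks", "MacD")

# Collapse the names into one pattern: "ALDI|KMART|STARBUCKS|MACD"
# (assumes the names contain no regex metacharacters)
pattern <- paste(toupper(RetailerNames), collapse = "|")

# Find the first match in each upper-cased purchase; regexpr() gives -1 for no match
upper <- toupper(Statement$Purchase)
m <- regexpr(pattern, upper)
hits <- rep(NA_character_, length(upper))
hits[m != -1] <- regmatches(upper, m)

# Hypothetical helper: capitalise the first letter, lower-case the rest
to_title <- function(x) paste0(substr(x, 1, 1), tolower(substring(x, 2)))

# Unmatched purchases fall back to "Donno", as in the question
Statement$Retailer <- ifelse(is.na(hits), "Donno", to_title(hits))
print(Statement)
```

Because the matching is vectorised, this avoids `rowwise()` and the explicit loop over retailers. The same pattern should also work inside `mutate()` with a string-extraction function if you prefer to stay in a dplyr pipeline.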