2017-03-09 76 views
1

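Fuzzy mapping in R

I am trying to do fuzzy matching with the agrep command. I have one data frame with a column of audience responses, and another data frame that lists segments and subsegments. The audience response column contains words that also appear in the subsegment names. For example: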

pattern$audience 
[1] "(Deleted) Semasio » DE: Intent » Christmas Shopping"   
[2] "(Old) AddThis - UK » Auto » General » Auto Enthusiasts"  
[3] "(Old) AddThis - UK » Auto » General » Auto Intenders"   
[4] "(Old) AddThis - UK » Financial » Social » Financial Shoppers" 
[5] "(Old) AddThis - UK » Food » Social"       
[6] "(Old) AddThis - UK » Health » Social » Health Influencers" 

Similarly, I have another data frame called x, which contains:

x$segment    x$subsegment 
Shopping    Financial shoppers 
Travel     Travel Europe 
Shopping    Christmas shopping 

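I want to write a function that does a fuzzy match between pattern$audience and x$subsegment and returns the matching segment and subsegment for each audience response, with the subsegment stored in a new column, pattern$subseg.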

The resulting dataset I need should look like this:

pattern$audience x$segment    x$subsegment     
[1] "(Deleted) Semasio » DE: Intent » Christmas C"   Shopping    Christmas shopping    
[2] "(Old) AddThis - UK » Auto » General » Auto Enthusiasts"       
[3] "(Old) AddThis - UK » Auto » General » Auto Intenders"       
[4] "(Old) AddThis - UK » Financial » Social » Financial Shoppers" Shopping    Financial shoppers    
[5] "(Old) AddThis - UK » Food » Social"            
[6] "(Old) AddThis - UK » Health » Social » Health Influencers"     

Below is the code I tried to write, but it does not return the output I need:

x <- rename(x, c("Segment" = "segment", "Sub Segment" = "subseg")) 
names(x) 
y <- as.data.frame(x$subseg) 
y <- rename(y, c("x$subseg" = "subseg")) 


n.match <- function(pattern, x, ...) { 
    for (i in 1:nrow(pattern)) { 
     x <- (agrep(y,pattern$audience[i], 
       ignore.case=TRUE, value = TRUE)) 
       x <- paste0(x,"") 
       pattern$subseg[i] <- x 
    } 
    head(pattern) 
    } 

Can someone help me correct my mistake? I would really appreciate your answers. Thank you very much.

Answers

0

We can try this:

pattern <- c("(Deleted) Semasio » DE: Intent » Christmas C",   
     "(Old) AddThis - UK » Auto » General » Auto Enthusiasts", 
     "(Old) AddThis - UK » Auto » General » Auto Intenders",   
     "(Old) AddThis - UK » Financial » Social » Financial Shoppers", 
     "(Old) AddThis - UK » Food » Social", 
     "(Old) AddThis - UK » Financial » Social » Financial Shoppers", 
     "(Old) AddThis - UK » Health » Social » Health Influencers") 
pattern <- data.frame(audiance=pattern) 
x <- read.csv(text='segment, subsegment  
         Shopping, Financial shoppers 
         Travel,  Travel Europe 
         Enthusiasts, Auto Enthusiasts 
         Shopping, Christmas shopping', stringsAsFactors=FALSE) 

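# Vectorize agrep over its `pattern` argument: one agrep() call per subsegment,
# each returning the row indices of the audience strings it approximately matches.
# SIMPLIFY = FALSE keeps the result a list even when all match counts are equal.
vagrep <- Vectorize(agrep, 'pattern', SIMPLIFY = FALSE) 
pattern$subsegment <- '' 
matches <- vagrep(x$subsegment, pattern$audiance) 
# Write each subsegment into the rows it matched; rows without a match stay ''
invisible(lapply(seq_along(matches), function(i) if (length(matches[[i]]) > 0) pattern$subsegment[matches[[i]]] <<- x$subsegment[i])) 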

pattern 
#               audiance   subsegment 
#1     (Deleted) Semasio » DE: Intent » Christmas C      
#2  (Old) AddThis - UK » Auto » General » Auto Enthusiasts Auto Enthusiasts 
#3   (Old) AddThis - UK » Auto » General » Auto Intenders      
#4 (Old) AddThis - UK » Financial » Social » Financial Shoppers Financial shoppers 
#5       (Old) AddThis - UK » Food » Social      
#6 (Old) AddThis - UK » Financial » Social » Financial Shoppers Financial shoppers 
#7 (Old) AddThis - UK » Health » Social » Health Influencers      
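
The question also asked for the segment to be returned next to the subsegment. A minimal sketch of one way to do that is below. The helper name `fuzzy_lookup` and its arguments are hypothetical, not part of the code above; it assumes base R's `agrepl` with `ignore.case = TRUE`, keeps only the first matching lookup row, and uses shortened, ASCII-only versions of the audience strings.

```r
# Hypothetical helper (an assumption, not from the original answer): for each
# audience string, return the first row of `lookup` whose subsegment
# approximately occurs in it, or NA when nothing matches.
fuzzy_lookup <- function(audience, lookup, max.distance = 0.1) {
  audience <- as.character(audience)
  # One logical vector per subsegment: which audience strings does it match?
  hit <- vapply(
    lookup$subsegment,
    function(p) agrepl(p, audience, max.distance = max.distance,
                       ignore.case = TRUE),
    logical(length(audience))
  )
  # Force an (audience x subsegment) matrix, even for a single audience string
  hit <- matrix(hit, nrow = length(audience))
  # Index of the first matching lookup row per audience string
  first <- apply(hit, 1, function(r) if (any(r)) which(r)[1] else NA_integer_)
  data.frame(
    audience = audience,
    segment = lookup$segment[first],
    subsegment = lookup$subsegment[first],
    stringsAsFactors = FALSE
  )
}

# Shortened example data (assumed for illustration)
audience <- c("Semasio DE: Intent Christmas Shopping",
              "AddThis UK Auto Enthusiasts",
              "AddThis UK Financial Shoppers")
lookup <- data.frame(
  segment = c("Shopping", "Travel", "Shopping"),
  subsegment = c("Financial shoppers", "Travel Europe", "Christmas shopping"),
  stringsAsFactors = FALSE
)

res <- fuzzy_lookup(audience, lookup)
print(res)
```

Unmatched rows come back as NA rather than an empty string, which makes them easy to filter with `is.na()`. If several subsegments could match the same audience string, taking the first hit is an arbitrary choice; ranking candidates by edit distance, for example with `adist`, would be a reasonable refinement.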
+1

Thank you so much... this solution worked perfectly for me. – Shaz

+1

Done... thank you very much – Shaz

+0

Thank you very much –