R: using regex with multiple patterns

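I want to find the string that follows each of several patterns. My matching code seems to work, but I cannot extract the matched strings from its result.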

Here is an example:

pattern <- c("Iligan", "Cabeseria 25|Sta. Lucia", "Capitol", "Osmeña",
             "Nowhere", "Aglayan")

# I want to match the string just after each pattern. For example, I want to
# match "City" just after "Iligan".

target <- c("Iligan City", "Sta. Lucia, Ozamiz City", " Oroquieta City",
            "Osmeña St. Dipolog City", "Lucia St., Zamboanga City",
            "Aglayan str, Oroquieta City", "Gingoog City", "Capitol br., Ozamiz City",
            "Dumaguete City", "Poblacion, Misamis")

# The matching seems to work fine
(matches <- sapply(pattern, FUN = function(x) {
  regexpr(paste0("(?<=\\b", x, "\\b )[\\w-*\\.]*"), target, perl = TRUE)
}))
print(matches)

# But I cannot get the results. I would need to use one column of the matrix
# at a time
villain <- lapply(matches, FUN = function(x) (regmatches(target, x)))

Do you have a solution to this problem?

unpdate 1

To be precise, here is the desired output:

results <- c("City", "St.", "br.") 

#[1] "City" "St." "br."

Source

2014-10-03 DJJ

What is the expected output? Just a list of the matched strings (without the 'NA's)? – hrbrmstr 2014-10-03 11:11:41

What is "unpdate"? – amonk 2017-07-19 12:20:37

There are some helpers in the stringr package that can simplify the process:

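library(stringr)

pattern <- c("Iligan", "Cabeseria 25|Sta. Lucia", "Capitol", "Osmeña",
             "Nowhere", "Aglayan")

target <- c("Iligan City", "Sta. Lucia, Ozamiz City", " Oroquieta City",
            "Osmeña St. Dipolog City", "Lucia St., Zamboanga City",
            "Aglayan str, Oroquieta City", "Gingoog City", "Capitol br., Ozamiz City",
            "Dumaguete City", "Poblacion, Misamis")

# For one pattern, extract the token that follows it in every target string
# (NA where the pattern does not occur). The original answer wrapped the
# pattern in perl(); current stringr versions use regex() instead.
matchPat <- function(x) {
  str_extract(target, regex(paste0("(?<=\\b", x, "\\b )[\\w.*\\-]*")))
}

# One column per pattern, one row per target string
matches <- sapply(pattern, matchPat)

print(matches)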

##       Iligan Cabeseria 25|Sta. Lucia Capitol Osmeña Nowhere Aglayan
##  [1,] "City" NA                      NA      NA     NA      NA
##  [2,] NA     NA                      NA      NA     NA      NA
##  [3,] NA     NA                      NA      NA     NA      NA
##  [4,] NA     NA                      NA      "St."  NA      NA
##  [5,] NA     NA                      NA      NA     NA      NA
##  [6,] NA     NA                      NA      NA     NA      "str"
##  [7,] NA     NA                      NA      NA     NA      NA
##  [8,] NA     NA                      "br."   NA     NA      NA
##  [9,] NA     NA                      NA      NA     NA      NA
## [10,] NA     NA                      NA      NA     NA      NA

This could be simplified further if you don't need the non-match indicators, but no sample or expected output was provided.
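As a rough illustration of what such a simplification might look like, here is a minimal base R sketch that keeps only the matched strings. It is not the answerer's code: the helper name `build_regex` and the variable `found` are made up for this example, and the character class is written as `[\\w.*\\-]` (same characters as in the question, with the hyphen escaped). It uses `lapply` rather than `sapply`, because `sapply` collapses the `regexpr` results into a plain matrix and drops the `match.length` attribute that `regmatches` relies on.

```r
pattern <- c("Iligan", "Cabeseria 25|Sta. Lucia", "Capitol", "Osmeña",
             "Nowhere", "Aglayan")

target <- c("Iligan City", "Sta. Lucia, Ozamiz City", " Oroquieta City",
            "Osmeña St. Dipolog City", "Lucia St., Zamboanga City",
            "Aglayan str, Oroquieta City", "Gingoog City", "Capitol br., Ozamiz City",
            "Dumaguete City", "Poblacion, Misamis")

# Hypothetical helper: build the look-behind regex for a single pattern.
build_regex <- function(p) paste0("(?<=\\b", p, "\\b )[\\w.*\\-]*")

# lapply keeps each regexpr() result intact (including match.length),
# so regmatches() can return just the matched substrings for that pattern.
found <- lapply(pattern, function(p) {
  m <- regexpr(build_regex(p), target, perl = TRUE)
  regmatches(target, m)
})
names(found) <- pattern

# Flatten to a plain character vector; patterns without a match contribute nothing.
results <- unlist(found, use.names = FALSE)
print(results)
```

With this data, "Iligan" yields "City", "Capitol" yields "br." and "Aglayan" yields "str", while "Nowhere" contributes nothing. Whether `\\b` treats the "ñ" in "Osmeña" as a word character under `perl = TRUE` may depend on the locale and PCRE settings, so that entry is worth checking on your own system.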

Source

2014-10-03 11:16:19 hrbrmstr
