Extract patterns from a string in R, ignoring upper and lower case

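This is a toy example. I want to search within a and extract the colors listed in b. I want to extract a color even if it does not start with a capital letter. However, the output should tell me how the color is written in a.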

So the answer I want is "Red" NA "blue".

library(stringr)
a <- "She has Red hair and blue eyes"
b <- c("Red", "Yellow", "Blue")
str_extract(a, b)  # "Red" NA NA

I used str_extract from 'stringr', but I am happy to use another function or package (e.g., grep).
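
Since grep is mentioned, here is a minimal base R sketch, assuming the same a and b. It shows that grepl() with ignore.case = TRUE can detect which colors occur regardless of case. It only returns TRUE/FALSE, though, so it does not tell you how the color is written in a. That is why the answers below extract the matched text instead.

```r
a <- "She has Red hair and blue eyes"
b <- c("Red", "Yellow", "Blue")

# Case-insensitive detection: one TRUE/FALSE per color in b, named by color
found <- vapply(b, grepl, logical(1), x = a, ignore.case = TRUE)
found
# Red = TRUE, Yellow = FALSE, Blue = TRUE
```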

2016-06-14 milan

It's easiest to convert all the strings to the same case; see the functions tolower or toupper. – Dave2e
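
The following is a minimal base R sketch that illustrates the comment by lower-casing both sides before matching. The helper name extract_lower is hypothetical. It finds all the colors, but it returns them in lower case ("red" rather than "Red"), so on its own it does not preserve the spelling in a that the question asks for. The answers below work around this by matching on the converted text but taking the substring from the original.

```r
a <- "She has Red hair and blue eyes"
b <- c("Red", "Yellow", "Blue")

# Hypothetical helper: lower-case text and patterns, then do a literal (fixed) match
extract_lower <- function(text, patterns) {
  lower_text <- tolower(text)
  vapply(tolower(patterns), function(p) {
    pos <- regexpr(p, lower_text, fixed = TRUE)
    # regexpr() returns -1 when there is no match
    if (pos == -1) NA_character_ else regmatches(lower_text, pos)
  }, character(1), USE.NAMES = FALSE)
}

extract_lower(a, b)
# "red" NA "blue"  (the capital R from a is lost)
```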

We can do this in base R:

unlist(sapply(tolower(b), function(x) {
     # match on the lower-cased text, but take the substring from the original a
     x1 <- regmatches(a, gregexpr(x, tolower(a)))
     replace(x1, lengths(x1) == 0, NA)   # no match -> NA
}), use.names = FALSE)
# "Red"  NA "blue"
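
The following is a variant of the same base R idea, offered as an alternative rather than as the answer's own method. regexpr() accepts ignore.case = TRUE, and regmatches() then returns the text exactly as it appears in a. This removes the need for tolower() and the empty-match clean-up. The sketch assumes the entries of b contain no regex metacharacters, because fixed = TRUE overrides ignore.case and so cannot be used here. The helper name extract_ci is hypothetical.

```r
a <- "She has Red hair and blue eyes"
b <- c("Red", "Yellow", "Blue")

# Hypothetical helper: first case-insensitive match of each pattern, NA if none
extract_ci <- function(text, patterns) {
  vapply(patterns, function(p) {
    m <- regexpr(p, text, ignore.case = TRUE)
    # regmatches() cuts the match out of the original text, keeping its case
    if (m == -1) NA_character_ else regmatches(text, m)
  }, character(1), USE.NAMES = FALSE)
}

extract_ci(a, b)
# "Red" NA "blue"
```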

Or, taking inspiration from @leerssej's answer:

library(stringr) 
str_extract(a, fixed(b, ignore_case=TRUE)) 
#[1] "Red" NA  "blue"

2016-06-14 03:21:43 akrun

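Unless I'm misunderstanding, your solution has the same problem as my first attempt... the OP wants the result to keep the capitalization used in 'a' –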

@DominicComtois Fixed here too! – akrun

stringr has an ignore.case() function:

str_extract(a, ignore.case(b))  # "Red" NA "blue"

2016-06-14 03:22:40

Thanks. Doing this gives an error message: Please use (fixed|coll|regexp)(x, ignore_case = TRUE) instead of ignore.case(x). So maybe I should do this instead: str_extract(fixed(a, ignore_case = TRUE), fixed(b, ignore_case = TRUE))? – milan

@milan That is not an error. It is just a message, not even a warning, and the code gives the correct result. But you can use 'str_extract(a, fixed(b, ignore_case = TRUE))'. – RHertel

As far as I can see, the suggestion in my last comment has already been included in @akrun's edit. – RHertel
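
Following RHertel's suggestion, note that in stringr the case-insensitive flag belongs on the pattern (the second argument), not on the string a. The following short sketch assumes a current version of stringr and compares the fixed(), regex() and coll() modifiers. For this input all three should give the same result; fixed() is the natural choice for literal color names.

```r
library(stringr)

a <- "She has Red hair and blue eyes"
b <- c("Red", "Yellow", "Blue")

# ignore_case goes on the pattern modifier, not on the string being searched
res_fixed <- str_extract(a, fixed(b, ignore_case = TRUE))  # literal match
res_regex <- str_extract(a, regex(b, ignore_case = TRUE))  # b treated as a regex
res_coll  <- str_extract(a, coll(b, ignore_case = TRUE))   # locale-aware match

res_fixed
# "Red" NA "blue"
```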

As a refinement of akrun's answer, you can change the case for the matching but still return each element the way it is originally written in a:

library(stringr) 
a <- "She has Red hair and blue eyes" 
b <- c("Red", "Yellow", "Blue") 

positions <- str_locate(toupper(a), toupper(b)) 
apply(positions, 1, function(x) substr(a,x[1],x[2])) 

## [1] "Red" NA "blue"

Or, to eliminate the NAs...

positions <- str_locate(toupper(a), toupper(b)) 
words <- apply(positions, 1, function(x) substr(a,x[1],x[2])) 
words[!is.na(words)] 

## [1] "Red" "blue"
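
For reference, positions in this answer is a two-column matrix (start, end) with one row per entry of b, and NA for colors that do not occur. The following small sketch of a vectorized alternative to apply() assumes the same a and b. Base R's substr() recycles over start and stop, so the matrix columns can be passed in directly; a is repeated so the result has one element per row.

```r
library(stringr)

a <- "She has Red hair and blue eyes"
b <- c("Red", "Yellow", "Blue")

# Locate on upper-cased copies, so the comparison ignores case
positions <- str_locate(toupper(a), toupper(b))
positions
#      start end
# [1,]     9  11
# [2,]    NA  NA
# [3,]    22  25

# Cut the same character ranges out of the original a; NA rows give NA
words <- substr(rep(a, nrow(positions)), positions[, "start"], positions[, "end"])
words
# "Red" NA "blue"
```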

2016-06-14 03:31:00

Fixed now! –

With stringi, you can use the case-insensitive option:

library(stringi) 
stri_extract_all_fixed(a, b, opts_fixed = list(case_insensitive = TRUE)) 
#[[1]] 
#[1] "Red" 
#[[2]] 
#[1] NA 
#[[3]] 
#[1] "blue" 


# or using simplify = TRUE to get a non-list output 
stri_extract_all_fixed(a, b, opts_fixed = list(case_insensitive = TRUE), 
    simplify = TRUE) 
#  [,1] 
#[1,] "Red" 
#[2,] NA  
#[3,] "blue"
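
If only the first occurrence of each color is needed, stringi's stri_extract_first_fixed() takes the same opts_fixed argument and returns a plain character vector. That way neither the list output nor simplify = TRUE is necessary. The following sketch assumes the same a and b, and uses stri_opts_fixed() to build the options list instead of writing list(...) by hand.

```r
library(stringi)

a <- "She has Red hair and blue eyes"
b <- c("Red", "Yellow", "Blue")

# One result per pattern: the first case-insensitive literal match, or NA
first_hits <- stri_extract_first_fixed(a, b,
  opts_fixed = stri_opts_fixed(case_insensitive = TRUE))
first_hits
# "Red" NA "blue"
```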

2016-06-14 03:58:25 Jota
