2017-09-17 53 views
0

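AND operator in regular expressions in R

I want an expression that takes several long paragraphs and finds the lines that contain two specific words, so I am looking for an AND operator. Is there any way to do this?

For example: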

c <- ("She sold seashells by the seashore, and she had a great time while doing so.") 

I want an expression that matches a line only if it contains both "sold" and "great".

I have tried something like:

grep("sold", "great", c, value = TRUE) 

Any ideas?

Thanks a lot!

Answers

2

You can create two capture groups, assuming the order of the words does not matter:

grep("(sold|great)(?:.+)(sold|great)", c, value = TRUE) 
+0

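Thanks, but I am actually looking for lines that contain both words, not either one. If a line contains "sold" but not "great", I do not want that line returned. – intern14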

+0

@intern14, apologies, I misunderstood. See my edit above. –
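For what it is worth, another common way to express AND in a single pattern is one lookahead per required word. The sketch below is not part of the answer above; it assumes the PCRE engine (`perl = TRUE`), and the vector `txt` with its extra sample lines is made up for illustration. As far as I can tell, the last line would still be matched by the two-group alternation, because that pattern also accepts the same word twice.

```r
# Sample lines; everything after the first one is a hypothetical addition
txt <- c(
  "She sold seashells by the seashore, and she had a great time while doing so.",
  "She bought seashells by the seashore, and she had a great time while doing so.",
  "It was great that she sold everything.",
  "She sold seashells and sold some more."
)

# One lookahead per required word; each scans the whole line from its start,
# so the order of the words does not matter
both <- "^(?=.*sold)(?=.*great)"

# Lookaheads are a PCRE feature, hence perl = TRUE
hits <- grepl(both, txt, perl = TRUE)
hits
# [1]  TRUE FALSE  TRUE FALSE

matched <- txt[hits]
```

Adding a third required word is then just one more `(?=.*word)` group.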

0

The duplicate may get you started, but I don't think it directly addresses your question.

You can use all():

pos <- ("She sold seashells by the seashore, and she had a great time while doing so.") # contains sold and great 
neg <- ("She bought seashells by the seashore, and she had a great time while doing so.") # contains great 

pattern <- c("sold", "great") 

library(stringr) 
all(str_detect(pos,pattern)) 
# [1] TRUE 

all(str_detect(neg,pattern)) 
# [1] FALSE 
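Note that all() collapses everything into a single TRUE or FALSE, so the check above answers the question for one string at a time. If the text is a vector with one element per line, a per-line version is needed. Here is a minimal base R sketch of that idea; the vector `txt` and the names `words` and `keep` are assumptions for illustration, not from the question.

```r
# One element per line; the third line is a hypothetical addition
txt <- c(
  "She sold seashells by the seashore, and she had a great time while doing so.",
  "She bought seashells by the seashore, and she had a great time while doing so.",
  "Nothing relevant on this line."
)

words <- c("sold", "great")

# grepl() gives one logical vector per word (one entry per line);
# Reduce() combines them with &, so a line is kept only if every word occurs
keep <- Reduce(`&`, lapply(words, function(w) grepl(w, txt, fixed = TRUE)))
keep
# [1]  TRUE FALSE FALSE

txt[keep]
```

`fixed = TRUE` treats each word as a literal string; drop it if the entries of `words` are meant to be regular expressions.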

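stringr::str_detect has an advantage over grepl here: it accepts a character vector of patterns to search for.

0

Although in most cases I would go with the stringr package, as suggested in CPak's answer, here is my grep solution as well: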

# create the sample string 
c <- ("She sold seashells by the seashore, and she had a great time while doing so.") 

# match any sold and great string within the text 
# ignore case so that Sold and Great are also matched 
grep("(sold.*great|great.*sold)", c, value = TRUE, ignore.case = TRUE) 

Well, not bad, right? But what if a word merely contains the string sold or great?

# set up alternative string 
d <- ("She saw soldier eating seashells by the seashore, and she had a great time while doing so.") 
# even soldier is matched here: 
grep("(sold.*great|great.*sold)", d, value = TRUE, ignore.case = TRUE) 

So you may want to use word boundaries, i.e. match whole words only:

# \\b is a metacharacter that matches a word boundary (the start or end of a word)
grep("(\\bsold\\b.*\\bgreat\\b|\\bgreat\\b.*\\bsold\\b)", d, value = TRUE, ignore.case = TRUE) 

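\\b matches before the first character of the string or after the last one, if that character is a word character, and between two characters where one is a word character and the other is not.

More about the \b metacharacter here: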
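A quick way to see the effect of the boundaries is to test a few short strings. This is only an illustrative sketch; the vector `x` is made up.

```r
# Hypothetical test strings: a whole word, two words that merely contain "sold",
# and "sold" next to a space or punctuation
x <- c("sold", "soldier", "unsold", "She sold it.", "sold!")

# Without boundaries, every string that contains "sold" matches
plain <- grepl("sold", x)
plain
# [1] TRUE TRUE TRUE TRUE TRUE

# With \\b on both sides, only the whole word "sold" matches
whole <- grepl("\\bsold\\b", x)
whole
# [1]  TRUE FALSE FALSE  TRUE  TRUE
```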