Regex returns only one match

I have a set of keywords. Any keyword may contain a space, e.g. ['one', 'one two']. From these keywords I generate a regex like /\b(?i:one|one\ two|three)\b/. Full example below:

keywords = ['one', 'one two', 'three'] 
re = /\b(?i:#{ Regexp.union(keywords).source })\b/ 
text = 'Some word one and one two other word' 
text.downcase.scan(re)

The result of this code is

=> ["one", "one"]

How can I also get a match for the second keyword, one two, so that the result looks like this?

=> ["one", "one two"]

2017-01-30 Edward

Change the order of the alternatives from longest to shortest. – revo

Regexes are eager to match. Once they find a match, they won't try to find another, possibly longer, match (with one important exception).
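A minimal illustration of that eagerness, using made-up words rather than the question's keywords: with alternation, the engine tries the alternatives left to right and stops at the first one that matches at a given position.

```ruby
# Alternation is tried left to right; the first alternative that matches wins,
# even when a later alternative would match more text.
short_first = "category"[/cat|category/]
long_first  = "category"[/category|cat/]
p short_first  # prints "cat"
p long_first   # prints "category"
```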

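/\b(?i:one|one\ two|three)\b/ will never match one two, because it always matches one first. You need /\b(?i:one two|one|three)\b/ so that it tries one two first. The easiest way to automate this is probably to sort the keywords longest first.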

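keywords = ['one', 'one two', 'three']
# Sort longest first so "one two" is tried before "one"
re = Regexp.union(keywords.sort { |a,b| b.length <=> a.length }).source
# Group the alternation so both \b anchors apply to every keyword
re = /\b(?:#{re})\b/i
text = 'Some word one and one two other word'
puts text.scan(re)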

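Note that I made the whole regex case-insensitive with the i flag, which is easier to read than (?i:...), and that downcasing the string is then unnecessary.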
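A quick check of that note, assuming a mixed-case variant of the question's text (the input string is made up for illustration): with the i flag on the whole regex, the matches keep their original casing and no downcase call is needed. The alternation is wrapped in a non-capturing group (?:...) so that both \b anchors apply to every keyword; interpolating a source string inserts its text as-is, so without the group the first \b would bind only to the first keyword and the last \b only to the last one.

```ruby
keywords = ['one', 'one two', 'three']
# Longest keywords first, so "one two" is tried before "one"
pattern = Regexp.union(keywords.sort_by { |k| -k.length }).source
# (?:...) groups the alternation; the i flag makes the whole regex case-insensitive
re = /\b(?:#{pattern})\b/i

# Hypothetical mixed-case input; note there is no downcase call
text = 'Some word ONE and One Two other word'
matches = text.scan(re)
p matches  # prints ["ONE", "One Two"]
```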

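The one exception is repetition, like +, * and friends. They are greedy by default: .+ will match as many characters as it can. You can make a repetition lazy by adding ?, so that it matches as little as it can: .+? will match a single character unless the rest of the pattern needs more.

"A foot of fools".match(/(.*foo)/)   # matches "A foot of foo"
"A foot of fools".match(/(.*?foo)/)  # matches "A foo"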

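The same contrast with .+ and .+?, on a made-up string: the lazy version takes a single character on its own, and grows only when the rest of the pattern requires it.

```ruby
text = "abc"
greedy = text[/.+/]    # .+ takes as many characters as it can
lazy   = text[/.+?/]   # .+? takes the minimum: a single character
grown  = text[/.+?c/]  # .+? grows only as far as the following "c" requires
p greedy  # prints "abc"
p lazy    # prints "a"
p grown   # prints "abc"
```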

2017-01-30 19:16:39 Schwern

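I tried your example with the first element moved to the second position in the array, and it works (e.g. http://rubular.com/r/4F2Hc46wHT).

In fact, it looks like the first keyword "overlaps" the second one.

If you can't change the order of the keywords, this answer may not help.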

2017-01-30 19:10:39

The point is that \bone\b matches the one in one two: that branch appears before the one two branch, so it "wins" (see Remember That The Regex Engine Is Eager).

You need to sort the keywords array in descending order before building the regex. Then it looks like

(?-mix:\b(?i:three|one\ two|one)\b)

This way the longer one two comes before the shorter one and gets matched.

See the Ruby demo:

keywords = ['one', 'one two', 'three']
keywords = keywords.sort.reverse  # sort returns a new array, so no dup is needed
re = /\b(?i:#{ Regexp.union(keywords).source })\b/
text = 'Some word one and one two other word'
puts text.downcase.scan(re)
# Output:
# one
# one two

2017-01-30 19:16:06

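Note that this works because "AB" > "A" no matter what "B" is. [*If the strings are of different lengths, and the strings are equal when compared up to the shortest length, then the longer string is considered greater than the shorter one.*](https://ruby-doc.org/core-2.4.0/String.html#method-i-3C-3D-3E) – Schwern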
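A small check of that comparison rule, using the question's keywords. A descending sort is not a sort by length ("three" ends up before "one two"), but it doesn't need to be: two alternatives can only both match at the same position if the shorter one is a prefix of the longer one, and the reverse sort always places a string before its prefixes.

```ruby
# String#<=> returns 1 when the receiver is greater
p("AB" <=> "A")         # prints 1
p("one two" <=> "one")  # prints 1

# A descending sort therefore puts each keyword before any shorter keyword
# that is a prefix of it, even though it is not a sort by length
sorted = ['one', 'one two', 'three'].sort.reverse
p sorted  # prints ["three", "one two", "one"]
```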
