How can I scan for combinations of words in Ruby using a regular expression?

I'm trying to scan a string for any combination of words from a list. Specifically, I want to find any 'number' combinations such as "two hundred and eighty" or "fifty eight".

To do this, I made a list of all the single-word numbers up to a million:

numberWords = ["one", "two", "three", ...... "hundred", "thousand", "million"]

I then joined the list together using "|" and built a regex like this:

string.scan(/\b(#{wordList}(\s|\.|,|\?|\!))+/)

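I expected this to return a list of all the number combinations, but it only returns the words individually. For example, if the string contains "three million", it returns "three" and "million", but not "three million". How can I correct this?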

Asked 2014-03-12 by Rory A Campbell

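Why don't you have 'and' in the list? – sawa

@sawa Sorry, I don't understand. Where should I put "and"?

Either in the list or in the regex. Without putting it somewhere, how do you expect to catch 'two hundred and eighty'? – sawa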

numberWords = ["one", "two", "three", "hundred", "thousand", "million"] 
numberWords = Regexp.union(numberWords) 
# => /one|two|three|hundred|thousand|million/ 

"foo bar three million dollars" 
.scan(/\b#{numberWords}(?:(?:\s+and\s+|\s+)#{numberWords})*\b/) 
# => ["three million"]
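As a sketch of why the original attempt splits the matches: assuming wordList was built with a plain join("|") (the question does not show that step), the "|" binds more loosely than the separator group, and the capture groups change what scan returns. The snake_case names below are illustrative, not from the question.

```ruby
number_words = %w[one two three hundred thousand million]
word_list = number_words.join("|") # assumed to be how wordList was built

# "|" binds more loosely than concatenation, so the separator group attaches
# only to the last alternative ("million"); every other word must stand alone.
attempt = /\b(#{word_list}(\s|\.|,|\?|\!))+/

# When the pattern contains capture groups, scan returns the groups of each
# match instead of the whole matched text.
p "three million dollars".scan(attempt)
# => [["three", nil], ["million ", " "]]

# Wrapping the alternation in a non-capturing group, and using a character
# class for the separators, lets scan return whole matches.
grouped = /\b(?:(?:#{word_list})\b[\s.,?!]*)+/
p "three million dollars".scan(grouped).map(&:strip)
# => ["three million"]
```

Regexp.union in the answer above avoids the precedence problem because an interpolated Regexp object is embedded as its own group.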

Answered 2014-03-12 14:11:24 by sawa

Thanks, it works well. You were right about "and" too.

Just for fun, here's a more interesting way to generate the pattern when a long list of words has to be matched:

#!/usr/bin/env perl 

use Regexp::Assemble; 

my $ra = Regexp::Assemble->new; 
foreach (@ARGV) { 
    $ra->add($_); 
} 
print $ra->re, "\n";

Save that as "regexp_assemble.pl", install Perl's Regexp::Assemble module, and then run:

perl ./regexp_assemble.pl one two three four five six seven eight nine ten eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen twenty thirty forty fifty sixty seventy eighty ninety hundred thousand million ' ' '\.' ',' '?' '!'

You should see this generated:

(?^:(?:[ !,.?]|t(?:h(?:irt(?:een|y)|ousand|ree)|w(?:e(?:lve|nty)|o)|en)|f(?:o(?:ur(?:teen)?|rty)|i(?:ft(?:een|y)|ve))|s(?:even(?:t(?:een|y))?|ix(?:t(?:een|y))?)|e(?:ight(?:een|y)?|leven)|nine(?:t(?:een|y))?|hundred|million|one))

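That's the Perl version of the pattern, and it needs a little tweaking to do what you want: remove the leading ?^: and its surrounding parentheses, add a trailing + so that runs of words and separators match as one unit, and add the i flag to make it case-insensitive: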

pattern = /(?:[ !,.?]|t(?:h(?:irt(?:een|y)|ousand|ree)|w(?:e(?:lve|nty)|o)|en)|f(?:o(?:ur(?:teen)?|rty)|i(?:ft(?:een|y)|ve))|s(?:even(?:t(?:een|y))?|ix(?:t(?:een|y))?)|e(?:ight(?:een|y)?|leven)|nine(?:t(?:een|y))?|hundred|million|one)+/i

Here are some scan results:

'one dollar'.scan(pattern) # => ["one "] 
'one million dollars'.scan(pattern) # => ["one million "] 
'one million three hundred dollars'.scan(pattern) # => ["one million three hundred "] 
'one million, three hundred!'.scan(pattern) # => ["one million, three hundred!"] 
'one million, three hundred and one dollars'.scan(pattern) # => ["one million, three hundred ", " one "]

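Unfortunately, Ruby doesn't have an equivalent of Perl's Regexp::Assemble module. Such a module is very useful for this sort of task, because the regex engine in Ruby is very fast.

The only downside is that the pattern captures leading and trailing whitespace, but that's easily fixed by calling map(&:strip) on the resulting strings: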

'one million, three hundred and one dollars'.scan(pattern).map(&:strip) # => ["one million, three hundred", "one"]
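For a Ruby-only route without Regexp::Assemble, a plain alternation from Regexp.union can be combined with word boundaries. This is a rough sketch rather than a tuned pattern: the variable names are illustrative, and the optional "and" connector follows the suggestion in the comments on the question.

```ruby
number_words = %w[
  one two three four five six seven eight nine ten
  eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen
  twenty thirty forty fifty sixty seventy eighty ninety
  hundred thousand million
]

# Interpolating a Regexp object embeds it as (?-mix:...), which would switch
# the outer /i flag off for the words, so only the source string is used here.
word = Regexp.union(number_words).source

# A number word, then any run of further number words joined by whitespace,
# commas, or " and ". The \b anchors keep separators out of the match.
pattern = /\b(?:#{word})(?:(?:\s+and\s+|[\s,]+)(?:#{word}))*\b/i

p "One million, three hundred and one dollars".scan(pattern)
# => ["One million, three hundred and one"]
```

Because the match has to end on a word boundary, no strip step is needed, and "and" stays inside a single match instead of splitting it in two.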

Answered 2014-03-12 17:44:19

I have ported Perl's Regexp::Trie to Ruby:

https://github.com/gfx/ruby-regexp_trie

It is a simpler version of Regexp::Assemble, but it is good enough for me.
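To illustrate the trie idea behind this kind of pattern assembly, here is a minimal self-written sketch. It is not the code or the API of ruby-regexp_trie or Regexp::Assemble; the helper names serialize and trie_regexp_source are hypothetical.

```ruby
# Hypothetical helpers, written for illustration only.

# Turn a trie node into regex source. Each child is keyed by a single
# character, so no two branches of an alternation start the same way.
def serialize(node)
  children = node.reject { |key, _| key == :end }
  return "" if children.empty?

  branches = children.map { |char, child| Regexp.escape(char) + serialize(child) }
  body = branches.size == 1 ? branches.first : "(?:#{branches.join('|')})"

  # A word ends here while longer words continue: make the rest optional.
  return body unless node[:end]
  branches.size == 1 ? "(?:#{body})?" : "#{body}?"
end

# Insert every word into a nested-hash trie, then serialise the trie.
def trie_regexp_source(words)
  trie = {}
  words.each do |word|
    node = trie
    word.each_char { |char| node = (node[char] ||= {}) }
    node[:end] = true # marks that a complete word ends at this node
  end
  serialize(trie)
end

source = trie_regexp_source(%w[four fourteen forty five fifteen fifty])
puts source
# => f(?:o(?:ur(?:teen)?|rty)|i(?:ve|ft(?:een|y)))

pattern = /\b(?:#{source})\b/i
p "Forty or fourteen, not fourth".scan(pattern)
# => ["Forty", "fourteen"]
```

Shared prefixes are factored out once, which is the same shape as the Regexp::Assemble output shown in the earlier answer.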

Answered 2016-01-22 13:54:30
