2014-03-12 13 views
2

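I am trying to scan a string for any combination of words from a list. Specifically, I want to find any "number" combination, such as "two hundred and eighty" or "fifty eight". How can I scan for combinations of words in Ruby using a regular expression?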

To do this, I made a list of all the single-word numbers up to a million:

numberWords = ["one", "two", "three", ...... "hundred", "thousand", "million"] 

I then joined the list together with "|" and came up with this regular expression:

string.scan(/\b(#{wordList}(\s|\.|,|\?|\!))+/) 

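I expected this to return a list of all the number combinations, but it only returns the individual words. For example, if the string contains "three million", it returns "three" and "million", but not "three million". How can I correct this?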

+0

Why don't you have 'and' in the list? – sawa

+0

@sawa Sorry, I don't understand. Where should I put "and"? –

+0

Either in the list or in the regex. Without it, how do you expect to match 'two hundred and eighty'? – sawa

Answers

7
numberWords = ["one", "two", "three", "hundred", "thousand", "million"] 
numberWords = Regexp.union(numberWords) 
# => /one|two|three|hundred|thousand|million/ 

"foo bar three million dollars" 
.scan(/\b#{numberWords}(?:(?:\s+and\s+|\s+)#{numberWords})*\b/) 
# => ["three million"] 
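For comparison, here is a minimal sketch of why the pattern in the question returns separate pieces. It assumes `wordList` was built with `numberWords.join("|")` (the question does not show that line) and uses a shortened, illustrative word list. Two things seem to be at play: the bare `|` splits the whole outer group into alternatives, and `scan` returns the captures instead of the full match when the pattern contains capture groups.

```ruby
# Shortened word list, for illustration only (assumed to be joined with "|").
word_list = %w[three million].join("|")

# Pattern from the question: the interpolated "|" makes the outer group mean
# "three" OR "million<separator>", and both groups capture.
original = /\b(#{word_list}(\s|\.|,|\?|\!))+/
original_result = "three million dollars".scan(original)
# => [["three", nil], ["million ", " "]]

# Grouping the alternation on its own and using non-capturing groups lets the
# repetition span several words, and scan returns the whole match.
grouped = /\b(?:(?:#{word_list})[\s.,?!])+/
grouped_result = "three million dollars".scan(grouped)
# => ["three million "]
```

The grouped version still picks up the trailing separator, which is one reason the pattern above is built as "word, then any number of separator-plus-word pairs".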
+1

Thanks, it works great. You were right about "and" too. –

2

Just for fun, here's a more interesting way to generate a pattern that has to match a long list:

#!/usr/bin/env perl 

use Regexp::Assemble; 

my $ra = Regexp::Assemble->new; 
foreach (@ARGV) { 
    $ra->add($_); 
} 
print $ra->re, "\n"; 

Save that as "regexp_assemble.pl", install Perl's Regexp::Assemble module, then run:

perl ./regexp_assemble.pl one two three four five six seven eight nine ten eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen twenty thirty forty fifty sixty seventy eighty ninety hundred thousand million ' ' '\.' ',' '?' '!' 

You should see this generated:

(?^:(?:[ !,.?]|t(?:h(?:irt(?:een|y)|ousand|ree)|w(?:e(?:lve|nty)|o)|en)|f(?:o(?:ur(?:teen)?|rty)|i(?:ft(?:een|y)|ve))|s(?:even(?:t(?:een|y))?|ix(?:t(?:een|y))?)|e(?:ight(?:een|y)?|leven)|nine(?:t(?:een|y))?|hundred|million|one)) 

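That's the Perl version of the pattern, and it needs a few small tweaks to do what you want: remove the leading ?^: and its surrounding parentheses, add a trailing +, and add the i flag to make it case-insensitive: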

pattern = /(?:[ !,.?]|t(?:h(?:irt(?:een|y)|ousand|ree)|w(?:e(?:lve|nty)|o)|en)|f(?:o(?:ur(?:teen)?|rty)|i(?:ft(?:een|y)|ve))|s(?:even(?:t(?:een|y))?|ix(?:t(?:een|y))?)|e(?:ight(?:een|y)?|leven)|nine(?:t(?:een|y))?|hundred|million|one)+/i 

Here are some scan results:

'one dollar'.scan(pattern) # => ["one "] 
'one million dollars'.scan(pattern) # => ["one million "] 
'one million three hundred dollars'.scan(pattern) # => ["one million three hundred "] 
'one million, three hundred!'.scan(pattern) # => ["one million, three hundred!"] 
'one million, three hundred and one dollars'.scan(pattern) # => ["one million, three hundred ", " one "] 

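Unfortunately, Ruby doesn't have an equivalent of Perl's Regexp::Assemble module. It would be very useful for this sort of task, because the regular expression engine in Ruby is very fast.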

The only downside is that it captures leading and trailing whitespace, but that's easily fixed by calling map(&:strip) on the results:

'one million, three hundred and one dollars'.scan(pattern).map(&:strip) # => ["one million, three hundred", "one"] 
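A pure-Ruby variant of the same idea can be sketched with `Regexp.union`, which builds a flat (not prefix-optimized) alternation from the word list. The names `NUMBER_WORDS`, `NUMBER_RUN`, and `number_runs` are hypothetical, and the separator set (spaces and commas) is an assumption chosen for this example. Anchoring the pattern on words rather than on separators means no `strip` step is needed.

```ruby
# Word list copied from the command line above.
NUMBER_WORDS = %w[
  one two three four five six seven eight nine ten eleven twelve thirteen
  fourteen fifteen sixteen seventeen eighteen nineteen twenty thirty forty
  fifty sixty seventy eighty ninety hundred thousand million
].freeze

# Regexp.union escapes each string and joins them with "|".
# Taking .source avoids interpolating "(?-mix:...)", which would switch the
# case-insensitive flag back off inside the group.
words = Regexp.union(NUMBER_WORDS).source

# One word, then any number of separator-plus-word pairs, so a match never
# begins or ends on a separator. Separators here are spaces and commas only.
NUMBER_RUN = /\b(?:#{words})(?:[ ,]+(?:#{words}))*\b/i

# Hypothetical helper: returns every run of number words found in text.
def number_runs(text)
  text.scan(NUMBER_RUN)
end

number_runs('One million, three hundred and one dollars')
# => ["One million, three hundred", "one"]
```

The trailing `\b` matters with a flat alternation: it forces the engine to backtrack from "seven" to "seventy" instead of stopping partway through a word.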