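My approach is a bit different (and, IMHO, better :-). I need to not miss any phone numbers, even when a line contains two of them. I also don't want lines with three groups of digits that are far apart from each other to count as phone numbers (see the cookie example), and I don't want to mistake IP addresses for phone numbers.
Here's the code. It allows multiple numbers per line, and it also requires the groups of digits to be "close" to each other: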
```ruby
def extract_phone_number(input)
  # Details of the regular expression and what each part does:
  #   1. (\d{3})  -- get 3 digits (and keep them)
  #   2. \D{0,3}  -- allow skipping up to 3 non-digits; this handles hyphens,
  #                  parentheses, periods, spaces, etc.
  #   3. (\d{3})  -- get 3 more digits (and keep them)
  #   4. \D{0,3}  -- again allow skipping up to 3 non-digits
  #   5. (\d{4})  -- keep the final 4 digits
  result = input.scan(/(\d{3})\D{0,3}(\d{3})\D{0,3}(\d{4})/).map { |e| e.join('-') }
  # <result> is an Array of whatever phone numbers were extracted; the map
  # cleans up each number in the Array into the format 800-432-1234.

  result = result.join(' :: ')
  # <result> is now a String, with the numbers separated by ' :: '.
  # (There is another way to do it, described in the text below the code,
  # that only gets the first phone number.)

  result.empty? ? nil : result
end
```
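To see what `String#scan` hands back before the `map`, here is a small sketch that runs the same regex on a few inputs from the tests, plus two hypothetical IPv4 strings. The local name `phone_re` exists only for this sketch; the outputs in the comments are what Ruby's `p` prints.

```ruby
# Same pattern as in extract_phone_number; `phone_re` is just a local name here.
phone_re = /(\d{3})\D{0,3}(\d{3})\D{0,3}(\d{4})/

line = "(513) 287-7000,Toll Free (800) 733-2077"

# With capture groups, String#scan returns one Array of captures per match.
groups = line.scan(phone_re)
p groups      # => [["513", "287", "7000"], ["800", "733", "2077"]]

# Joining each capture Array with '-' gives the normalized numbers.
numbers = groups.map { |g| g.join('-') }
p numbers     # => ["513-287-7000", "800-733-2077"]

# The 0..3 gap limit rejects digit groups that are far apart.
cookies = "100 cookies + 950 cookies = 1050 cookies".scan(phone_re)
p cookies     # => []

# A bare dotted IPv4 address (hypothetical example) cannot match: each octet
# has at most 3 digits, but the last group needs 4 digits in a row.
bare_ip = "192.168.100.200".scan(phone_re)
p bare_ip     # => []

# ...but a 4-digit port after the address does supply that last group.
with_port = "192.168.100.200:5432".scan(phone_re)
p with_port   # => [["100", "200", "5432"]]
```

The last case suggests the IP-address guarantee holds for bare addresses only; a URL carrying a 4-digit port can still yield a false positive.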
Here are the tests (with a few extra ones):
```ruby
test_data = {
  "DB=Sequel('postgres://user:[email protected]/test_test')" => nil, # DON'T MISTAKE IP ADDRESSES AS PHONE NUMBERS
  "100 cookies + 950 cookies = 1050 cookies" => nil, # THIS IS NEW
  "this 123 is a 456 bad number 7890" => nil, # THIS IS NEW
  "212-363-3200,Media Relations: 212-668-2251." => "212-363-3200 :: 212-668-2251", # THIS IS CHANGED
  "this is +1 480-874-4666" => "480-874-4666",
  "something 404-581-4000" => "404-581-4000",
  "other (805) 682-4726" => "805-682-4726",
  "978-851-7321, Ext 2606" => "978-851-7321",
  "413- 658-1100" => "413-658-1100",
  "(513) 287-7000,Toll Free (800) 733-2077" => "513-287-7000 :: 800-733-2077", # THIS IS CHANGED
  "1 (813) 274-8130" => "813-274-8130",
  "323/221-2164" => "323-221-2164",
  "" => nil,
  "foobar" => nil,
  "1234567" => nil,
}

def test_it(test_data)
  test_data.each do |input, expected_output|
    extracted = extract_phone_number(input)
    puts "#{extracted == expected_output ? 'good' : 'BAD!'} :: #{input} => #{extracted.inspect}"
  end
end

test_it(test_data)
```
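Alternative implementation: `scan` automatically applies the regular expression repeatedly, which is what you want if a line can contain more than one phone number. If you only want the first phone number on a line, you can instead use: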
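```ruby
m = /(\d{3})\D{0,3}(\d{3})\D{0,3}(\d{4})/.match(input)
# match returns nil when nothing matches, so the && yields nil in that
# case instead of raising NoMethodError on nil.
first_phone_number = m && m.captures.join('-')
```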
(This is just a different way of doing it, using the regex's `match` method.)
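A side-by-side sketch on one of the test lines makes the difference between the two approaches concrete: `String#scan` collects every match, while `Regexp#match` stops at the first one and returns a `MatchData` (or `nil`). The local names `phone_re`, `all_numbers`, `first` and `no_match` are just for this sketch.

```ruby
# Same pattern as in extract_phone_number.
phone_re = /(\d{3})\D{0,3}(\d{3})\D{0,3}(\d{4})/

line = "212-363-3200,Media Relations: 212-668-2251."

# scan applies the regex repeatedly and collects every match.
all_numbers = line.scan(phone_re).map { |g| g.join('-') }
p all_numbers   # => ["212-363-3200", "212-668-2251"]

# Regexp#match stops at the first match and returns a MatchData.
m = phone_re.match(line)
first = m && m.captures.join('-')
p first         # => "212-363-3200"

# With no match, match returns nil, and so does the guarded expression.
miss = phone_re.match("foobar")
no_match = miss && miss.captures.join('-')
p no_match      # => nil
```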
+ 49- 2345-123456789 – Svante 2009-07-17 10:28:40
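The string in this comment looks like an international number (+49 is Germany's country code), which presumably is the point: it doesn't follow the 3-3-4 layout the regex assumes. A quick sketch of what the same pattern extracts from it (the local names are only for illustration):

```ruby
# Same pattern as in extract_phone_number.
phone_re = /(\d{3})\D{0,3}(\d{3})\D{0,3}(\d{4})/

comment_input = "+ 49- 2345-123456789"

# The regex still finds a "phone number", but it is stitched together from
# the middle of the digits rather than being the real number.
found = comment_input.scan(phone_re).map { |g| g.join('-') }
p found   # => ["345-123-4567"]
```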