Better way to extract phone numbers and reformat them?

Phone number data in various formats (I chose these because the incoming data is unreliable and not in the expected formats):

+1 480-874-4666 
404-581-4000 
(805) 682-4726 
978-851-7321, Ext 2606 
413- 658-1100 
(513) 287-7000,Toll Free (800) 733-2077 
1 (813) 274-8130 
212-363-3200,Media Relations: 212-668-2251. 
323/221-2164

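My Ruby code extracts all the digits, removes any leading 1 (the US country code), and then uses the first 10 digits to build the "new" phone number in the desired format: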

nums = phone_number_string.scan(/[0-9]+/)
if nums.size > 0
  all_nums = nums.join
  all_nums = all_nums[0..0] == "1" ? all_nums[1..-1] : all_nums
  if all_nums.size >= 10
    ten_nums = all_nums[0..9]
    final_phone = "#{ten_nums[0..2]}-#{ten_nums[3..5]}-#{ten_nums[6..9]}"
  else
    final_phone = ""
  end
  puts "#{final_phone}"
else
  puts "No number to fix."
end
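The same three steps (keep only digits, drop one leading 1, format the first 10 digits) can be wrapped in a method that returns the formatted string or nil instead of printing, which makes it easier to test. This is a minimal sketch; the method name format_us_number is hypothetical, and like the code above it assumes a North American number.

```ruby
# Hypothetical helper: same logic as the snippet above, but it returns a value
# (a formatted String, or nil) instead of printing.
def format_us_number(phone_number_string)
  digits = phone_number_string.gsub(/\D/, "")        # keep only 0-9
  digits = digits[1..-1] if digits.start_with?("1")  # drop a leading US country code
  return nil if digits.size < 10                     # too few digits for a NANP number

  ten = digits[0, 10]                                # ignore extensions / extra numbers
  "#{ten[0, 3]}-#{ten[3, 3]}-#{ten[6, 4]}"
end

puts format_us_number("1 (813) 274-8130")   # prints 813-274-8130
puts format_us_number("foobar").inspect     # prints nil
```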

The results are great:

480-874-4666 
404-581-4000 
805-682-4726 
978-851-7321 
413-658-1100 
513-287-7000 
813-274-8130 
212-363-3200 
323-221-2164

But I think there is a better way. Can you refactor this to be more efficient, clearer, or more useful?


2009-07-17 Kevin Elliott

+ 49- 2345-123456789 – Svante 2009-07-17 10:28:40
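To make this objection concrete: stripping non-digits and reading the first 10 digits as a North American number silently turns this German number into a well-formed but wrong NANP number. A small check using only String#gsub and String#match from core Ruby:

```ruby
german = "+ 49- 2345-123456789"

digits = german.gsub(/\D/, "")                   # "492345123456789" (15 digits)
m = digits.match(/^1?(\d{3})(\d{3})(\d{4})/)     # no leading 1, so 1? matches nothing

# The match succeeds, so the caller gets a plausible-looking but incorrect number.
puts m.captures.join("-")                        # prints 492-345-1234
```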

Here is a simpler approach using only regular expressions and substitution:

def extract_phone_number(input)
  if input.gsub(/\D/, "").match(/^1?(\d{3})(\d{3})(\d{4})/)
    [$1, $2, $3].join("-")
  end
end

This strips all non-digits (\D), skips an optional leading 1 (^1?), then extracts the first 10 remaining digits in blocks ((\d{3})(\d{3})(\d{4})) and formats them.
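The two stages can be inspected separately. This sketch runs them on one input from the question. It uses \A instead of ^, which is my own substitution: in Ruby, ^ matches at the start of any line while \A matches only at the start of the string; for a digits-only string without newlines the two behave the same.

```ruby
input = "1 (813) 274-8130"

digits = input.gsub(/\D/, "")                    # stage 1: "18132748130" (non-digits removed)
m = digits.match(/\A1?(\d{3})(\d{3})(\d{4})/)    # stage 2: optional leading 1, then 3+3+4 digits

p m.captures                                     # prints ["813", "274", "8130"]
puts m.captures.join("-")                        # prints 813-274-8130
```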

Here are the tests:

test_data = {
  "+1 480-874-4666" => "480-874-4666",
  "404-581-4000" => "404-581-4000",
  "(805) 682-4726" => "805-682-4726",
  "978-851-7321, Ext 2606" => "978-851-7321",
  "413- 658-1100" => "413-658-1100",
  "(513) 287-7000,Toll Free (800) 733-2077" => "513-287-7000",
  "1 (813) 274-8130" => "813-274-8130",
  "212-363-3200,Media Relations: 212-668-2251." => "212-363-3200",
  "323/221-2164" => "323-221-2164",
  "" => nil,
  "foobar" => nil,
  "1234567" => nil,
}

test_data.each do |input, expected_output|
  extracted = extract_phone_number(input)
  print "FAIL (expected #{expected_output}): " unless extracted == expected_output
  puts extracted
end


2009-07-17 14:21:59

This approach is probably faster. – 2009-07-17 14:24:33

For the North American Numbering Plan, you can extract the number with phone_number_string.gsub(/\D/, '').match(/^1?(\d{10})/)[1]
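Note that .match(...)[1] raises NoMethodError when there is no match (for example on "foobar"), because match returns nil. A nil-safe variant can use String#[] with a regex and a capture index, which returns nil instead of raising. A minimal sketch; the method name nanp_digits is hypothetical:

```ruby
# Hypothetical helper: returns the 10 NANP digits as a String, or nil if absent.
def nanp_digits(phone_number_string)
  # String#[](regexp, capture) returns the given capture group, or nil on no match.
  phone_number_string.gsub(/\D/, "")[/\A1?(\d{10})/, 1]
end

p nanp_digits("+1 480-874-4666")   # prints "4808744666"
p nanp_digits("foobar")            # prints nil
```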

For example, applied to each of the test numbers:

test_phone_numbers = [
  "+1 480-874-4666",
  "404-581-4000",
  "(805) 682-4726",
  "978-851-7321, Ext 2606",
  "413- 658-1100",
  "(513) 287-7000,Toll Free (800) 733-2077",
  "1 (813) 274-8130",
  "212-363-3200,Media Relations: 212-668-2251.",
  "323/221-2164",
  "foobar"
]

test_phone_numbers.each do |phone_number_string|
  match = phone_number_string.gsub(/\D/, '').match(/^1?(\d{10})/)
  puts(
    if match
      "#{match[1][0..2]}-#{match[1][3..5]}-#{match[1][6..9]}"
    else
      "No number to fix."
    end
  )
end

As with the original code, this does not capture multiple numbers, e.g. "(513) 287-7000,Toll Free (800) 733-2077".

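FWIW, I've found that in the long run it is easier to capture and store the complete number, i.e. including the country code and without separators; to make a guess at capture time about which numbering plan a number with a missing prefix belongs to; and to choose the display format at render time, e.g. NANP vs. DE.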
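A minimal sketch of that idea, under stated assumptions: numbers are stored as digit-only strings that include the country code (an E.164-like form without the +), a 10-digit number with no prefix is guessed to be NANP, and formatting happens only at display time. The names normalize_number and render_number are hypothetical, and only NANP rendering is implemented; other plans such as DE would need their own rules.

```ruby
# Hypothetical: store digits with the country code; return nil when we can't guess.
def normalize_number(raw)
  digits = raw.gsub(/\D/, "")
  # Written with a leading "+": assume the country code is already present.
  return digits if raw.strip.start_with?("+") && !digits.empty?
  # 10 digits and no prefix: guess NANP and add country code 1.
  return "1" + digits if digits.size == 10
  # 11 digits starting with 1: NANP written with its country code.
  return digits if digits.size == 11 && digits.start_with?("1")
  nil # anything else (extensions, unknown plans) is not handled in this sketch
end

# Choose the display format at render time, based on the stored country code.
def render_number(stored)
  if stored.start_with?("1") && stored.size == 11
    "+1 #{stored[1, 3]}-#{stored[4, 3]}-#{stored[7, 4]}"
  else
    "+#{stored}" # other plans (e.g. DE): left unformatted here
  end
end

puts render_number(normalize_number("404-581-4000"))          # prints +1 404-581-4000
puts render_number(normalize_number("+ 49- 2345-123456789"))  # prints +492345123456789
```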


2009-07-17 11:54:30 ylg

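My approach is a bit different (and, IMHO, better :-): I must not miss any phone numbers, even when there are 2 on one line. I also don't want to match lines where the 3 groups of digits are far apart (see the cookies example), and I don't want to mistake IP addresses for phone numbers.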

Code that allows multiple numbers per line, but also requires the groups of digits to be "close" to each other:

def extract_phone_number(input)
  result = input.scan(/(\d{3})\D{0,3}(\d{3})\D{0,3}(\d{4})/).map { |e| e.join('-') }
  # <result> is an Array of whatever phone numbers were extracted, and the remapping
  # takes care of cleaning up each number in the Array into a format of 800-432-1234
  result = result.join(' :: ')
  # <result> is now a String, with the numbers separated by ' :: '
  # ... or there is another way to do it (see text below the code) that gets
  # only the first phone number.

  # Details of the Regular Expressions and what they're doing
  # 1. (\d{3}) -- get 3 digits (and keep them)
  # 2. \D{0,3} -- allow skipping of up to 3 non-digits. This handles hyphens, parentheses, periods, etc.
  # 3. (\d{3}) -- get 3 more digits (and keep them)
  # 4. \D{0,3} -- skip 0-3 non-digits
  # 5. (\d{4}) -- keep the final 4 digits

  result.empty? ? nil : result
end
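Because the regex contains capture groups, String#scan returns one array of captures per match rather than the matched text, which is why each element is mapped through join('-'). The \D{0,3} limit is also what rejects digit groups that are far apart. A quick illustration on two of the test strings:

```ruby
re = /(\d{3})\D{0,3}(\d{3})\D{0,3}(\d{4})/
line = "(513) 287-7000,Toll Free (800) 733-2077"

groups = line.scan(re)
p groups                        # prints [["513", "287", "7000"], ["800", "733", "2077"]]

# More than 3 non-digits between groups, so nothing matches.
p "this 123 is a 456 bad number 7890".scan(re)   # prints []
```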

Here are the tests (with some extra tests added):

test_data = {
  "DB=Sequel('postgres://user:[email protected]/test_test')" => nil, # DON'T MISTAKE IP ADDRESSES AS PHONE NUMBERS
  "100 cookies + 950 cookies = 1050 cookies" => nil, # THIS IS NEW
  "this 123 is a 456 bad number 7890" => nil, # THIS IS NEW
  "212-363-3200,Media Relations: 212-668-2251." => "212-363-3200 :: 212-668-2251", # THIS IS CHANGED
  "this is +1 480-874-4666" => "480-874-4666",
  "something 404-581-4000" => "404-581-4000",
  "other (805) 682-4726" => "805-682-4726",
  "978-851-7321, Ext 2606" => "978-851-7321",
  "413- 658-1100" => "413-658-1100",
  "(513) 287-7000,Toll Free (800) 733-2077" => "513-287-7000 :: 800-733-2077", # THIS IS CHANGED
  "1 (813) 274-8130" => "813-274-8130",
  "323/221-2164" => "323-221-2164",
  "" => nil,
  "foobar" => nil,
  "1234567" => nil,
}

def test_it(test_data)
  test_data.each do |input, expected_output|
    extracted = extract_phone_number(input)
    puts "#{extracted == expected_output ? 'good' : 'BAD!'} :: #{input} => #{extracted.inspect}"
  end
end

test_it(test_data)

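Alternative implementation: scan automatically applies the regular expression repeatedly, which is good if you want more than one phone number per line. If you only want the first phone number on a line, you can also use: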

m = /(\d{3})\D{0,3}(\d{3})\D{0,3}(\d{4})/.match(input)
# match returns nil when there is no phone number, so check it instead of rescuing
first_phone_number = m ? m.captures.join('-') : nil

(Just a different way of doing it, using the regex match method.)
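To see the difference between the two on the same input: Regexp#match stops at the first hit and returns a MatchData (or nil), while String#scan keeps going and returns every hit. A small comparison using only core Ruby:

```ruby
re = /(\d{3})\D{0,3}(\d{3})\D{0,3}(\d{4})/
line = "212-363-3200,Media Relations: 212-668-2251."

first = re.match(line)                          # first hit only
p first.captures.join("-")                      # prints "212-363-3200"

p line.scan(re).map { |g| g.join("-") }         # prints ["212-363-3200", "212-668-2251"]

# With no hit, match returns nil, so guard before calling captures.
p re.match("foobar")                            # prints nil
```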


2010-01-02 06:11:12
