2011-03-05 64 views
2

Ruby/Rails: parsing emails

I'm currently using the following to parse emails:

def parse_emails(emails)
    valid_emails, invalid_emails = [], []
    unless emails.nil?
        emails.split(/, ?/).each do |full_email|
            unless full_email.blank?
                if full_email.index(/\<.+\>/)
                    email = full_email.match(/\<.*\>/)[0].gsub(/[\<\>]/, "").strip
                else
                    email = full_email.strip
                end
                email = email.delete("<").delete(">")
                email_address = EmailVeracity::Address.new(email)
                if email_address.valid?
                    valid_emails << email
                else
                    invalid_emails << email
                end
            end
        end
    end
    return valid_emails, invalid_emails
end

The problem I'm running into is with an email given like this:

Bob Smith <[email protected]> 

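The code above drops "Bob Smith" and returns only bob@smith.

What I want instead is a hash of FNAME, LNAME, EMAIL. fname and lname are optional, but email is not.

What type of Ruby object should I use, and how would I create such a record in the code above?

Thanks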

Answers

3

I coded it so that it works even if you have an entry like: John Bob Smith Doe <[email protected]>

It will retrieve:

{:email => "[email protected]", :fname => "John", :lname => "Bob Smith Doe" }

# Needs ActiveSupport (for blank?) and the email_veracity gem.
def parse_emails(emails)
    valid_emails, invalid_emails = [], []
    unless emails.nil?
        emails.split(/, ?/).each do |full_email|
            unless full_email.blank?
                # fname and lname are optional, so they default to nil
                fname = lname = nil
                if (index = full_email.index(/\<.+\>/))
                    email = full_email.match(/\<.*\>/)[0].gsub(/[\<\>]/, "").strip
                    # everything before the "<" is the display name;
                    # the exclusive range also handles an entry that starts with "<"
                    name = full_email[0...index].split(" ")
                    fname = name.first
                    # all remaining words become the last name
                    lname = name.drop(1).join(" ")
                else
                    email = full_email.strip
                    # your choice, what the string could be... only mail, only name?
                end
                email = email.delete("<>")
                email_address = EmailVeracity::Address.new(email)

                if email_address.valid?
                    valid_emails << { :email => email, :lname => lname, :fname => fname }
                else
                    invalid_emails << { :email => email, :lname => lname, :fname => fname }
                end
            end
        end
    end
    return valid_emails, invalid_emails
end
+0

Nice! Thanks – AnApprentice 2011-03-06 01:13:03

+0

Is there any way to extend this to eliminate duplicate emails in valid_emails? – AnApprentice 2011-03-06 01:31:32
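
On the duplicate question in the comment above, one possible approach is to filter the array of hashes after parsing. This is only a minimal sketch: the helper name `dedupe_by_email` and the sample records are made up for illustration, and treating addresses as case-insensitive is an assumption you may not want.

```ruby
# Hypothetical helper: keeps the first record seen for each address.
# Assumes each record is a Hash with an :email key, as in the answer above.
def dedupe_by_email(records)
  # Array#uniq with a block compares records by the block's return value.
  records.uniq { |record| record[:email].downcase }
end

# Made-up sample data in the same shape as valid_emails.
valid_emails = [
  { :email => "bob@example.com", :fname => "Bob",    :lname => "Smith" },
  { :email => "BOB@example.com", :fname => "Robert", :lname => "Smith" },
  { :email => "amy@example.com", :fname => nil,      :lname => nil }
]

unique_emails = dedupe_by_email(valid_emails)
```

Calling this once just before the `return` keeps the parsing loop unchanged. If the first occurrence is not the one you want to keep, a Hash keyed by address (last one wins) would be the alternative.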

0

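Here's a slightly different approach that works better for me. It grabs the name whether it comes before or after the email address, and whether or not the email address is in angle brackets.

I no longer try to split the first name from the last name, since that is too problematic (e.g. "Mary Ann Smith" or "Dr. Mary Smith"), but I do eliminate duplicate email addresses.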

def parse_list(list) 
    ## case-insensitive pattern for a single email address
    r = /[a-z0-9._%+\-]+@[a-z0-9.\-]+\.[a-z]{2,4}/i
    valid_items, invalid_items = {}, [] 

    ## split the list on commas and/or newlines 
    list_items = list.split(/[,\n]+/) 

    list_items.each do |item|
        if m = r.match(item)
            ## get the email address
            email = m[0]
            ## get everything before the email address
            before_str = item[0, m.begin(0)]
            ## get everything after the email address
            after_str = item[m.end(0), item.length]
            ## enter the email as a valid_items hash key (eliminating dups)
            ## make the value of that key anything before the email if it contains
            ## any alphanumerics, stripping out any angle brackets
            ## and leading/trailing space
            if /\w/ =~ before_str
                valid_items[email] = before_str.gsub(/[\<\>\"]+/, '').strip
            ## if nothing before the email, make the value of that key anything after
            ## the email, stripping out any angle brackets and leading/trailing space
            elsif /\w/ =~ after_str
                valid_items[email] = after_str.gsub(/[\<\>\"]+/, '').strip
            ## if nothing after the email either,
            ## make the value of that key an empty string
            else
                valid_items[email] = ''
            end
        else
            invalid_items << item.strip if item.strip.length > 0
        end
    end

    [valid_items, invalid_items] 
end 

It returns a hash with the valid email addresses as keys and the associated names as values. Any invalid items are returned in the invalid_items array.
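
To illustrate that return shape, here is a rough sketch of turning the hash into an array of records. The `to_records` helper and the sample values are hypothetical, not part of the answer's code; the hash literal just stands in for what the method would return.

```ruby
# Made-up data in the shape described above:
# a Hash of email => name, and an Array of items with no address.
valid_items   = { "mary@example.com" => "Mary Ann Smith", "joe@example.com" => "" }
invalid_items = ["not an address"]

# Hypothetical helper: one Hash per address, with nil for a missing name.
def to_records(valid_items)
  valid_items.map do |email, name|
    { :email => email, :name => (name.empty? ? nil : name) }
  end
end

records = to_records(valid_items)
records.each { |r| puts "#{r[:email]} (#{r[:name] || 'no name'})" }
```

Because the address is the hash key, a repeated address simply overwrites the earlier entry, so the last name seen for it is the one kept. Keys are compared as plain strings, so addresses that differ only in case count as different entries.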

For an interesting discussion of email regular expressions, see http://www.regular-expressions.info/email.html

I made a little gem out of this in case it's useful to someone: https://github.com/victorgrey/email_addresses_parser

0

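You can use the rfc822 gem. It contains a regular expression for finding RFC-compliant email addresses. You can easily extend it with additional parts to find the first and last names.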