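Character encoding with Ruby 1.9.3 and the mail gem

I'm trying to parse an email string with the Ruby mail gem, and I'm having a devil of a time with character encodings. Take the following email: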
MIME-Version: 1.0
Sender: [email protected]
Received: by 10.142.239.17 with HTTP; Thu, 14 Jun 2012 06:00:18 -0700 (PDT)
Date: Thu, 14 Jun 2012 09:00:18 -0400
Delivered-To: [email protected]
X-Google-Sender-Auth: MxfFrMybNjBoBt4O4GwAn9cMsko
Message-ID: <[email protected]om>
Subject: Re: [Lorem Ipsum] Foo updated the forum topic 'Reply by email test'
From: Foo Bar <[email protected]>
To: Foo <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

This email has accents:=A0R=E9sum=E9
>
> --------- Reply Above This Line ------------
>
> Email parsing with accents: R=E9sum=E9
>
> Click here to view this post in your browser
The email body, when decoded properly, should be:
This reply has accents: Résumé
>
> --------- Reply Above This Line ------------
>
> Email parsing with accents: Résumé
>
> Click here to view this post in your browser
However, I'm having a devil of a time actually getting the accent marks to come through. Here's what I've tried:
message = Mail.new(email_string)
body = message.body.decoded
This gives me a string that starts like this:
This reply has accents:\xA0R\xE9sum\xE9\r\n>\r\n> --------- Reply Above This Line ------------
Finally, I tried this:
body.encoding # => <Encoding:ASCII-8BIT>
body.encode("UTF-8") # => Encoding::UndefinedConversionError: "\xA0" from ASCII-8BIT to UTF-8
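Does anyone have any suggestions on how to handle this? I'm pretty sure it has something to do with the "charset=ISO-8859-1" setting in the email, but I'm not sure how to use it, or whether there's an easy way to extract it with the mail gem.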
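For what it's worth, the error above is expected: ASCII-8BIT means "raw bytes with no character set", so Ruby has no mapping from \xA0 or \xE9 to UTF-8 until the bytes are relabeled with the charset the message declares. Here is a minimal sketch of that idea using only core Ruby on a trimmed copy of the message. The regex-based header lookup is a simplification for illustration; with the mail gem, message.body.decoded already performs the quoted-printable step (as the output above shows), and the declared charset is assumed to be available as message.charset.

```ruby
# A trimmed copy of the message from the question: headers, blank line, body.
raw = "Content-Type: text/plain; charset=ISO-8859-1\r\n" \
      "Content-Transfer-Encoding: quoted-printable\r\n" \
      "\r\n" \
      "This email has accents:=A0R=E9sum=E9\r\n"

headers, qp_body = raw.split("\r\n\r\n", 2)

# Naive charset lookup, for illustration only (a real parser also handles
# quoting rules and folded headers).
charset = headers[/charset="?([^";\s]+)/i, 1]

# "M" is String#unpack's quoted-printable directive; the result is raw bytes.
bytes = qp_body.unpack("M").first

# Relabel the bytes with their real charset (no bytes change), then transcode.
text = bytes.force_encoding(charset).encode("UTF-8")

puts text.inspect
```

Note that force_encoding only changes the label on the string, while encode actually rewrites the bytes, which is why the two calls are needed in that order.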
Awesome. I'd been looking for this. I ended up doing this: body = message.text_part.encode('UTF-8', message.text_part.charset, :invalid => :replace, :undef => :replace)
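The arguments in that comment match the two-encoding form of Ruby's String#encode (destination first, then source, then options). Whether a mail part itself responds to encode is not confirmed here, so the sketch below applies the same arguments to a plain String holding the raw bytes; the variable names are illustrative.

```ruby
# Raw bytes as in the question, tagged ASCII-8BIT (binary).
bytes = "R\xE9sum\xE9".dup.force_encoding("ASCII-8BIT")

# encode(destination, source, options): read the bytes as ISO-8859-1 and
# emit UTF-8, without mutating the original string.
utf8 = bytes.encode("UTF-8", "ISO-8859-1", :invalid => :replace, :undef => :replace)

# With a wrong source label, :invalid => :replace substitutes U+FFFD for the
# offending bytes instead of raising an exception.
lossy = bytes.encode("UTF-8", "US-ASCII", :invalid => :replace, :undef => :replace)

puts utf8
puts lossy.inspect
```

The replace options trade an exception for silent data loss, so they are probably best reserved for cases where a mislabeled message should still be displayed rather than rejected.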
Awesome... thanks a ton... – Jyothu
Some parts don't seem to have a charset. I'm not sure how to handle those.
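One possible way to cope with parts that declare no charset (or an unrecognized one) is to fall back to a default. The helper below is hypothetical, not part of the mail gem, and the ISO-8859-1 fallback is an assumption: it never fails because every byte is defined in that charset, but it will produce wrong characters if the part was really in some other encoding.

```ruby
# Hypothetical helper: convert raw decoded bytes to UTF-8.
# `charset` may be nil or unknown; `fallback` is an assumed default.
def to_utf8(bytes, charset, fallback = "ISO-8859-1")
  source = begin
    Encoding.find(charset.to_s)
  rescue ArgumentError
    Encoding.find(fallback) # missing or unrecognized charset label
  end

  # Work on a copy so the caller's string keeps its original label.
  bytes.dup.force_encoding(source)
       .encode("UTF-8", :invalid => :replace, :undef => :replace)
end

raw = "R\xE9sum\xE9".dup.force_encoding("ASCII-8BIT")
puts to_utf8(raw, nil)          # no charset declared, fallback is used
puts to_utf8(raw, "ISO-8859-1") # charset declared
```

With the mail gem this might be called as to_utf8(part.body.decoded, part.charset), assuming part.charset returns nil when no charset is declared.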