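Python: msg.get_payload() discards needed data, solution wanted
Hi, I have gone through various posts here, but none of them answered my question. I have two questions. 1. I wrote a script that fetches email using poplib. Everything works fine until I try to parse the body of the email, at which point it drops some of the tags along with the data inside them. I have given up and am asking for help here, hoping you can point me in the right direction: what am I doing wrong, or what should I do to make it work?
Here is the source of the parser script: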
import sys
import socket
import poplib
import email
import csv
import re

try:
    host = "mail.someserver.com"
    mail = poplib.POP3(host)
    print mail.getwelcome()
    print mail.user("[email protected]")
    print mail.pass_("qaiaJWkvZT")
    print mail.stat()
    print mail.list()
    print ""
    emailWriter = csv.writer(open('emailMessages.csv', 'wb'), delimiter=',', quotechar='\'', quoting=csv.QUOTE_MINIMAL)
    emailWriter.writerow(['Messages'])
    if mail.stat()[1] > 0:
        print "You have new mail."
    else:
        print "No new mail."
    print ""
    numMessages = len(mail.list()[1])
    for i in range(numMessages):
        for j in mail.retr(i+1)[1]:
            #print j
            msg = email.message_from_string(j) # new statement
            print msg.get_payload(decode=True)
            #emailWriter.writerow([msg.get_payload(decode=True)]) # new statement
    mail.quit()
    input("Press any key to continue.")
except socket.error as e:
    print "Something went wrong! :(\nREASON:\n{0}:{1}".format(e.errno, e.strerror)
    raise
except:
    print "Something went wrong!", sys.exc_info()[0]
    raise
Here is what the script above produces:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.or
g/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<style type="text/css">
BODY {
}
TD {
}
TH {
}
H1 {
}
TABLE,IMG,A {
}
</style>
</head>
<body>
<p><strong>PO Number:</strong> 35164</p>
<p><strong>Ship To:</strong><br />
Joe Pasloski<br />
16 Redwood Drive<br />Yorkton, SK S3N2X7<br />
204-473-2218</p>
<table cellspacing="0" cellpadding="5" border="1" width="710" align="left">
<tr>
</tr>
<tr>
</tr>
</table>
</body>
</html>
But if I change the script to print the j variable directly inside the loop, it gives me this:
Return-Path: <[email protected]>
Delivered-To: [email protected]
Received: (qmail 7636 invoked by uid 48); 14 Jul 2012 23:29:11 -0000
Date: 14 Jul 2012 23:29:11 -0000
Message-ID: <[email protected]>
To: [email protected]
Subject: Drop Ship Order - Joe Pasloski
From: Someserver.com <[email protected]>
X-Mailer: PHP/5.2.17
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="2631183869_50020"
Reply-to: SomeServer.com <[email protected]>
X-Antivirus: avast! (VPS 120714-2, 07/15/2012), Inbound message
X-Antivirus-Status: Clean
--2631183869_50020
Content-Type: text/plain;
charset="iso-8859-1"
Content-Transfer-Encoding: 8bit
--2631183869_50020
Content-Type: text/html;
charset="iso-8859-1"
Content-Transfer-Encoding: 8bit
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.or
g/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<style type="text/css">
BODY {
MARGIN-TOP: 10px;
MARGIN-BOTTOM: 10px;
MARGIN-LEFT: 10px;
MARGIN-RIGHT: 10px;
FONT-SIZE: 12px;
FONT-FAMILY: arial,helvetica,sans-serif
PADDING: 0px;
}
TD {
FONT-SIZE: 12px;
FONT-FAMILY: arial,helvetica,sans-serif
COLOR: #000000;
}
TH {
FONT-SIZE: 13px;
FONT-FAMILY: arial,helvetica,sans-serif
}
H1 {
FONT-SIZE: 20px
}
TABLE,IMG,A {
BORDER: 0px;
}
</style>
</head>
<body>
<p><strong>PO Number:</strong> 35164</p>
<p><strong>Ship To:</strong><br />
Joe Pasloski<br />
16 Redwood Drive<br />Yorkton, SK S3N2X7<br />
204-473-2218</p>
<p><strong>Items:</strong>
<table cellspacing="0" cellpadding="5" border="1" width="710" align="left">
<tr>
<th align="left" width="100">SKU</th>
<th align="left" width="550">Product</th>
<th align="left" width="60">Qty</th>
</tr>
<tr>
<td>JJ-Hamper-Firetruck</td>
<td>Frankie's Fire Truck Laundry Hamper</td>
<td>1</td>
</tr>
</table>
</body>
</html>
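So, if I have to work with the raw message, how can I efficiently strip the unwanted HTML tags without losing any of the data, the way it is lost when the message body is parsed? Or, if this can be done with the get_payload() method, what can I do to make it work?
Please help!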
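One likely cause, judging only from the script as posted, is that email.message_from_string() is called on every single line returned by retr() instead of on the whole message. Each line is then parsed as its own tiny message, so a line that happens to look like a header (a token followed by a colon, such as `FONT-SIZE: 12px;` or `<p><strong>Items:</strong>`) or like a header continuation (leading whitespace) ends up with an empty payload. The sketch below joins the lines first and parses once. It is written for Python 3, where poplib returns lines as bytes, unlike the Python 2 script above; `raw_lines` is a made-up stand-in for `mail.retr(i + 1)[1]`, and `html_body` is a hypothetical helper name.

```python
import email

# Made-up stand-in for the list of lines that mail.retr(i + 1)[1] returns.
# In Python 3, poplib hands these back as bytes objects.
raw_lines = [
    b"Subject: Drop Ship Order",
    b"MIME-Version: 1.0",
    b'Content-Type: multipart/alternative; boundary="BOUNDARY"',
    b"",
    b"--BOUNDARY",
    b'Content-Type: text/plain; charset="iso-8859-1"',
    b"",
    b"PO Number: 35164",
    b"--BOUNDARY",
    b'Content-Type: text/html; charset="iso-8859-1"',
    b"",
    b"<p><strong>PO Number:</strong> 35164</p>",
    b"<td>JJ-Hamper-Firetruck</td>",
    b"--BOUNDARY--",
]


def html_body(lines):
    """Join the retrieved lines, parse them once, return the text/html part."""
    msg = email.message_from_bytes(b"\r\n".join(lines))
    # walk() visits the multipart container and then each sub-part in order.
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            charset = part.get_content_charset() or "iso-8859-1"
            return part.get_payload(decode=True).decode(charset, errors="replace")
    return None  # no HTML part found


body = html_body(raw_lines)
print(body)
```

In Python 2 the same idea would be `email.message_from_string("\n".join(lines))`, since retr() returns plain strings there.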
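2. Is there also a way to use a regular expression to get all the SKU information contained in the table? If you could provide one for me, that would be great. Thanks a ton.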
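For the second question, here is a rough sketch of two ways to pull the SKU column out of a table shaped like the one in the raw message. It assumes the SKU is always the first `<td>` in each data row; `sample_html` is a trimmed copy of the table above, and `RowCollector` is a hypothetical class name. The first approach uses the standard library's html.parser, which tends to cope better with attributes and odd whitespace. The regular expression afterwards only works while the markup stays this regular (a bare `<tr>` immediately followed by a bare `<td>`).

```python
import re
from html.parser import HTMLParser

# Trimmed copy of the table from the raw message shown earlier.
sample_html = """
<table cellspacing="0" cellpadding="5" border="1" width="710" align="left">
<tr>
<th align="left" width="100">SKU</th>
<th align="left" width="550">Product</th>
<th align="left" width="60">Qty</th>
</tr>
<tr>
<td>JJ-Hamper-Firetruck</td>
<td>Frankie's Fire Truck Laundry Hamper</td>
<td>1</td>
</tr>
</table>
"""


class RowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip header rows, which contain only <th> cells
                self.rows.append(self._row)
            self._row = None


collector = RowCollector()
collector.feed(sample_html)
skus = [row[0] for row in collector.rows]

# Regex alternative: the first <td> that directly follows a bare <tr>.
regex_skus = re.findall(r"<tr>\s*<td>(.*?)</td>", sample_html, re.S)

print(skus)
print(regex_skus)
```

Each entry in `collector.rows` also carries the product name and quantity, so the same pass can feed the CSV writer if needed.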