"TypeError: expected string or buffer" when processing a text file with Python

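I am processing a large (120 MB) text file from my Thunderbird IMAP directory and trying to extract information from the headers using mbox and regular expressions. The process runs for a while until I eventually get an exception: "TypeError: expected string or buffer".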

The exception points to the fifth line of this code:

PAT_EMAIL = re.compile(r"[0-9A-Za-z._-]+\@[0-9A-Za-z._-]+")
temp_list = []
mymbox = mbox("data.txt")
for email in mymbox.values():
    from_address = PAT_EMAIL.findall(email["from"])
    to_address = PAT_EMAIL.findall(email["to"])
    for item in from_address:
        temp_list.append(item)  # items are added to a temporary list where they are sorted then written to file

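I have run this code on other (smaller) files without problems, so I guess the problem is in my file. The file appears to be just a bunch of text. Can anyone point me in the right direction for debugging this?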

Source

2013-02-26 spatialaustin

Check `type(email["from"])` on the iteration that fails. – 2013-02-26 03:51:08

Can you post the code for your `findall` method? – eazar001 2013-02-26 04:12:46

Added the findall method to the post. – spatialaustin 2013-02-26 05:46:03
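The type check suggested in the first comment can be tried on a single message. This is only a minimal sketch, assuming the failure comes from a message with no From header; the thread never confirms the cause, and the sample message text is made up for illustration. A message object returns None for a header that is absent, and passing None to findall raises TypeError (newer Python versions word the error message differently).

```python
import re
from email import message_from_string

PAT_EMAIL = re.compile(r"[0-9A-Za-z._-]+@[0-9A-Za-z._-]+")

# Hypothetical message that has a To header but no From header.
msg = message_from_string("To: someone@example.com\n\nbody text\n")

from_header = msg["from"]  # a missing header gives None rather than raising
print(type(from_header))

try:
    PAT_EMAIL.findall(from_header)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

If the type printed for a failing message in the real file is something other than NoneType, the cause is different and this sketch does not apply.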

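Well, I didn't solve the problem, but I have worked around it for my own purposes. I inserted a try statement so that the iteration continues past any TypeError. I only hit about 8 failing email addresses, which is good enough for me. Thanks for your input!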

import re
from mailbox import mbox

PAT_EMAIL = re.compile(r"[0-9A-Za-z._-]+\@[0-9A-Za-z._-]+")
temp_list = []
mymbox = mbox("data.txt")
for email in mymbox.values():
    try:
        from_address = PAT_EMAIL.findall(email["from"])
    except TypeError:
        # email["from"] was not a string (e.g. None when the header is missing)
        print("TypeError!")
        from_address = []  # do not reuse the previous message's result
    try:
        to_address = PAT_EMAIL.findall(email["to"])
    except TypeError:
        print("TypeError!")
        to_address = []
    for item in from_address:
        temp_list.append(item)  # items are added to a temporary list where they are sorted then written to file

Source

2013-02-27 00:00:21 spatialaustin

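Skipping the failures this way is fine if it meets your needs. You could of course (at the risk of stating the very obvious) use the except branch to print out `email` and find out what is causing the failures, so that you can avoid these errors altogether. – 2013-03-08 11:00:04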
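The suggestion in this comment could be sketched as below. The helper name find_bad_headers and the two sample messages are hypothetical, made up for illustration; the idea is simply to record what the header actually contained whenever findall raises TypeError, instead of only printing "TypeError!".

```python
import re
from email import message_from_string

PAT_EMAIL = re.compile(r"[0-9A-Za-z._-]+@[0-9A-Za-z._-]+")

def find_bad_headers(messages, header="from"):
    """Return (index, type name, repr) for each message whose header breaks findall."""
    bad = []
    for index, message in enumerate(messages):
        value = message[header]
        try:
            PAT_EMAIL.findall(value)
        except TypeError:
            bad.append((index, type(value).__name__, repr(value)))
    return bad

# Made-up sample data: the second message has no From header.
samples = [
    message_from_string("From: a@example.com\n\nhello\n"),
    message_from_string("To: b@example.com\n\nno From header here\n"),
]
for index, type_name, value in find_bad_headers(samples):
    print(index, type_name, value)
```

On the real file, the same function could be called with mymbox.values() in place of samples to list the offending messages.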

There can only be one from address (I think!):

In the following line:

from_address = PAT_EMAIL.findall(email["from"])

I have a feeling you are replicating what email.message_from_file already does, and that you would do better to work with email.utils.parseaddr:

>>> s = "Jon Clements <[email protected]>" 
>>> from email.utils import parseaddr 
>>> parseaddr(s) 
('Jon Clements', '[email protected]')

So you can use parseaddr(email['from'])[1] to get the email address and work with that.
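A minimal sketch of that approach, with a guard for a missing header added on the assumption that this is what triggers the TypeError in the question. The function name sender_address and the sample addresses are hypothetical.

```python
from email import message_from_string
from email.utils import parseaddr

def sender_address(message):
    """Return the bare From address, or "" if the header is missing or unparsable."""
    raw = message["from"]
    if raw is None:
        return ""
    # str() is a precaution in case the header is not a plain string.
    return parseaddr(str(raw))[1]

# Made-up sample messages for illustration.
with_from = message_from_string("From: Jon Clements <jon@example.com>\n\nhi\n")
without_from = message_from_string("To: someone@example.com\n\nhi\n")

print(sender_address(with_from))
print(repr(sender_address(without_from)))
```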

Likewise, you may want to look at email.utils.getaddresses for handling the to and cc addresses...
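For the to and cc headers, one way this might look is sketched below. The function name recipient_addresses and the sample message are hypothetical; get_all is used because a header can appear more than once, and it takes a default to return when the header is absent.

```python
from email import message_from_string
from email.utils import getaddresses

def recipient_addresses(message):
    """Return the bare addresses found in all To and Cc headers of a message."""
    tos = message.get_all("to", [])  # [] when the header is absent
    ccs = message.get_all("cc", [])
    pairs = getaddresses([str(value) for value in tos + ccs])
    return [address for _name, address in pairs if address]

# Made-up sample message for illustration.
sample = message_from_string(
    "To: Ann <ann@example.com>, bob@example.com\n"
    "Cc: Carol <carol@example.com>\n"
    "\n"
    "hi\n"
)
print(recipient_addresses(sample))
```

Because missing headers fall back to an empty list, a message with neither header yields an empty result rather than an exception.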

Source

2013-02-26 04:18:19
