2017-06-23

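Separating individual mails from a mail trail using Python

I am working on an email-related PoC in Python. The system receives emails that contain a mail trail. I want to separate the individual emails from the mail trail and process them. The problem is that I have not found the right code or library to do this. Can anyone please help?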

For example, the system receives an email like the one below:

SUBJECT: RE: CALL ID # 98670786 CALL ID # 98983051 DATE SENT: 23-JANUARY-2017 TIME SENT: 17:56:09 PM SENDER ID: [email protected] MESSAGE TEXT: DEAR SIR, 

Please check and let me know 
REGARDS 

XXXXXXX 
00000015 

FROM: company;[email protected] 
SENT:MON, 23 JAN 2017 16:04:26 +0530 
TO: [email protected] 
SUBJECT: RE: RE: CALL ID # 98670786 CALL ID # 98983051 
DEAR MR. XXXXX, 
> 
>WE REFER TO YOUR EMAIL DATED 20/01/2017 FOR THE company. 
> 
>WE HEREBY INFORM YOU THAT WE HAVE CHECKED WITH OUR TOUCH POINT AND THEY HAVE CONFIRMED THAT THE Things HAS BEEN delivered TO YOU AND WE WOULD KINDLY REQUEST YOU TO CHECK YOUR at your end FOR BETTER ASSISTANCE. 
> 
> 
> 
>YOURS SINCERELY, 
> 
>sender, 
>Company 
>---------------------------------------------------------- 
>Disclaimers: 
>adsadsadsadadasdada 
>daadsadadsadsadsa. 
>REGISTERED ADDRESS:-sadsadsadsadsadsadsadasdsadsadsadsa 
>---------------ORIGINAL MESSAGE------------------ 
>SUBJECT: CALL ID # 98418758 CALL ID # 98510240 CALL ID # 98670786 DATE SENT: 20-JANUARY-2017 TIME SENT: 11:06:38 AM SENDER ID: [email protected] MESSAGE TEXT: DEAR SIR, 
> 
>BY WHEN WILL THIS things WILL BE delivered TO Me. 
> 
>REGARDS 
> 
>XXXXXXX 
> 
>00000015 
> 
>FROM: "company"[email protected] 
>SENT:FRI, 20 JAN 2017 10:44:16 +0530 
>TO: [email protected] 
>SUBJECT: RE: RE: CALL ID # 98510240 CALL ID # 98670786 
>DEAR MR. XXXXX, WE APPRECIATE YOUR TIME AND PATIENCE AND APOLOGIZE FOR THE LATE RESPONSE. 
>> 
>>WE REFER TO YOUR EMAIL DATED 11/01/2017N FOR company NUMBER 00000015. WITH REGARDS TO YOUR CONCERN WE HEREBY INFORM YOU THAT TILL DATE YOUR things is pending with us. 
>>TRUST THIS CLARIFIES YOUR CONCERN. YOURS SINCERELY, 
>> 
>>Sender. 
>>company 
>>---------------------------------------------------------- 
>>CALL CENTER TIMINGS: 10.00 A.M. TO 7.00 P.M MONDAY TO SATURDAY (EXCEPT NATIONAL HOLIDAYS) 

The mail above should be split into four parts, as shown below:

1)

SUBJECT: RE: CALL ID # 98670786 CALL ID # 98983051 DATE SENT: 23-JANUARY-2017 TIME SENT: 17:56:09 PM SENDER ID: [email protected] MESSAGE TEXT: DEAR SIR, 

Please check and let me know 

REGARDS 

XXXXXXX 
00000015 

2)

FROM: company;[email protected] 
SENT:MON, 23 JAN 2017 16:04:26 +0530 
TO: [email protected] 
SUBJECT: RE: RE: CALL ID # 98670786 CALL ID # 98983051 
DEAR MR. XXXXX, 
> 
>WE REFER TO YOUR EMAIL DATED 20/01/2017 FOR THE company. 
> 
>WE HEREBY INFORM YOU THAT WE HAVE CHECKED WITH OUR TOUCH POINT AND THEY HAVE CONFIRMED THAT THE Things HAS BEEN delivered TO YOU AND WE WOULD KINDLY REQUEST YOU TO CHECK YOUR at your end FOR BETTER ASSISTANCE. 
> 
> 
> 
>YOURS SINCERELY, 
> 
>sender, 
>Company 
>---------------------------------------------------------- 
>Disclaimers 
>adsadsadsadadasdada 
>daadsadadsadsadsa. 
>REGISTERED ADDRESS:-sadsadsadsadsadsadsadasdsadsadsadsa 

3)

>---------------ORIGINAL MESSAGE------------------ 
>SUBJECT: CALL ID # 98418758 CALL ID # 98510240 CALL ID # 98670786 DATE SENT: 20-JANUARY-2017 TIME SENT: 11:06:38 AM SENDER ID: [email protected] MESSAGE TEXT: DEAR SIR, 
> 
>BY WHEN WILL THIS things WILL BE delivered TO Me. 
> 
>REGARDS 
> 
>XXXXXXX 
> 
>00000015 
> 

4)

>FROM: "company"[email protected] 
>SENT:FRI, 20 JAN 2017 10:44:16 +0530 
>TO: [email protected] 
>SUBJECT: RE: RE: CALL ID # 98510240 CALL ID # 98670786 
>DEAR MR. XXXXX, WE APPRECIATE YOUR TIME AND PATIENCE AND APOLOGIZE FOR THE LATE RESPONSE. 
>> 
>>WE REFER TO YOUR EMAIL DATED 11/01/2017N FOR company NUMBER 00000015. WITH REGARDS TO YOUR CONCERN WE HEREBY INFORM YOU THAT TILL DATE YOUR things is pending with us. 
>>TRUST THIS CLARIFIES YOUR CONCERN. YOURS SINCERELY, 
>> 
>>Sender. 
>>company 
>>---------------------------------------------------------- 
>>CALL CENTER TIMINGS: 10.00 A.M. TO 7.00 P.M MONDAY TO SATURDAY (EXCEPT NATIONAL HOLIDAYS) 

EDITED: After trying many permutations, I ended up with the code below.

import re

# Patterns that mark the start of a new message inside the trail.
startMsgPatter = re.compile(r"(\W*ORIGINAL\s*MESSAGE|\W*FROM\s*:|\W*ON.*WROTE\s*:)")

def sperateEmails(callDesc):
    blockStart = 0
    emails = []
    for m in startMsgPatter.finditer(callDesc):
        blockEnd = m.start()
        if blockStart >= blockEnd:
            continue  # match sits at the current block start: nothing to emit yet
        emails.append(callDesc[blockStart:blockEnd])
        blockStart = blockEnd
    # Everything after the last separator is the final message.
    emails.append(callDesc[blockStart:])
    return emails

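It works, but I have to keep finding the patterns that indicate where a mail starts and ends, and keep updating the expression. In my view, such mails should follow certain patterns. If anyone has written code that takes most of these patterns into account, please share it.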


Can you give an example? – BoilingFire

Answers

1

You can use the split() function.

Example:

"first mail separator second mail".split(" separator ") 

Output:

['first mail', 'second mail']

You just need to know which separator to use. Note that the separator is removed from the result, but you can add it back afterwards if you need it.
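As one way to avoid re-adding the separator by hand, the sketch below uses re.split() with a capturing group, which returns the separators along with the pieces. The helper name split_keep_separator and the sample string are made up for illustration.

```python
import re

def split_keep_separator(text, separator):
    """Split text on a literal separator and keep it at the start of each following piece."""
    # A capturing group makes re.split() return the separators as well,
    # so the list alternates: text, separator, text, separator, text ...
    pieces = re.split("(" + re.escape(separator) + ")", text)
    result = [pieces[0]] if pieces[0] else []
    for i in range(1, len(pieces), 2):
        result.append(pieces[i] + pieces[i + 1])
    return result

parts = split_keep_separator("first mail FROM: a second mail FROM: b", "FROM:")
print(parts)  # ['first mail ', 'FROM: a second mail ', 'FROM: b']
```

Joining the returned pieces gives back the original string, so nothing from the trail is lost.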

In your case, it seems that all messages are separated by the strings

"---------------ORIGINAL MESSAGE------------------" 

"FROM" 

I suggest you split on the first one first, and then on the second, like this:

messages = []  # Split messages will be stored here
# mail_trail is the content of your mail trail
sep = mail_trail.split("---------------ORIGINAL MESSAGE------------------")
for msg in sep:
    sep2 = msg.split("FROM")
    # Re-prepend "FROM" to every piece after the first, since you need it
    sep2[1:] = ["FROM" + part for part in sep2[1:]]
    messages.extend(sep2)  # Add the messages to the list

This should put you on the right track.
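The two-step split can also be done in one pass with a regular expression that finds where each separator line starts. This is only a sketch under assumptions: the SEPARATOR pattern covers just the two markers seen in the sample trail (optionally prefixed by ">" quote markers), and the names SEPARATOR, split_trail and the shortened sample are hypothetical.

```python
import re

# Assumed separator lines: an ORIGINAL MESSAGE banner or a FROM: header,
# optionally preceded by ">" quote markers, at the start of a line.
SEPARATOR = re.compile(
    r"^[> \t]*(?:-+\s*ORIGINAL MESSAGE\s*-+|FROM\s*:)",
    re.MULTILINE,
)

def split_trail(mail_trail):
    """Cut the trail at the start of every separator line, keeping the separators."""
    starts = [m.start() for m in SEPARATOR.finditer(mail_trail)]
    bounds = sorted(set([0] + starts + [len(mail_trail)]))
    pieces = [mail_trail[a:b] for a, b in zip(bounds, bounds[1:])]
    return [p for p in pieces if p.strip()]  # drop empty or whitespace-only pieces

sample = (
    "SUBJECT: RE: CALL ID\nPlease check and let me know\n"
    "FROM: company\nSENT: MON, 23 JAN 2017\nDEAR MR. XXXXX,\n"
    ">---------------ORIGINAL MESSAGE------------------\n"
    ">SUBJECT: CALL ID\n>BY WHEN WILL THIS BE delivered\n"
    ">FROM: \"company\"\n>SENT: FRI, 20 JAN 2017\n"
)
for number, part in enumerate(split_trail(sample), start=1):
    print(number, repr(part.splitlines()[0]))
```

Anchoring the pattern to line starts keeps a word such as FROM inside a sentence from being treated as a separator. Other mail clients use other markers, so the pattern would still need extending for them.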


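Your code is very helpful, but I was able to do it another way. I have posted my code in the question. Also, the extend() function did not work properly in the code for me: it appended the individual characters of a string argument to the list. append() works fine for me. – skvp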
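The behaviour described in this comment matches how list.extend() treats its argument: it iterates over it, so a string is added one character at a time, while a list of strings is added item by item. A small illustration, with variable names made up for the example:

```python
by_extend = []
by_extend.extend("FROM: a")         # a string is iterated character by character
print(by_extend)                    # ['F', 'R', 'O', 'M', ':', ' ', 'a']

by_append = []
by_append.append("FROM: a")         # append adds the string as one item
print(by_append)                    # ['FROM: a']

by_extend_list = []
by_extend_list.extend(["first mail", "FROM: a"])  # a list is added item by item
print(by_extend_list)               # ['first mail', 'FROM: a']
```

So extend() is the right call when the argument is the list returned by split(), and append() is the right call when the argument is a single string.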