String parsing - Python

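I am working through some tasks that I have already solved, but I want to ask about one scenario. I have a text file containing a large number of emails. Some of the emails' subject lines include a time and date, while others contain only the email address. For example: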

From [email protected] Sat Jan 5 09:14:16 2008 
This is a test email. 
From [email protected] 
random text. 
From [email protected] 
From [email protected] Sat Jan 6 03:14:16 2008 
From [email protected]

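and so on. My task is to extract the email addresses from all lines that start with 'From' and have a date and time in their subject. In the case above, I can ignore any line that does not start with 'From' or does not end with '2008'. My code is below.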

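# fname is assumed to hold the path to the mail file
fh = open(fname)
for line in fh:
    line = line.rstrip()
    # Skip lines that are not 'From' lines or do not end with 2008
    if not line.startswith('From'): continue
    if not line.endswith('2008'): continue
    words = line.split()
    print(words[1])  # the address is the second token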

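My question is: what should I do if the subject lines end with different, arbitrary years? In that case I can no longer use if not line.endswith('2008'): continue. Can anyone tell me what logic to use instead? Thanks.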

2015-10-17 Uziii

You can use a regular expression for the check (instead of the line if not line.endswith('2008'): continue).

import re

# Inside the loop: skip lines that do not end with a four-digit year
year = re.search(r'\d{4}$', line)
if year is None:
    continue
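
To show where that check sits in the question's loop, here is a minimal sketch. The helper name extract_senders and the sample addresses are made up for illustration and are not from the original post; it assumes, as the question's code does, that the address is the second whitespace-separated token.

```python
import re

# Matches any four digits at the very end of the (stripped) line.
YEAR_AT_END = re.compile(r'\d{4}$')

def extract_senders(lines):
    """Hypothetical helper: collect addresses from 'From' lines ending in a year."""
    senders = []
    for line in lines:
        line = line.rstrip()
        if not line.startswith('From'):
            continue
        if YEAR_AT_END.search(line) is None:
            continue
        words = line.split()
        senders.append(words[1])
    return senders

# Made-up sample data; the addresses are placeholders.
sample = [
    'From alice@example.com Sat Jan 5 09:14:16 2008',
    'This is a test email.',
    'From bob@example.com',
    'From carol@example.com Sun Jan 6 03:14:16 2015',
]
print(extract_senders(sample))
```

Because the pattern only asks for four trailing digits, lines ending in 2008 and 2015 are both kept, while the bare 'From' line is skipped.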

2015-10-17 12:03:18 blackmamba

For more complex parsing, you should use Python's regular expression package, re. It is more powerful (though not always as readable).

For your problem specifically, you could use something like this:

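import re

# fname is assumed to hold the path to the mail file
with open(fname) as fh:
    for line in fh:
        # Keep only lines that start with "From " and end with a four-digit year
        result = re.search(r'^From .* \d{4}$', line)
        if result is not None:
            words = line.split()
            print(words[1])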

^From - matches strings that start with "From".
\d{4}$ - matches strings that end with 4 decimal digits.
.* - matches any characters in between.
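
As a variation on the pattern above (not part of the original answer), a capture group can pull the address out directly instead of splitting the line. This is only a sketch: the function name sender_or_none and the sample addresses are hypothetical, and it assumes the address is the token right after "From".

```python
import re

# Same three parts as the answer's pattern, plus (\S+) to capture the
# token that follows "From " (assumed to be the sender address).
FROM_LINE = re.compile(r'^From (\S+) .* \d{4}$')

def sender_or_none(line):
    """Hypothetical helper: return the captured address, or None if no match."""
    match = FROM_LINE.search(line.rstrip())
    return match.group(1) if match else None

print(sender_or_none('From alice@example.com Sat Jan 5 09:14:16 2008'))
print(sender_or_none('From bob@example.com'))
print(sender_or_none('random text 2008'))
```

Only the first line satisfies all three parts of the pattern; the other two calls return None because one lacks the trailing year and the other does not start with "From".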

2015-10-17 12:31:24
