2010-10-24 23 views
1

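Python re module: what regular expression should I use to extract a text fragment?

I have text that lists the course number, name, grade, and other details for the courses a student has taken. Specifically, the lines look like this: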

0301 453 20071 LINEAR SYSTEMS I     A 4 4 16.0 

0301 481 20071 ELECTRONICS I WITH LAB    A 4 4 16.0 

0301 481 20084 ELECTRONICS II WITH LAB  RE  B 4 4 12.0 

0301 713 20091 SOLID STATE PHYSICS   NG   0 0  0.0 

0511 454 20074 INT'L TRADE & FINANCE    B 4 4 12.0 

I want to write a regular expression that extracts:

LINEAR SYSTEMS I 
ELECTRONICS I WITH LAB 
ELECTRONICS II WITH LAB 
SOLID STATE PHYSICS 
INT'L TRADE & FINANCE 

I wrote the following:

pattCourseName = re.compile(r'([-/&A-Z\':\s]{2,})(\s+[A-Z])')

However, this gives me:

LINEAR SYSTEMS I 
ELECTRONICS I WITH LAB 
ELECTRONICS II WITH LAB  RE 
SOLID STATE PHYSICS 
INT'L TRADE & FINANCE 

That is, I cannot get rid of the RE part.

Can anyone help? Thanks!

Answers

5

If the layout you show is fixed, forget regular expressions and just grab the columns you want:

course_name = line[16:45].strip() 
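
To make the slicing concrete, here is a minimal sketch. It assumes a strictly fixed-width layout in which the course name occupies columns 16 through 44. The two sample records are padded by hand to that hypothetical layout (the spacing in the listing above may not match the original file), and extract_name, NAME_START, and NAME_END are illustrative names that do not appear in the original answer.

```python
# Hypothetical fixed-width layout: the name field spans columns 16..44.
NAME_START, NAME_END = 16, 45

def extract_name(line):
    """Return the course name by slicing fixed columns, with no regex involved."""
    return line[NAME_START:NAME_END].strip()

# Sample records padded by hand to the assumed layout:
# a 16-character prefix, a 29-character name field, then the remaining columns.
records = [
    " 0301 453 20071 " + "LINEAR SYSTEMS I".ljust(29) + "    A 4 4 16.0",
    " 0301 481 20084 " + "ELECTRONICS II WITH LAB".ljust(29) + "RE  B 4 4 12.0",
]

for record in records:
    print(extract_name(record))
```

Because the slice never looks at the content of the field, a marker such as RE in the next column cannot leak into the name, but the approach only works while every record keeps exactly the same column positions.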

Beautiful solution! Thanks! – Curious2learn 2010-10-24 13:25:58

2
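for line in open("file"):
    # Skip the three leading numeric fields; keep the rest of the line.
    rest = line.split(None, 3)[3]
    # The course name ends at the first run of two spaces.
    print(rest.split("  ", 1)[0])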

Output

$ python myscript.py 
LINEAR SYSTEMS I 
ELECTRONICS I WITH LAB 
ELECTRONICS II WITH LAB 
SOLID STATE PHYSICS 
INT'L TRADE & FINANCE 
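
The same rule (the name ends at the first run of two or more spaces) can also be written as a regular expression, which is what the question asked for. The sketch below is one possible pattern, not the approach from either answer. It assumes every record starts with three numeric fields and that a course name never contains two consecutive spaces; course_pattern and course_name are illustrative names. The pattern in the question keeps RE because its character class accepts both uppercase letters and whitespace, so the greedy match runs straight through the two-space gap.

```python
import re

# Three leading numeric fields, then the name captured lazily up to the
# first run of two or more whitespace characters.
course_pattern = re.compile(r"^\s*\d+\s+\d+\s+\d+\s+(.+?)\s{2,}")

def course_name(line):
    """Return the course name, or None if the line does not fit the pattern."""
    match = course_pattern.match(line)
    return match.group(1) if match else None

# Sample lines copied from the question.
lines = [
    "0301 453 20071 LINEAR SYSTEMS I     A 4 4 16.0",
    "0301 481 20071 ELECTRONICS I WITH LAB    A 4 4 16.0",
    "0301 481 20084 ELECTRONICS II WITH LAB  RE  B 4 4 12.0",
    "0301 713 20091 SOLID STATE PHYSICS   NG   0 0  0.0",
    "0511 454 20074 INT'L TRADE & FINANCE    B 4 4 12.0",
]

for line in lines:
    print(course_name(line))
```

The lazy quantifier is what stops the capture at the first wide gap. If a course name could itself contain a double space, this pattern would cut it short, and fixed-column slicing would be the safer choice.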

Beautiful! This will be great when the columns are not aligned, and I learned some new commands from your solution. Thanks! – Curious2learn 2010-10-24 13:27:46
