2016-01-24

Python REGEX and file I/O

I'm still learning Python and very new to regex. I want to pull information out of a text file and turn it into a list for later processing.

Here is a sample Python script:

import re 

text = '''name = file details 
version = v1.2 
;---------------- 
; Notes on line one 
; Notes on line two 
; 
; Notes on line four, skipping line 3 
;-------------- 
configuring this device 
configuring that device 
; I don't want this note''' 



def notes(path):
    file = re.split('\n+', path)
    outputName = outputVer = outputNote = ''
    notes = []
    outputNotes = []
    for line in file:
        name = re.search('^name = (.*)$', line)
        ver = re.search('^version = (.*)$', line)
        note = re.search('; (.*)', line)
        if name:
            outputName = name.group(1)
        if ver:
            outputVer = ver.group(1)
        notes.append(note)
    for note in notes:
        print(note)

    info = (outputName, outputVer, outputNotes)
    print(info[2])

    for notes in info[2]:
        if notes:
            print(notes)

    print(info)


notes(text)

What I want to grab is the "name", "version" and "notes".

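I can get the name and version without any problem; the notes are where I'm stuck. For the notes, I want everything between the ---------- marker lines. I don't want the notes that appear later in the file.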

Essentially, I want the output to look like this:

('file details', 'v1.2', ['Notes on line one', 'Notes on line two', '','Notes on line four, skipping line 3']) 

Also, I'm sure there are many ways to optimize this, and I'd love to hear suggestions.


Please post the contents of the file and state explicitly what you want to extract. – SIslam


I included sample content from the file in the "text" variable in the code. –
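Since the "text" variable only stands in for the file contents, here is a minimal sketch of the file I/O side, assuming a UTF-8 text file. The helper name read_config is hypothetical, and the demo writes the sample to a temporary file only so the snippet runs on its own:

```python
import os
import tempfile

SAMPLE = """name = file details
version = v1.2
;----------------
; Notes on line one
; Notes on line two
;
; Notes on line four, skipping line 3
;--------------
configuring this device
configuring that device
; I don't want this note"""


def read_config(path):
    # Hypothetical helper: read the whole file into one string so the
    # parsing code written against the "text" variable can be reused as-is.
    with open(path, encoding="utf-8") as f:
        return f.read()


# Demo only: write the sample to a temporary file, then read it back.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write(SAMPLE)
    path = tmp.name

text = read_config(path)
os.remove(path)
print(text.splitlines()[:2])  # ['name = file details', 'version = v1.2']
```

With a real file you would call read_config on its path and pass the result to the parsing function instead of the hard-coded string.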

Answers


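If I understand your problem statement, you are just reading a varying number of lines at the top of the file. There is no reason to use regular expressions for this at all: just read 2 lines for the name and version, then read the header start line (';---'), then loop, reading lines into a list until you see the header end line (';---').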
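The line-by-line approach described above could look roughly like this. It is a sketch assuming the exact layout of the sample (name line, version line, then notes fenced by two lines starting with ';-'); the function name parse_header is hypothetical:

```python
def parse_header(lines):
    """Parse name, version and the fenced notes block without regex.

    Assumes the first two lines are 'name = ...' and 'version = ...',
    followed by notes enclosed between two lines starting with ';-'.
    """
    it = iter(lines)
    name = next(it).split("=", 1)[1].strip()
    version = next(it).split("=", 1)[1].strip()

    # Skip forward to the opening ';---' marker line.
    for line in it:
        if line.startswith(";-"):
            break

    notes = []
    # Collect note lines until the closing ';---' marker line.
    for line in it:
        if line.startswith(";-"):
            break
        # Drop the leading ';' and surrounding whitespace; a bare ';' gives ''.
        notes.append(line.lstrip(";").strip())
    return (name, version, notes)


text = """name = file details
version = v1.2
;----------------
; Notes on line one
; Notes on line two
;
; Notes on line four, skipping line 3
;--------------
configuring this device
configuring that device
; I don't want this note"""

print(parse_header(text.splitlines()))
# ('file details', 'v1.2', ['Notes on line one', 'Notes on line two', '', 'Notes on line four, skipping line 3'])
```

Because it only iterates, the same function should also accept an open file object directly (for example parse_header(f) inside a with open(...) block); the strip() calls take care of the trailing newlines.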


With the MULTILINE and DOTALL flags:

(?:^;-+$)(.*?)(?:^;-+$) 

See a demo on regex101.com,
or here is a complete walkthrough:

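import re

# Sample input from the question; substitute the real file contents.
text = '''name = file details
version = v1.2
;----------------
; Notes on line one
; Notes on line two
;
; Notes on line four, skipping line 3
;--------------
configuring this device
configuring that device
; I don't want this note'''

def notes():
    outputName = outputVer = None  # avoid NameError if a field is missing
    lines = re.split('\n', text)
    for line in lines:
        if line.startswith('name'):
            name = re.search(r"^name = (.*)", line)
            if name:
                outputName = name.group(1)
        elif line.startswith('version'):
            version = re.search(r"^version = (.*)", line)
            if version:
                outputVer = version.group(1)

    # now the notes part: everything between the two ;---- lines
    notes = re.search(r"(?:^;-+$)(.*?)(?:^;-+$)", text, re.MULTILINE | re.DOTALL)
    # the "if x" filter drops empty pieces, so the bare ';' line is not kept
    outputNotes = [x.strip() for x in re.split(r'\n;?', notes.group(1)) if x]
    info = [outputName, outputVer, outputNotes]
    return info

info = notes()
print(info)
# ['file details', 'v1.2', ['Notes on line one', 'Notes on line two', 'Notes on line four, skipping line 3']]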
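The walkthrough above drops the empty note that comes from the bare ';' line, while the question's desired output keeps it as ''. A sketch of one way to keep it, reusing the same MULTILINE | DOTALL block pattern but splitting the captured block by physical lines:

```python
import re

text = """name = file details
version = v1.2
;----------------
; Notes on line one
; Notes on line two
;
; Notes on line four, skipping line 3
;--------------
configuring this device
configuring that device
; I don't want this note"""

# DOTALL lets .*? cross newlines; MULTILINE lets ^ and $ match at
# every line boundary, so the two ;---- lines delimit the block.
block = re.search(r"^;-+$(.*?)^;-+$", text, re.MULTILINE | re.DOTALL)

# The captured block starts with the newline after the opening marker,
# so the first split piece is '' and is filtered out. Each remaining line
# loses its leading ';' and spaces; the bare ';' line becomes ''.
notes = [line.lstrip(";").strip() for line in block.group(1).splitlines() if line]

name = re.search(r"^name = (.*)$", text, re.MULTILINE).group(1)
version = re.search(r"^version = (.*)$", text, re.MULTILINE).group(1)
print((name, version, notes))
# ('file details', 'v1.2', ['Notes on line one', 'Notes on line two', '', 'Notes on line four, skipping line 3'])
```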

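This takes a combination of steps, as follows. I used named capture groups, and to extract the notes I applied a regex twice: first to select the text between the ;----- lines, then to keep only the lines that contain more than just ;.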

import re

txt = '''name = file details
version = v1.2
;----------------
; Notes on line one
; Notes on line two
;
; Notes on line four, skipping line 3
;--------------
configuring this device
configuring that device
; I don't want this note'''
data = re.search(r'name\s*=\s*(?P<name>.*)\W*version\s*=\s*(?P<version>.*)\W*(?:;-+\W)(?P<notes>[\w\W]*)(?:;-+\W)', txt)
print(data.group('name'))  # prints name
print(data.group('version'))  # prints version
# print(data.group('notes'))
print([i.strip(';') for i in re.findall(r';\s*[^;]{2,}', data.group('notes'))])  # prints notes

Output:

file details 
v1.2 
[' Notes on line one\n', ' Notes on line two\n', ' Notes on line four, skipping line 3\n'] 

See the details of the first regex HERE.
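The notes printed above still carry a leading space and a trailing '\n', and the bare ';' line is skipped. If you want exactly the tuple from the question, one possible variant (a sketch, not part of the original answer) keeps the same named-group regex and replaces the second findall with a per-line cleanup:

```python
import re

txt = """name = file details
version = v1.2
;----------------
; Notes on line one
; Notes on line two
;
; Notes on line four, skipping line 3
;--------------
configuring this device
configuring that device
; I don't want this note"""

# Same named-group pattern as in the answer, split across two raw strings.
pattern = (r'name\s*=\s*(?P<name>.*)\W*version\s*=\s*(?P<version>.*)\W*'
           r'(?:;-+\W)(?P<notes>[\w\W]*)(?:;-+\W)')
data = re.search(pattern, txt)

# Split the captured notes block into lines and strip the leading ';'
# plus whitespace from each; the bare ';' line becomes ''.
notes = [line.lstrip(';').strip() for line in data.group('notes').splitlines()]

result = (data.group('name'), data.group('version'), notes)
print(result)
# ('file details', 'v1.2', ['Notes on line one', 'Notes on line two', '', 'Notes on line four, skipping line 3'])
```

Drop the '' entries with a trailing "if line.strip(';').strip()" filter if you would rather skip blank note lines.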