2015-08-24 29 views
2

Better way to split a file that has separator marker lines in Python

I have a file of the following type:

--- part0 --- 
some 
strings 
--- part1 --- 
some other 
strings 
--- part2 --- 
... 

I want to get any part of the file as a Python list:

x = get_part_of_file(part=0) 
print x # => should print ['some', 'strings'] 
x = get_part_of_file(part=1) 
print x # => should print ['some other', 'strings'] 

So my question is: what is the simplest way to implement the get_part_of_file method used above?

My (ugly) solution is below:

import re

def get_part_of_file(part, separate_str="part"):
    def does_match_to_separate(line, part_num):
        # True if the line is the separator for part_num, e.g. "--- part0 ---"
        return re.search(r"{}{}\b".format(re.escape(separate_str), part_num), line)

    def get_first_line_num_appearing_separate_str(lines, part_num):
        # index of the separator line for part_num, or len(lines) if there is none
        for i, line in enumerate(lines):
            if does_match_to_separate(line, part_num):
                return i
        return len(lines)

    with open("my_file.txt") as f:
        lines = [line.strip() for line in f]

    # the required part starts on the line after its separator
    first_line_num = get_first_line_num_appearing_separate_str(lines, part) + 1
    # and ends just before the separator of the next part
    last_line_num = get_first_line_num_appearing_separate_str(lines, part + 1)
    return lines[first_line_num:last_line_num]

Answers

2

You can use a regular expression to parse the string. See the following example, which you can also try on regex101:

--- part(?P<part_number>\d+) ---\s(?P<part_value>[\w\s]*) 

This parses the given string into the following groups:

  • MATCH 1: part_number [8-9] 0, part_value [14-27] some strings
  • MATCH 2: part_number [35-36] 1, part_value [41-60] some other strings

Now, in Python, you can get all the groups with

import re 
# your_regex_pattern is the pattern shown above; text is the content of the file
parts = re.finditer(your_regex_pattern, text) 

for p in parts: 
    print("Part %s: %s" % (p.group('part_number'), p.group('part_value')))
    # or return the element with the part-number you want. 

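The only problem you may run into is that, at the moment, the regex pattern only covers word characters, spaces and newlines (\w\s). If the part values contain other characters, you have to extend the pattern to match them as well.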
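One possible way to lift that restriction is to capture lazily up to the next separator line instead of listing the allowed characters. The following is only a sketch, assuming every separator looks like `--- partN ---` on a line of its own; `SAMPLE`, `PART_RE` and `get_part` are illustrative names, not part of the original answer.

```python
import re

# Hypothetical sample text in the format from the question.
SAMPLE = (
    "--- part0 ---\n"
    "some\n"
    "strings\n"
    "--- part1 ---\n"
    "some other\n"
    "strings!\n"
    "--- part2 ---\n"
)

# DOTALL lets "." cross newlines; the lazy ".*?" stops at the next
# separator line (lookahead) or at the end of the text.
PART_RE = re.compile(
    r"^--- part(?P<part_number>\d+) ---[ \t]*\n"
    r"(?P<part_value>.*?)"
    r"(?=^--- part\d+ ---|\Z)",
    re.DOTALL | re.MULTILINE,
)

def get_part(text, part):
    """Return the lines of the requested part, or None if it is absent."""
    for m in PART_RE.finditer(text):
        if int(m.group("part_number")) == part:
            return [line.strip() for line in m.group("part_value").splitlines()]
    return None

print(get_part(SAMPLE, 1))
```

With this pattern, punctuation such as the `!` in the sample no longer cuts a part short, and a part number that does not occur yields `None`.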

1

Using re.split, you can write something like

>>> import re
>>> input_file = open('input', 'r') 
>>> content = input_file.read() 
>>> content_parts = re.split(r'.+?part\d+.+?\n', content) 

>>> content_parts 
['', 'some\nstrings\n', 'some other\nstrings\n', ''] 

>>> [ part.split('\n') for part in content_parts if part ] 
[['some', 'strings', ''], ['some other', 'strings', '']] 
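The trailing empty strings in that result come from the newline that ends each part. A minimal sketch that avoids them, assuming the parts are numbered consecutively from 0 in file order and that separators look exactly like `--- partN ---`, uses `str.splitlines` instead of `split('\n')`. The helper name `split_parts` and the sample `content` are illustrative.

```python
import re

def split_parts(content):
    """Split text on '--- partN ---' separator lines into lists of lines."""
    chunks = re.split(r"^--- part\d+ ---[ \t]*\n", content, flags=re.MULTILINE)
    # chunks[0] is whatever precedes the first separator; dropping it makes
    # index i of the result correspond to part i.
    return [chunk.splitlines() for chunk in chunks[1:]]

# Hypothetical file content in the format from the question.
content = (
    "--- part0 ---\nsome\nstrings\n"
    "--- part1 ---\nsome other\nstrings\n"
    "--- part2 ---\n"
)
parts = split_parts(content)
print(parts[0])
print(parts[1])
```

Indexing `parts[n]` then plays the role of `get_part_of_file(part=n)`; an empty part comes back as an empty list.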