2017-07-05 30 views

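Extracting lines between repeated headers in a file

I want to process a ~43k-line txt file. Whenever the command *Nset appears in the file, I need to extract and save all the lines that follow it, stopping at the next * command in the file. The number of lines and characters after each command varies. For example, here is a sample portion of the file: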

*Nset 

1, 2, 3, 4, 5, 6, 7, 

12, 13, 14, 15, 16, 

17, 52, 75, 86, 92, 

90, 91, 92 93, 94, 95.... 

*NEXT COMMAND 

blah blah blah 

*Nset 

numbers 

*Nset 

numbers 

*Command 

irrelevant text 

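My current code works when the numbers I need are not between two *Nset commands. When the numbers of one *Nset are followed directly by another *Nset, the program skips that second command and its lines, and I don't know why. When the next command is not *Nset, it finds the following *Nset and extracts the data perfectly.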

import re 

# read in the input deck 
deck_name = 'master.txt' 
deck = open(deck_name,'r') 

#initialize variables 
nset_data = [] 
matched_nset_lines = [] 
nset_count = 0 

for line in deck:
    # loop to extract all nset names and node numbers
    important_line = re.search(r'\*Nset,.*', line)
    if important_line:
        line_value = important_line.group()  # name for nset
        matched_nset_lines.insert(nset_count, line_value)  # name for nset
        temp = []

        # read lines from the found match up until the next *command
        for line_x in deck:
            if not re.match(r'\*', line_x):
                temp.append(line_x)
            else:
                break

        nset_data.append(temp)

    nset_count = nset_count + 1

I am using Python 3.5. Thanks for your help.
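A likely explanation for the skipped blocks: both `for` loops in the code above pull lines from the same file iterator. When the inner loop reaches a line starting with `*`, it breaks, but that line has already been consumed, so the outer loop never gets to test it. If that line is itself a `*Nset`, its block is lost. The sketch below keeps the nested-read structure but carries the terminating line back to the outer loop. The function name `extract_nsets` and the sample deck (including the `nset=A` style names) are made up for illustration, not taken from the real file.

```python
import io
import re

# Hypothetical miniature deck; the real file and its set names will differ.
SAMPLE = """*Nset, nset=A
1, 2, 3
*Nset, nset=B
4, 5, 6
*Step
ignored
"""

def extract_nsets(deck):
    """Return (header, body_lines) pairs for every *Nset block in `deck`."""
    blocks = []
    line = next(deck, None)
    while line is not None:
        if re.match(r'\*Nset', line):
            header, body = line.strip(), []
            # Read the block body; stop at the next *command or at EOF.
            line = next(deck, None)
            while line is not None and not line.startswith('*'):
                if line.strip():          # ignore blank lines
                    body.append(line.strip())
                line = next(deck, None)
            blocks.append((header, body))
            # `line` still holds the command that ended the block, so the
            # outer loop examines it instead of silently dropping it.
        else:
            line = next(deck, None)
    return blocks

blocks = extract_nsets(io.StringIO(SAMPLE))
```

With a real file, the same function could be called as `extract_nsets(f)` inside a `with open(...) as f:` block, since file objects are iterators too.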


Is a command *always* at the beginning of a line, starting with "*"?


@juanpa.arrivillaga, yes. There are various commands, but every command is preceded by "*". The next line then contains the numbers.


Could this be related at all? https://stackoverflow.com/questions/25943000/finding-a-word-between-two-words-that-will-not-match-if-the-closing-word-occurs

Answers


If you just want to extract the lines that follow each *Nset command, the following approach should work:

In [5]: with open("master.txt") as f:
   ...:     data = []
   ...:     gather = False
   ...:     for line in f:
   ...:         line = line.strip()
   ...:         if line.startswith("*Nset"):
   ...:             gather = True
   ...:         elif line.startswith("*"):
   ...:             gather = False
   ...:         elif line and gather:
   ...:             data.append(line)
   ...:

In [6]: data 
Out[6]: 
['1, 2, 3, 4, 5, 6, 7,', 
'12, 13, 14, 15, 16,', 
'17, 52, 75, 86, 92,', 
'90, 91, 92 93, 94, 95....', 
'numbers', 
'numbers'] 

And if you want more information, the above is simple to extend:

In [7]: with open("master.txt") as f:
   ...:     nset_lines = []
   ...:     nset_count = 0
   ...:     data = []
   ...:     gather = False
   ...:     for i, line in enumerate(f):
   ...:         line = line.strip()
   ...:         if line.startswith("*Nset"):
   ...:             gather = True
   ...:             nset_lines.append(i)
   ...:             nset_count += 1
   ...:         elif line.startswith("*"):
   ...:             gather = False
   ...:         elif line and gather:
   ...:             data.append(line)
   ...:

In [8]: nset_lines 
Out[8]: [0, 14, 18] 

In [9]: nset_count 
Out[9]: 3 

In [10]: data 
Out[10]: 
['1, 2, 3, 4, 5, 6, 7,', 
'12, 13, 14, 15, 16,', 
'17, 52, 75, 86, 92,', 
'90, 91, 92 93, 94, 95....', 
'numbers', 
'numbers'] 
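The gathered lines are still plain strings. If the node numbers are needed as integers, one possible follow-up is to pull every run of digits out of each line. This is only a sketch: `parse_node_numbers` is a hypothetical helper, and it assumes node numbers are non-negative integers, so anything else on a line (stray dots, missing commas, placeholder text) is simply ignored.

```python
import re

def parse_node_numbers(lines):
    """Collect every run of digits found in `lines` as an int, in order."""
    numbers = []
    for line in lines:
        # \d+ matches digit runs only, so separators and stray
        # punctuation between the numbers do not matter.
        numbers.extend(int(token) for token in re.findall(r'\d+', line))
    return numbers

# Two of the gathered lines from the session above, used as sample input.
data = ['1, 2, 3, 4, 5, 6, 7,', '12, 13, 14, 15, 16,']
nodes = parse_node_numbers(data)
```

Using a regular expression here rather than `line.split(',')` avoids having to special-case trailing commas and empty fields.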

This does what you want.

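import pprint

command = []       # command lines, e.g. '*Nset'
commandLines = []  # commandLines[i] holds the lines that follow command[i]

with open('test.txt') as file:
    for line in file:
        line = line.rstrip()
        if line.startswith('*'):
            # a new command starts a new, initially empty, group
            command.append(line)
            commandLines.append([])
        elif line and commandLines:
            # skip blank lines and anything before the first command
            commandLines[-1].append(line)

pprint.pprint(command)
pprint.pprint(commandLines)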

commandLines[i] is a list containing the lines that belong to command[i].

The printed commands:

['*Nset', '*NEXT COMMAND', '*Nset', '*Nset', '*Command'] 

And the lines (a nested list):

[['1, 2, 3, 4, 5, 6, 7,',
  '12, 13, 14, 15, 16,',
  '17, 52, 75, 86, 92,',
  '90, 91, 92 93, 94, 95....'],
 ['blah blah blah'],
 ['numbers'],
 ['numbers'],
 ['irrelevant text']]

Assumption: only command lines start with '*'.
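Since the question only asks for the *Nset blocks, the two parallel lists can be filtered afterwards. A minimal sketch, using the values printed above as input; the name `nset_blocks` is made up for illustration:

```python
# Values copied from the printed output above.
command = ['*Nset', '*NEXT COMMAND', '*Nset', '*Nset', '*Command']
commandLines = [
    ['1, 2, 3, 4, 5, 6, 7,',
     '12, 13, 14, 15, 16,',
     '17, 52, 75, 86, 92,',
     '90, 91, 92 93, 94, 95....'],
    ['blah blah blah'],
    ['numbers'],
    ['numbers'],
    ['irrelevant text'],
]

# Pair each command with its lines and keep only the *Nset groups.
nset_blocks = [
    lines
    for name, lines in zip(command, commandLines)
    if name.startswith('*Nset')
]
```

Here `nset_blocks` ends up with one inner list per *Nset command, in file order.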