2014-10-19

Parsing a file and outputting it as CSV in Python

I am trying to parse a text file that has the following format:

+++++ 
line1 
line2 
<<<<< 
+++++ 
rline1 
rline2 
<<<<< 

Here, +++++ marks the start of a record and <<<<< marks the end of a record.

Now I want to output the whole text as CSV in the following format:

line1, line2 
rline1, rline2 

I tried something like this:

lines =['+++++', 'line1', 'line2', '<<<<<', '+++++', 'rline1', 'rline2', '<<<<<'] 
output_lines =[] 

for line in lines:
    if (line == "+++++") or not(line == "<<<<<"):
        if (line == "<<<<<"):
            output_lines.append(line)
            output_lines.append(",")

print (output_lines) 

I don't know how to move forward from here.

Answers

0

Collect lines in a nested loop until the end-of-record marker is reached, then write the resulting list out to the CSV file:

import csv

# inputfilename and outputfilename are placeholders for your own file paths
with open(inputfilename) as infh, open(outputfilename, 'w', newline='') as outfh:
    writer = csv.writer(outfh)
    for line in infh:
        if not line.startswith('+++++'):
            continue

        # found start, collect lines until end-of-record
        row = []
        for line in infh:
            if line.startswith('<<<<<'):
                # found end, end this inner loop
                break
            row.append(line.rstrip('\n'))

        if row:
            # lines for this record are added to the CSV file as a single row
            writer.writerow(row)

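The outer loop takes lines from the input file, but skips anything that does not look like the start of a record. Once a start is found, a second, inner loop pulls more lines from the same file object and, as long as they do not look like the end of a record, adds them to a list object (without the line separator).

When the end of a record is found, the inner loop ends, and if any lines were collected in the row list, that list is written to the CSV file as one row.

Demo: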
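The same nested-loop idea can be wrapped in a generator so that it works on any iterable of lines, not only on a file object. This is a minimal sketch, assuming the markers always appear at the start of a line; the name iter_records and its start/end parameters are hypothetical and not part of the original answer:

```python
import csv
from io import StringIO


def iter_records(lines, start='+++++', end='<<<<<'):
    """Yield one list of lines per record found between the two markers.

    iter_records is a hypothetical helper name, not part of the csv module.
    """
    # One shared iterator, so the inner loop resumes where the outer one stopped
    it = iter(lines)
    for line in it:
        if not line.startswith(start):
            continue  # skip anything outside a record
        row = []
        for line in it:
            if line.startswith(end):
                break  # end of this record
            row.append(line.rstrip('\n'))
        if row:
            yield row


sample = StringIO('+++++\nline1\nline2\n<<<<<\n+++++\nrline1\nrline2\n<<<<<\n')
out = StringIO()
csv.writer(out).writerows(iter_records(sample))
result = out.getvalue()
print(result, end='')
```

The explicit iter() call matters when the input is a list: looping over a list twice would restart from the beginning, whereas a file object is already its own iterator.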

>>> import csv 
>>> from io import StringIO 
>>> import sys 
>>> demo = StringIO('''\ 
... +++++ 
... line1 
... line2 
... <<<<< 
... +++++ 
... rline1 
... rline2 
... <<<<< 
... ''') 
>>> writer = csv.writer(sys.stdout) 
>>> for line in demo: 
...  if not line.startswith('+++++'): 
...   continue 
...  row = [] 
...  for line in demo: 
...   if line.startswith('<<<<<'): 
...    break 
...   row.append(line.rstrip('\n')) 
...  if row: 
...   writer.writerow(row) 
... 
line1,line2 
13 
rline1,rline2 
15 

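The number after each written line is the number of characters written (including the \r\n line terminator). It is the return value of writer.writerow(), echoed by the interactive interpreter.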
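To see where those numbers come from, here is a small sketch that captures the return value instead of letting the interpreter echo it. It assumes an in-memory StringIO target rather than sys.stdout; the variable names are illustrative only:

```python
import csv
from io import StringIO

buffer = StringIO()
writer = csv.writer(buffer)  # the default dialect ends each row with '\r\n'

# writerow() returns whatever the underlying write() call returned,
# which for a text stream is the number of characters written
count = writer.writerow(['line1', 'line2'])

print(count)
print(repr(buffer.getvalue()))
```

When the code runs as a script rather than in the interactive interpreter, these return values are not displayed unless you print them.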

1

Maybe something like this?

from itertools import groupby
import csv

lines = ['+++++', 'line1', 'line2', '<<<<<', '+++++', 'rline1', 'rline2', '<<<<<']

# remove the +++++ markers, so that only the <<<<< markers indicate row breaks
# (compare with !=, not "is not", which tests identity rather than equality)
cleaned_list = [x for x in lines if x != "+++++"]

# split at the <<<<< markers and keep only the runs of ordinary lines
rows = [list(group) for k, group in groupby(cleaned_list, lambda x: x == "<<<<<") if not k]

# newline='' lets the csv module control the line endings itself
with open('result.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)

with open('result.csv', newline='') as f:
    print(f.read())

Nice use of groupby, but you might want to add some description of what is going on here. – PaulMcG 2014-10-19 13:51:46
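As a rough illustration of what groupby does in that answer: it walks the list once and starts a new group every time the key function's result changes, so runs of ordinary lines (key False) alternate with runs of <<<<< markers (key True). The sketch below assumes the +++++ markers were already removed; the names is_end, groups and cleaned are illustrative and do not come from the answer:

```python
from itertools import groupby

cleaned = ['line1', 'line2', '<<<<<', 'rline1', 'rline2', '<<<<<']


def is_end(item):
    """Key function: True for an end-of-record marker, False otherwise."""
    return item == '<<<<<'


# Materialize each group right away, because groupby shares one underlying iterator
groups = [(key, list(group)) for key, group in groupby(cleaned, is_end)]
for key, members in groups:
    print(key, members)

# Keeping only the groups whose key is False leaves one list per record
rows = [members for key, members in groups if not key]
print(rows)
```

One consequence of this design is that a record with no lines between its markers produces no row at all, since it contributes no run of ordinary lines.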