2016-02-12 113 views
2

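Python: reading a text file does not work with a delimiter

I have a text file that is output by gem5 (that is, I cannot control its format).

It looks like this: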

---------- Begin Simulation Statistics ---------- 
sim_seconds         9.553482      # Number of seconds simulated 
sim_ticks        9553481748000      # Number of ticks simulated 
final_tick        9553481748000      # Number of ticks from beginning of simulation (restored from checkpoints and never reset) 
sim_freq         1000000000000      # Frequency of simulated ticks 
host_inst_rate         911680      # Simulator instruction rate (inst/s) 
host_op_rate         1823361      # Simulator op (including micro ops) rate (op/s) 
host_tick_rate        1669871119      # Simulator tick rate (ticks/s) 
host_mem_usage         662856      # Number of bytes of host memory used 
host_seconds         5721.09      # Real time elapsed on the host 
sim_insts         5215804132      # Number of instructions simulated 
sim_ops         10431608523      # Number of ops (including micro ops) simulated 

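Using the csv module, I have trouble with the space-separated lines. If I use a single space as the delimiter, every space in a run is treated as its own delimiter, so I get many empty fields. If I use \t as the delimiter, the lines are not split at all.

How can I handle these runs of spaces easily? I only want to read the name in the left column and the value that belongs to it.

Is the csv module still suitable here, or is there something more robust?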

Answers

2

csv.reader can still work for this use case. Take a look at its skipinitialspace parameter:

csv.reader(csvfile, delimiter=' ', skipinitialspace=True)

This splits each line on spaces, but any extra whitespace immediately following a delimiter is ignored.

import csv

# csvfile is an already opened file object for the gem5 output
r = csv.reader(csvfile, delimiter=' ', skipinitialspace=True)
for row in r:
    print(row)

['sim_seconds', '9.553482', '#', 'Number', 'of', 'seconds', 'simulated'] 
['sim_ticks', '9553481748000', '#', 'Number', 'of', 'ticks', 'simulated'] 
['final_tick', '9553481748000', '#', 'Number', 'of', 'ticks', 'from', 'beginning', 'of', 'simulation', '(restored', 'from', 'checkpoints', 'and', 'never', 'reset)'] 
['sim_freq', '1000000000000', '#', 'Frequency', 'of', 'simulated', 'ticks'] 
['host_inst_rate', '911680', '#', 'Simulator', 'instruction', 'rate', '(inst/s)'] 
['host_op_rate', '1823361', '#', 'Simulator', 'op', '(including', 'micro', 'ops)', 'rate', '(op/s)'] 
['host_tick_rate', '1669871119', '#', 'Simulator', 'tick', 'rate', '(ticks/s)'] 
['host_mem_usage', '662856', '#', 'Number', 'of', 'bytes', 'of', 'host', 'memory', 'used'] 
['host_seconds', '5721.09', '#', 'Real', 'time', 'elapsed', 'on', 'the', 'host'] 
['sim_insts', '5215804132', '#', 'Number', 'of', 'instructions', 'simulated'] 
['sim_ops', '10431608523', '#', 'Number', '...']

You can then use just the first two values of each row.
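As a minimal sketch of that last step, the rows can be collected into a dictionary keyed by the statistic name. The helper name `read_stats` and the shortened sample text are assumptions made for illustration. Rows with fewer than two fields, and the banner row that starts with dashes, are skipped.

```python
import csv
import io


def read_stats(lines):
    """Map each statistic name to its value (kept as a string).

    `read_stats` is a hypothetical helper, not part of csv or gem5.
    """
    stats = {}
    reader = csv.reader(lines, delimiter=' ', skipinitialspace=True)
    for row in reader:
        # Skip blank lines and the "---------- Begin ..." banner row.
        if len(row) < 2 or row[0].startswith('-'):
            continue
        stats[row[0]] = row[1]
    return stats


# Shortened sample in the same layout as the file in the question.
sample = io.StringIO(
    "---------- Begin Simulation Statistics ----------\n"
    "sim_seconds         9.553482      # Number of seconds simulated\n"
    "host_seconds         5721.09      # Real time elapsed on the host\n"
)
stats = read_stats(sample)
print(stats)
```

Passing an open file object instead of the `io.StringIO` sample should work the same way, since csv.reader accepts any iterable of lines.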

3

Split each line using re.split:

import re 

d = """ ---------- Begin Simulation Statistics ---------- 
sim_seconds         9.553482      # Number of seconds simulated 
sim_ticks        9553481748000      # Number of ticks simulated 
final_tick        9553481748000      # Number of ticks from beginning of simulation (restored from checkpoints and never reset) 
sim_freq         1000000000000      # Frequency of simulated ticks 
host_inst_rate         911680      # Simulator instruction rate (inst/s) 
host_op_rate         1823361      # Simulator op (including micro ops) rate (op/s) 
host_tick_rate        1669871119      # Simulator tick rate (ticks/s) 
host_mem_usage         662856      # Number of bytes of host memory used 
host_seconds         5721.09      # Real time elapsed on the host 
sim_insts         5215804132      # Number of instructions simulated 
sim_ops         10431608523      # Number of ops (including micro ops) simulated""" 

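# Skip the first line (the "Begin Simulation Statistics" banner)
for line in d.split("\n")[1:]:
    # Columns are separated by runs of whitespace. Split at most twice,
    # which gives three parts: name, value, and the trailing comment.
    parts = re.split(r'\s+', line, maxsplit=2)
    # Only print the first two columns.
    print(parts[:2])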

Output:

['sim_seconds', '9.553482'] 
['sim_ticks', '9553481748000'] 
['final_tick', '9553481748000'] 
['sim_freq', '1000000000000'] 
['host_inst_rate', '911680'] 
['host_op_rate', '1823361'] 
['host_tick_rate', '1669871119'] 
['host_mem_usage', '662856'] 
['host_seconds', '5721.09'] 
['sim_insts', '5215804132'] 
['sim_ops', '10431608523'] 
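
If numbers are wanted rather than strings, the same split can feed a small conversion step. This is only a sketch: the function name `parse_stats` is hypothetical, and it assumes each value is either an integer or a float, which holds for the sample above but may not hold for every statistic gem5 writes.

```python
import re


def parse_stats(text):
    """Return {name: number} for each 'name value # comment' line.

    Hypothetical helper; assumes values are plain ints or floats.
    """
    stats = {}
    for line in text.splitlines():
        # Split at most twice: name, value, and the trailing comment.
        parts = re.split(r'\s+', line.strip(), maxsplit=2)
        # Skip blank lines and the banner line that starts with dashes.
        if len(parts) < 2 or parts[0].startswith('-'):
            continue
        name, value = parts[0], parts[1]
        try:
            stats[name] = int(value)
        except ValueError:
            try:
                stats[name] = float(value)
            except ValueError:
                # Value is not numeric; leave it out of the result.
                continue
    return stats


text = (
    "---------- Begin Simulation Statistics ----------\n"
    "sim_seconds         9.553482      # Number of seconds simulated\n"
    "sim_ticks        9553481748000      # Number of ticks simulated\n"
)
result = parse_stats(text)
print(result)
```

Trying `int` before `float` keeps large counters such as `sim_ticks` exact instead of rounding them to floating point.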