Python: function to skip the comment lines of an open file and pass on the file object positioned at the current line

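I am trying to write some Python code to edit certain (existing) input and output files of a particular piece of software. All of the files I am interested in can begin with comment lines whose first character is # (the number of comment lines is unknown).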

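I always want to skip these comment lines so that I can read/store the important text. I would therefore like to create a function that, given a file object opened in read mode, skips the comment lines so that the next read call on the file object starts at the first non-comment line of the file. So far, I have tried creating a class with a skip_comments() method (see the code below):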

import os

class FileOperations:

    def __init__(self, directory, filename):
        self.directory = directory
        self.filename = filename
        self.filepath = os.path.abspath(os.path.join(directory, filename))
        self.fo = open(self.filepath, 'r')

    def skip_comments(self):
        """ Passes the current position to the location of the first non-comment
        line of self.fo"""

        for line in self.fo:
            if not line.lstrip().startswith('#'):
                break
        print(line)  ## Just to check if in correct spot

Instantiating the class works, and I can use ordinary file-object operations such as read() and seek():

In [47]: fh = FileOperations('file_directory','file.txt')
In [48]: fh.fo.read(10)
Out[48]: '#This file'
In [49]: fh.fo.seek(0)

But when I call the skip_comments() method and then try to read from the file object, I run into a problem:

In [50]: fh.skip_comments() 
20 740 AUX IFACE AUX QFACT AUX CELLGRP 

Out[50]: <open file '... file_dir\file.txt', mode 'r' at 0x0000000008797D20> 
In [51]: fh.fo.read(10) 
--------------------------------------------------------------------------- 
ValueError        Traceback (most recent call last) 
<ipython-input-51-20f04ae797fe> in <module>() 
----> 1 fh.fo.read(10) 

ValueError: Mixing iteration and read methods would lose data

Can someone help me fix this bug or suggest a better way of doing this? Thanks!

Source

2014-10-09 dhltp

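[This question](http://stackoverflow.com/questions/4762262/is-it-safe-to-mix-readline-and-line-iterators-in-python-file-processing) explains the cause of the error. Basically, you can't mix 'f.read()' with 'for line in f': 'next(f)' (which is called when you iterate) uses an internal read-ahead buffer for performance, and that is incompatible with 'read' or 'readline', because they don't know about the read-ahead buffer. – dano 2014-10-09 18:38:37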
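One way to keep read() usable after skipping, sketched here under stated assumptions, is to avoid iteration altogether: advance with readline(), remember the position before each line with tell(), and rewind with seek() once a non-comment line is found. The standalone function skip_comments(fo) and the io.StringIO sample are illustrative stand-ins for the asker's method and file, not code from this thread, and the sketch is written for Python 3.

```python
import io

def skip_comments(fo):
    """Leave fo positioned at the start of its first non-comment line.

    Uses readline()/tell()/seek() only, so no iterator read-ahead
    buffer is involved and later read() calls behave normally.
    """
    while True:
        pos = fo.tell()          # position before reading the next line
        line = fo.readline()
        # Stop at end of file ('' is returned) or at the first
        # line that is not a comment.
        if not line or not line.lstrip().startswith('#'):
            fo.seek(pos)         # rewind to the start of that line
            return fo

# Hypothetical sample data standing in for a real file opened with open(path, 'r').
sample = io.StringIO("#This file is a sample\n  # indented comment\n20 740 AUX IFACE\nmore data\n")
skip_comments(sample)
first_chars = sample.read(6)     # reads from the first non-comment line
print(first_chars)
```

Like the code in the question, this treats a blank line as a non-comment line, so skipping stops there.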

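What you want to do is turn the skip_comments() function into a generator. The generator below yields the non-comment lines of the file whose name you pass to it.

So: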

def skip_comments(filename):
    # Text mode ('r') so that each line is a str and can be compared with '#'
    with open(filename, 'r') as f:
        for line in f:
            if not line.strip().startswith('#'):
                yield line

# then, to use the generator you've just created:
for line in skip_comments(filename):
    pass  # do stuff with line

# if you want all the lines at the same time...
lines = list(skip_comments(filename))
# lines is now a list of all non-comment lines in the file

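Edit: a faster (and more compact) version would be skip_comments = lambda filename: (line for line in open(filename, 'r') if not line.startswith('#')). This uses a generator expression, which is faster (it saves about a third of the time on my machine). Note that this version does not strip leading whitespace before testing for '#', and it never closes the file explicitly.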
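The speed difference quoted above will vary with the machine and the Python version. A rough way to check it yourself is sketched below with the standard timeit module. The names skip_comments_gen and skip_comments_expr, the temporary sample file and the repeat count are all assumptions made for this illustration; no timing results are claimed here.

```python
import os
import tempfile
import timeit

def skip_comments_gen(filename):
    """Generator-function version: yields non-comment lines."""
    with open(filename, 'r') as f:
        for line in f:
            if not line.strip().startswith('#'):
                yield line

def skip_comments_expr(filename):
    """Generator-expression version (no strip(), file not closed explicitly)."""
    return (line for line in open(filename, 'r') if not line.startswith('#'))

# Build a throwaway sample file: 10 comment lines, then 5000 data lines.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('# comment\n' * 10)
    tmp.write('20 740 AUX IFACE\n' * 5000)
    sample_path = tmp.name

try:
    # Both versions agree here because no comment line is indented.
    same_output = (list(skip_comments_gen(sample_path))
                   == list(skip_comments_expr(sample_path)))
    t_gen = timeit.timeit(lambda: list(skip_comments_gen(sample_path)), number=20)
    t_expr = timeit.timeit(lambda: list(skip_comments_expr(sample_path)), number=20)
    print('same output: %s' % same_output)
    print('generator function:   %.4f s' % t_gen)
    print('generator expression: %.4f s' % t_expr)
finally:
    os.remove(sample_path)
```

The two versions only produce identical output when no comment line has leading whitespace, since the expression version tests the raw line.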

Source

2014-10-09 17:57:58

Why not 'if not ...: yield ...' and drop the else? – 2014-10-09 17:59:22

@AaronHall: Fair point. I was thinking of the question, where the use of 'break' is more obvious. – 2014-10-09 18:00:27

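@ChinmayKanchi: I went and read up on generators and ended up more confused. Could you expand your example to show how to actually do what I want? In other words: given your generator skip_comments, how do I apply it and then perform some other operations on the text that follows the comments in a given file? – dhltp 2014-10-16 23:44:51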
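As a hedged illustration of that last request, the sketch below skips only the leading comment block and then keeps working with the remaining lines through a single iterator, so iteration and read() are never mixed. It swaps in itertools.dropwhile instead of the generator from the answer, which filters out every comment line in the file. The names is_comment and lines_after_header and the sample text are hypothetical, and the code is written for Python 3.

```python
import io
from itertools import dropwhile

def is_comment(line):
    """True for lines whose first non-blank character is '#'."""
    return line.lstrip().startswith('#')

def lines_after_header(fo):
    """Iterate over fo, dropping only the comment lines at the top."""
    return dropwhile(is_comment, fo)

# Hypothetical sample standing in for a file opened with open(path, 'r').
sample = io.StringIO(
    "# header comment\n"
    "# another header comment\n"
    "20 740 AUX IFACE\n"
    "# later comment is kept\n"
    "30 1\n"
)

body = lines_after_header(sample)
first = next(body)                       # first non-comment line
rest = [line.split() for line in body]   # any further processing of the text
print(first.rstrip())
print(rest)
```

Every later operation goes through body rather than through sample.read(), which avoids the conflict described in dano's comment.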
