How to skip comment lines from a file in Python with ArgumentParser.convert_arg_line_to_args

I'm new to Python, and I'm trying to strip comments and comment lines from a file of URLs (one URL per line). To do this, I'm using a custom ArgumentParser (argparse) and overriding convert_arg_line_to_args to:

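1. Strip trailing end-of-line comments, e.g. 'http://example.com # comment'
2. Strip lines that are empty or consist entirely of a comment, e.g. '# This file contains URLs, one per line'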

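I can successfully strip the trailing comments (1), but I can't seem to remove the empty lines or comment lines (2). Whole-line comments and empty lines remain in my file list.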

import argparse
import re


class CustomArgumentParser(argparse.ArgumentParser):
    def __init__(self, *args, **kwargs):
        super(CustomArgumentParser, self).__init__(*args, **kwargs)

    def convert_arg_line_to_args(self, line):
        '''Strip out comments from start points file'''
        if re.match(r'^#.*', line, 0) or re.match(r'^\s+$', line, 0):
            yield
        arg = re.sub(r'\s+#.*$', '', line)
        yield arg

Is there a way to remove the empty lines and comment lines?

An example input file:

# Start points for the spider 
# 
http://www.website1.com/News.html?typeid=8          # All news 
http://www.website1.com/News.html?typeid=5          # Business 

http://www.website2.com/News.html?category=All%20Category%20News 
http://www.website2.com/News.html?category=Category2

With the original code, the args returned from parse_args() are:

DEBUG:root:Args are: Namespace(URLs=['', '# Start points for the spider ', '', '#', 'http://www.website1.com/News.html?typeid=8', 'http://www.website1.com/News.html?typeid=5', 'http://www.website1.com/News.html?typeid=9', 'http://www.website1.com/News.html?typeid=10', 'http://www.website1.com/KeyInterviews.html', '', '', 'http://www.website2.com/News.html?category=All%20Category%20News', 'http://www.website2.com/News.html?category=Category2'], cacheDir='/tmp', debug_level=' 1', firstNPages=None, outputDir=None, storyType='news')

Changing it to yield an empty list gives:

DEBUG:root:Args are: Namespace(URLs=[[], '# Start points for the spider ', [], '#', 'http://www.website1.com/News.html?typeid=8', 'http://www.website1.com/News.html?typeid=5', [], '', 'http://www.website2.com/News.html?category=All%20Category%20News', 'http://www.website2.com/News.html?category=Category2'], cacheDir='/tmp', debug_level=' 1', firstNPages=None, outputDir=None, storyType='news')

I'd like the args to look like this:

DEBUG:root:Args are: Namespace(URLs=['http://www.website1.com/News.html?typeid=8', 'http://www.website1.com/News.html?typeid=5', 'http://www.website2.com/News.html?category=All%20Category%20News', 'http://www.website2.com/News.html?category=Category2'], cacheDir='/tmp', debug_level=' 1', firstNPages=None, outputDir=None, storyType='news')

Perhaps it isn't possible to remove lines from the input file this way.


2015-11-10 Tim James

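Why are you using argparse to parse a *file*? It's for command-line arguments! How would you use it? Could you give a [mcve] that explains the problem more clearly? – jonrsharpe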

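Arguments starting with '@' are interpreted as the names of files containing more arguments, one per line by default (when the parser is created with fromfile_prefix_chars='@'). convert_arg_line_to_args lets you use a more complex file format. – chepner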
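For readers unfamiliar with the '@' mechanism, here is a minimal sketch of how it behaves with the default convert_arg_line_to_args. It assumes a parser created with fromfile_prefix_chars='@' and a single positional argument named URLs (the name comes from the question's output; the rest of the question's parser isn't shown). The temporary file merely stands in for the question's input file.

```python
import argparse
import os
import tempfile

# Write a small argument file (one argument per line) to a temporary path.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("http://www.website1.com/News.html?typeid=8\n")
    f.write("http://www.website2.com/News.html?category=Category2\n")
    path = f.name

# fromfile_prefix_chars='@' makes "@<path>" mean "read more arguments from <path>".
parser = argparse.ArgumentParser(fromfile_prefix_chars="@")
parser.add_argument("URLs", nargs="*")  # assumed name, taken from the question's output

try:
    args = parser.parse_args(["@" + path])
finally:
    os.remove(path)  # clean up the temporary file

# The default convert_arg_line_to_args returns [line], so each line is one argument.
print(args.URLs)
```

Because the default implementation turns every line into a one-element list, comment lines and blank lines in the file would become arguments too, which is exactly what the question runs into.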

Don't **yield** an empty list, **return** one. – memoselyk

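Note that a bare yield statement produces the value None rather than producing nothing, so a line you meant to skip still contributes an argument list containing None, like [None, ...].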

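If you want the parser to skip a line, you should return an empty list instead. Rewrite the function to return [] for skipped lines and [url] otherwise (where url is the cleaned-up line).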
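A quick way to see the difference is to compare a generator that uses a bare yield with a function that returns a list. Both helpers below are hypothetical and reduced to a simple startswith('#') check instead of the question's regexes:

```python
def bare_yield(line):
    # Hypothetical generator: the bare yield produces None, then execution carries on.
    if line.startswith("#"):
        yield
    yield line


def return_list(line):
    # Hypothetical function: an empty list means "no arguments for this line".
    if line.startswith("#"):
        return []
    return [line]


print(list(bare_yield("# comment")))   # [None, '# comment']
print(list(return_list("# comment")))  # []
```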

By the way, your second regex doesn't match empty lines. It should read '^\s*$' to match zero or more whitespace characters.
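The difference between the two patterns shows up directly on an empty string. As far as the argparse source shows, each line of the file is produced by splitlines(), so a blank line reaches convert_arg_line_to_args as '' with no trailing newline:

```python
import re

for line in ["", "   ", "http://example.com"]:
    one_or_more = bool(re.match(r"^\s+$", line))   # original pattern: needs at least one space
    zero_or_more = bool(re.match(r"^\s*$", line))  # corrected pattern: also matches ''
    print(repr(line), one_or_more, zero_or_more)
```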


2015-11-10 13:08:12 memoselyk

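Thanks, memoselyk. Please see the edit above. Returning an empty list doesn't remove the line from the outer list. Thanks for spotting the error in my regex. –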

Thanks again, memoselyk. Unfortunately, returning an empty list [] doesn't make the parser skip the line. –

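If you read the [argparse source code](http://svn.python.org/projects/python/branches/release27-maint/Lib/argparse.py), the result of your overridden convert_arg_line_to_args is iterated and each item is appended to arg_strings; those strings are then searched recursively for file prefixes. If the returned list is empty, that is a no-op. – memoselyk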
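To make this concrete, here is a simplified stand-in for that loop. It is not argparse's actual code: expand_lines and skip_comments are hypothetical helpers that only mirror the "for arg in convert(line): arg_strings.append(arg)" step, to show that an empty list contributes nothing.

```python
def expand_lines(lines, convert):
    """Hypothetical stand-in for argparse's per-line loop over an @-file."""
    arg_strings = []
    for arg_line in lines:
        for arg in convert(arg_line):  # whatever convert returns is iterated...
            arg_strings.append(arg)    # ...and every item is appended
    return arg_strings


def skip_comments(line):
    # Hypothetical converter: no arguments for blank or comment lines.
    if not line or line.startswith("#"):
        return []
    return [line]


lines = ["# Start points", "", "http://example.com"]
print(expand_lines(lines, lambda line: [line]))  # default-style converter keeps every line
print(expand_lines(lines, skip_comments))        # empty lists add nothing
```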

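Your implementation actually uses a generator rather than a plain function: with the yield keyword, every yield statement that executes provides a value. Even a bare yield provides the value None. So instead of providing either nothing or arg, it returns an iterator that provides [None, arg] or [""] (an empty string).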

def convert_arg_line_to_args(self, line):
    '''Strip out comments from start points file'''
    if re.match(r'^#.*', line, 0) or re.match(r'^\s+$', line, 0):
        yield  # yield None *and proceed*
    arg = re.sub(r'\s+#.*$', '', line)
    yield arg  # yield arg
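Calling the same logic on a few sample lines shows what the generator really produces. This sketch copies the question's method into a plain generator function (question_converter is a hypothetical name) so it can be called without a parser:

```python
import re


def question_converter(line):
    # Same logic as the question's method, as a standalone generator function.
    if re.match(r'^#.*', line) or re.match(r'^\s+$', line):
        yield                       # yields None, then falls through
    arg = re.sub(r'\s+#.*$', '', line)
    yield arg                       # always yields the (possibly unchanged) line


print(list(question_converter('# Start points for the spider ')))  # [None, '# Start points for the spider ']
print(list(question_converter('')))                                # ['']
print(list(question_converter('http://example.com   # comment')))  # ['http://example.com']
```

The comment line comes through unchanged because '\s+#' requires whitespace before the '#', and there is none at the start of the line.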

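For starters, you don't need a generator here: use return instead of yield. Note that the return value must be an iterable; an efficient iterable with no values is, for example, the empty list [].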

def convert_arg_line_to_args(self, line):
    '''Strip out comments from start points file'''
    if re.match(r'^#.*', line, 0) or re.match(r'^\s+$', line, 0):
        return []  # return NO values, *and stop*
    arg = re.sub(r'\s+#.*$', '', line)
    return [arg]  # return ONLY arg

This is the minimal change needed to make the code work.
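Putting the minimal fix into a complete parser, a sketch might look like this. It assumes fromfile_prefix_chars='@', a positional URLs argument, and a shortened copy of the question's input file. It also uses the corrected '^\s*$' pattern from the earlier answer; with '^\s+$' the blank line would still come through as ''.

```python
import argparse
import os
import re
import tempfile


class CustomArgumentParser(argparse.ArgumentParser):
    def convert_arg_line_to_args(self, line):
        '''Strip out comments from start points file'''
        if re.match(r'^#.*', line) or re.match(r'^\s*$', line):
            return []                             # skip comment and blank lines entirely
        return [re.sub(r'\s+#.*$', '', line)]     # keep only the URL, minus any trailing comment


SAMPLE = """\
# Start points for the spider
#
http://www.website1.com/News.html?typeid=8          # All news
http://www.website1.com/News.html?typeid=5          # Business

http://www.website2.com/News.html?category=Category2
"""

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write(SAMPLE)
    path = f.name

parser = CustomArgumentParser(fromfile_prefix_chars="@")
parser.add_argument("URLs", nargs="*")  # assumed argument name
try:
    args = parser.parse_args(["@" + path])
finally:
    os.remove(path)

print(args.URLs)
```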

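Now, although a regular expression works for this use case, it is usually overkill. Python's str type has efficient built-in methods for manipulating and inspecting strings: you can remove the comment, strip the whitespace, and check whether anything is left.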

def convert_arg_line_to_args(self, line):
    '''Strip out comments from start points file'''
    line, *_ = line.split('#', maxsplit=1)  # the `*_` consumes any optional comment content
    arg = line.strip()  # remove whitespace; we have just the bare argument now
    if arg:  # is there anything left as an argument?
        return [arg]  # return ONLY arg, and stop
    return []
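Here the string-method version is exercised on some lines from the question's input plus one URL with a fragment. The function is copied out of the class as a hypothetical convert_line so it can be called directly:

```python
def convert_line(line):
    # Same logic as the str-method answer, as a plain function (Python 3 only:
    # extended unpacking and the maxsplit keyword).
    line, *_ = line.split('#', maxsplit=1)
    arg = line.strip()
    if arg:
        return [arg]
    return []


for line in ['# Start points for the spider ', '#', '',
             'http://www.website1.com/News.html?typeid=8          # All news',
             'http://example.com/page.html#section']:
    print(repr(line), '->', convert_line(line))
```

One design difference worth knowing: splitting on the first '#' also cuts a URL fragment, so 'http://example.com/page.html#section' becomes 'http://example.com/page.html'. The regex r'\s+#.*$' only strips a '#' preceded by whitespace and would leave that URL intact.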

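If you want to explore generators versus functions, a generator is actually slightly more elegant here. We added those lists only because an iterable is required, but a generator is already an iterable.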

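What does that mean in practice? If there is an argument, just yield it; it will be "contained" in the generator itself. If there is no argument, never yield; the generator will stop without providing anything.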

def convert_arg_line_to_args(self, line):
    '''Strip out comments from start points file'''
    line, *_ = line.split('#', maxsplit=1)
    arg = line.strip()
    if arg:
        yield arg  # provide arg; execution then reaches the end and the generator stops
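The generator version can be checked by calling the method directly and materialising its results with list(), which consumes it the same way argparse's for loop does. The parser setup (fromfile_prefix_chars='@') is an assumption here:

```python
import argparse


class CustomArgumentParser(argparse.ArgumentParser):
    def convert_arg_line_to_args(self, line):
        '''Strip out comments from start points file'''
        line, *_ = line.split('#', maxsplit=1)
        arg = line.strip()
        if arg:
            yield arg  # one argument; otherwise the generator ends without yielding


parser = CustomArgumentParser(fromfile_prefix_chars='@')
print(list(parser.convert_arg_line_to_args('# Start points for the spider ')))  # []
print(list(parser.convert_arg_line_to_args('')))                                # []
print(list(parser.convert_arg_line_to_args(
    'http://www.website1.com/News.html?typeid=5          # Business')))
```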


2017-11-29 12:49:56 MisterMiyagi
