Find all strings in Python code files

I would like to list all the strings in my large Python project.

Imagine the different ways a string can be created in Python:

mystring = "hello world" 

mystring = ("hello " 
      "world") 

mystring = "hello " \ 
      "world"

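I need a tool that outputs "filename, line number, string" for every string in my project. Strings that are spread across multiple lines using \ or ("...") should be shown on a single line.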

Any ideas how to do this?

2009-02-25 mbrochh

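If you plan to act on the information ("filename, line number, string"), then the stdlib's lib2to3 library may give you some ideas on how to refactor Python code at scale, especially the lib2to3/refactor.py file. You may only need to write your own fixer for it and you're done. – jfs 2009-02-25 14:25:01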

If you can do this in Python, I'd suggest looking at the ast (abstract syntax tree) module first and going from there.
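A minimal sketch of the ast approach, assuming Python 3.8 or later, where every literal is an ast.Constant; find_strings is a hypothetical helper name. Adjacent literals joined with \ or parentheses are merged by the parser into a single node, so each one comes back as one entry, reported at the line where it starts:

```python
import ast


def find_strings(source, filename="<string>"):
    """Return (filename, lineno, value) for every str literal in source."""
    tree = ast.parse(source, filename=filename)
    found = []
    for node in ast.walk(tree):
        # Python 3.8+ uses ast.Constant for all literals; keep only text
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            found.append((node.lineno, node.col_offset, node.value))
    # ast.walk is breadth-first, so sort back into source order
    found.sort()
    return [(filename, lineno, value) for lineno, _col, value in found]


if __name__ == "__main__":
    code = 'mystring = "hello " \\\n    "world"\n'
    for name, lineno, value in find_strings(code):
        print(name, lineno, repr(value))
```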

2009-02-25 10:54:54 unwind

Are you asking about I18N tools for Python?

http://docs.python.org/library/gettext.html#internationalizing-your-programs-and-modules

There is a related tool called po-utils (formerly xpot) that can help with this.

http://po-utils.progiciels-bpi.ca/README.html

2009-02-25 11:41:28

Gettext can help you. Wrap your strings in the _(...) construct:

a = _('Test') 
b = a 
c = _('Another text')

Then, at the shell prompt, run:

pygettext test.py

You will get a messages.pot file with the information you need:

# SOME DESCRIPTIVE TITLE. 
# Copyright (C) YEAR ORGANIZATION 
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
# 
msgid "" 
msgstr "" 
"Project-Id-Version: PACKAGE VERSION\n" 
"POT-Creation-Date: 2009-02-25 08:48+BRT\n" 
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" 
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n" 
"Content-Type: text/plain; charset=CHARSET\n" 
"Content-Transfer-Encoding: ENCODING\n" 
"Generated-By: pygettext.py 1.5\n" 


#: teste.py:1 
msgid "Test" 
msgstr "" 

#: teste.py:3 
msgid "Another text" 
msgstr ""
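Note that code marked this way only runs if _ is defined. A minimal sketch, assuming Python 3 and a hypothetical domain name "myapp" with no compiled catalog installed; in that case gettext.install() falls back to identity translations, so _() just returns its argument:

```python
import gettext

# "myapp" is a hypothetical domain name. With no myapp.mo catalog on disk,
# install() falls back to NullTranslations and puts _ into builtins.
gettext.install("myapp")

a = _('Test')
b = a
c = _('Another text')
print(a, c)
```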

2009-02-25 12:00:29 nosklo

I think they are trying to find these strings so that they can put _() around them. – 2009-02-25 12:47:57
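If that is the goal, a rough sketch (assuming Python 3.8+; unwrapped_strings is a hypothetical helper) could use ast to report only the string literals that are not already passed directly to _():

```python
import ast


def unwrapped_strings(source, marker="_"):
    """Return sorted (lineno, value) pairs for str literals not wrapped in marker()."""
    tree = ast.parse(source)
    wrapped = set()
    for node in ast.walk(tree):
        # Record literals passed positionally to a call such as _('...')
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == marker):
            wrapped.update(id(arg) for arg in node.args
                           if isinstance(arg, ast.Constant))
    found = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Constant) and isinstance(node.value, str)
                and id(node) not in wrapped):
            found.append((node.lineno, node.value))
    return sorted(found)


if __name__ == "__main__":
    src = "a = _('Test')\nb = 'needs wrapping'\nc = _('Another text')\n"
    print(unwrapped_strings(src))
```

Docstrings are reported too, since they are ordinary string constants in the tree.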

You could also consider parsing your code with

I don't know about the other solutions, but this one is really very simple to use.

2009-02-25 12:24:21 fulmicoton

unwind's suggestion of using the ast module in 2.6 is a good one. (There is also the undocumented _ast module in 2.5.) Here is some example code:

code = """a = 'blah' 
b = '''multi 
line 
string''' 
c = u"spam" 
""" 

import ast 
root = ast.parse(code) 

class ShowStrings(ast.NodeVisitor): 
    # Called for every string literal node (Python 2.6 ast API)
    def visit_Str(self, node): 
        print "string at", node.lineno, node.col_offset, repr(node.s) 

show_strings = ShowStrings() 
show_strings.visit(root)

The problem is multi-line strings. If you run the above, you get:

string at 1 4 'blah' 
string at 4 -1 'multi\nline\nstring' 
string at 5 4 u'spam'

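You'll notice that it does not report the start of the multi-line string, only its end. There is no good solution using the built-in Python tools.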
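In Python 3.8 and later this particular limitation is gone: string nodes report the line where the literal starts, and they also carry end_lineno. A sketch of the same visitor on the same input, assuming Python 3.8+ (visit_Str is deprecated there, so it uses visit_Constant):

```python
import ast

code = "a = 'blah'\nb = '''multi\nline\nstring'''\nc = u\"spam\"\n"


class ShowStrings(ast.NodeVisitor):
    def __init__(self):
        self.found = []

    # Python 3.8+ dispatches every literal to visit_Constant
    def visit_Constant(self, node):
        if isinstance(node.value, str):
            self.found.append((node.lineno, node.end_lineno,
                               node.col_offset, node.value))
            print("string at", node.lineno, "to", node.end_lineno,
                  node.col_offset, repr(node.value))


show_strings = ShowStrings()
show_strings.visit(ast.parse(code))
```

Here the multi-line string is reported as starting on line 2 and ending on line 4.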

Another option is to use my 'python4ply' module. It is a Python grammar definition for PLY, which is a parser generator. Here is how to use it:

import compiler 
import compiler.visitor 

# from python4ply; requires the ply parser generator 
import python_yacc 

code = """a = 'blah' 
b = '''multi 
line 
string''' 
c = u"spam" 
d = 1 
""" 

tree = python_yacc.parse(code, "<string>") 
#print tree 

class ShowStrings(compiler.visitor.ASTVisitor): 
    def visitConst(self, node): 
        if isinstance(node.value, basestring): 
            print "string at", node.lineno, repr(node.value) 

visitor = ShowStrings() 
compiler.walk(tree, visitor)

The output from this is:

string at 1 'blah' 
string at 2 'multi\nline\nstring' 
string at 5 u'spam'

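There is no support for column information. (There is some mostly complete, commented-out code to support it, but it hasn't been fully tested.) Then again, I see you don't need it. This approach also means working with Python's "compiler" module, which is clumsier than the ast module.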

Still, with 30-40 lines of code you should get exactly what you asked for.

2009-02-25 12:54:41

Those 30-40 lines are, of course, mostly code for walking the project to find the Python files. – 2009-02-25 12:56:05

Looks promising! I'll try your first suggestion (I don't need any column information). If it works the way I hope, I'll post the final solution here... – mbrochh 2009-02-25 15:30:06

Python's included tokenize module will also do the trick.

from __future__ import with_statement 
import sys 
import tokenize 

for filename in sys.argv[1:]: 
    with open(filename) as f: 
        for toktype, tokstr, (lineno, _), _, _ in tokenize.generate_tokens(f.readline): 
            if toktype == tokenize.STRING: 
                strrepr = repr(eval(tokstr)) 
                print filename, lineno, strrepr
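A Python 3 variant of the same idea, sketched under the assumption that ast.literal_eval is an acceptable replacement for eval (it only evaluates literals, so it cannot run arbitrary code); string_tokens is a hypothetical helper. Unlike the ast approach, tokenize reports each piece of an implicitly concatenated string as a separate token, so "hello " "world" comes back as two entries:

```python
import ast
import io
import sys
import tokenize


def string_tokens(readline):
    """Yield (lineno, value) for each STRING token produced by tokenize."""
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.STRING:
            try:
                # literal_eval only accepts literals, unlike eval()
                value = ast.literal_eval(tok.string)
            except ValueError:
                # f-strings (STRING tokens before 3.12) are not plain literals
                value = tok.string
            yield tok.start[0], value


if __name__ == "__main__":
    if len(sys.argv) > 1:
        for filename in sys.argv[1:]:
            # tokenize.open() respects the file's coding declaration
            with tokenize.open(filename) as f:
                for lineno, value in string_tokens(f.readline):
                    print(filename, lineno, repr(value))
    else:
        demo = 'mystring = ("hello "\n      "world")\n'
        for lineno, value in string_tokens(io.StringIO(demo).readline):
            print("<demo>", lineno, repr(value))
```

On Python 3.12 and later, f-strings are tokenized as FSTRING_START/FSTRING_MIDDLE/FSTRING_END rather than STRING, so this sketch skips them there.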

2009-03-02 19:58:12 Lenny
