How to count occurrences of keywords in code, but ignore the ones in comments/docstrings?

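I'm very new to Python. I want to find the occurrences of Python keywords ['def', 'in', 'if', ...] in the code below. However, any keyword that appears inside a string constant in the code must be ignored. How can I count keyword occurrences without counting the ones inside strings?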

def grade(result): 
    ''' 
    if if (<--- example to test if the word "if" will be ignored in the counts) 
    :param result: none 
    :return:none 
    ''' 

    if result >= 80:
        grade = "HD"
    elif 70 <= result:
        grade = "DI"
    elif 60 <= result:
        grade = "CR"
    elif 50 <= result:
        grade = "PA"
    else:
        #else (ignore this word)
        grade = "NN"
    return grade

result = float(raw_input("Enter a final result: ")) 

while result < 0 or result > 100: 
    print "Invalid result. Result must be between 0 and 100." 
    result = float(raw_input("Re-enter final result: ")) 

print "The corresponding grade is", grade(result)


2015-05-14 Jason Bui

Use the tokenize, keyword and collections modules.

tokenize.generate_tokens(readline)

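The generate_tokens() generator requires one argument, readline, which must be a callable object that provides the same interface as the readline() method of built-in file objects (see section File Objects). Each call to the function should return one line of input as a string. Alternatively, readline may be a callable object that signals completion by raising StopIteration.

The generator produces 5-tuples with these members: the token type; the token string; a 2-tuple (srow, scol) of ints specifying the row and column where the token begins in the source; a 2-tuple (erow, ecol) of ints specifying the row and column where the token ends in the source; and the line on which the token was found. The line passed (the last tuple item) is the logical line; continuation lines are included.

New in version 2.2.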
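For reference, the same function still exists in Python 3, where any readline callable returning `str` will do; `io.StringIO(...).readline` is convenient for testing. A minimal sketch, assuming Python 3 and a hypothetical one-line snippet, that unpacks the five fields described above:

```python
import io
import tokenize

# Hypothetical one-line snippet: a real assignment followed by a comment.
source = "x = 1  # if\n"

# generate_tokens() only needs a readline callable; StringIO.readline works.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

for tok in tokens:
    # Each item is a 5-tuple: (type, string, (srow, scol), (erow, ecol), line).
    tok_type, tok_string, start, end, line = tok
    print(tokenize.tok_name[tok_type], repr(tok_string), start, end)
```

In Python 3 each item is also a `TokenInfo` named tuple, so `tok.type`, `tok.string`, `tok.start`, `tok.end` and `tok.line` work too. `tokenize.tokenize()` is the bytes-based variant that expects a readline returning `bytes`.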

import tokenize 
with open('source.py') as f: 
    print list(tokenize.generate_tokens(f.readline))

Partial output:

[(1, 'def', (1, 0), (1, 3), 'def grade(result):\n'), 
(1, 'grade', (1, 4), (1, 9), 'def grade(result):\n'), 
(51, '(', (1, 9), (1, 10), 'def grade(result):\n'), 
(1, 'result', (1, 10), (1, 16), 'def grade(result):\n'), 
(51, ')', (1, 16), (1, 17), 'def grade(result):\n'), 
(51, ':', (1, 17), (1, 18), 'def grade(result):\n'), 
(4, '\n', (1, 18), (1, 19), 'def grade(result):\n'), 
(5, '    ', (2, 0), (2, 4), "    '''\n"),
(3,
    '\'\'\'\n    if if (<--- example to test if the word "if" will be ignored in the counts)\n    :param result: none\n    :return:none\n    \'\'\'',
    (2, 4),
    (6, 7),
    '    \'\'\'\n    if if (<--- example to test if the word "if" will be ignored in the counts)\n    :param result: none\n    :return:none\n    \'\'\'\n'),
(4, '\n', (6, 7), (6, 8), "    '''\n"),
(54, '\n', (7, 0), (7, 1), '\n'),
(1, 'if', (8, 4), (8, 6), '    if result >= 80:\n'),
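The listing shows why this approach ignores docstrings: the whole docstring arrives as one token of type 3 (STRING), and comments arrive as COMMENT tokens, so the words inside them never show up as separate NAME tokens (type 1). The numeric codes differ between Python versions, so `tokenize.tok_name` is a safer way to read them. A small sketch, assuming Python 3 and a hypothetical snippet:

```python
import io
import tokenize

# Hypothetical snippet: "if" appears in a docstring, a comment and real code.
source = '''def f(x):
    """if if"""
    # if
    if x:
        return 1
'''

tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# Collect (type name, token string) for every token whose text contains "if".
found = [(tokenize.tok_name[t.type], t.string) for t in tokens if "if" in t.string]
print(found)  # [('STRING', '"""if if"""'), ('COMMENT', '# if'), ('NAME', 'if')]
```

Only the last entry is a NAME token, so it is the only one a keyword counter should see.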

You can retrieve the list of keywords from the keyword module:

import keyword 
print keyword.kwlist 
print keyword.iskeyword('def')
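The keyword list depends on the interpreter version. In Python 2, `print` is a keyword, but in Python 3 it is a builtin function, so the same check gives a different answer. A quick sketch assuming a Python 3 interpreter:

```python
import keyword

# Checks assume a Python 3 interpreter.
print(keyword.iskeyword("def"))    # True
print(keyword.iskeyword("elif"))   # True
print(keyword.iskeyword("print"))  # False: print() is a builtin function in Python 3
print("print" in keyword.kwlist)   # False
```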

An integrated solution with collections.Counter:

import tokenize 
import keyword 
import collections 
with open('source.py') as f: 
    # tokens is a lazy generator of token strings
    tokens = (token for _, token, _, _, _ in tokenize.generate_tokens(f.readline)) 
    c = collections.Counter(token for token in tokens if keyword.iskeyword(token)) 

print c # Python 2 output: Counter({'elif': 3, 'print': 2, 'return': 1, 'else': 1, 'while': 1, 'or': 1, 'def': 1, 'if': 1})
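A possible Python 3 port of the same idea, written as a minimal sketch: the function name `count_keywords` and the sample source are hypothetical. It additionally checks that the token type is NAME, which makes explicit that STRING (docstring) and COMMENT tokens are never counted:

```python
import collections
import io
import keyword
import tokenize


def count_keywords(source):
    """Count keyword tokens in Python source, skipping strings and comments."""
    readline = io.StringIO(source).readline
    counts = collections.Counter()
    for tok in tokenize.generate_tokens(readline):
        # Docstrings are STRING tokens and comments are COMMENT tokens, so only
        # NAME tokens can be real keywords.
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            counts[tok.string] += 1
    return counts


# Hypothetical sample, a shortened version of the question's function.
sample = '''def grade(result):
    """if if (should be ignored)"""
    if result >= 80:
        grade = "HD"
    elif 70 <= result:
        grade = "DI"
    else:
        # else (ignore this word)
        grade = "NN"
    return grade
'''

print(count_keywords(sample))
```

To count a file instead: `with open('source.py') as f: print(count_keywords(f.read()))`. Under Python 3, `print` is not counted, because it is not a keyword there.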


2015-05-14 08:24:24

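Wow, I'm really sorry I didn't accept your answer 2 years ago. I asked this question when I was taking my first intro-to-programming course, and at the time I couldn't understand or implement your code. Thank you so much for your answer!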
