Word count in a dictionary (Python)

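I have the code below. I want to open a specified file, count every while loop in it, and finally output the total number of while loops in the file. I decided to convert the input file into a dictionary and then write a for loop that adds 1 to WHILE_ each time it sees the word while followed by a space, and finally print WHILE_.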

However, this doesn't seem to work, and I'm at a loss as to why. Any help solving this would be greatly appreciated.

This is my current code:

WHILE_ = 0 
INPUT_ = input("Enter file or directory: ") 


OPEN_ = open(INPUT_) 
READLINES_ = OPEN_.readlines() 
STRING_ = (str(READLINES_)) 
STRIP_ = STRING_.strip() 
input_str1 = STRIP_.lower() 


dic = dict() 
for w in input_str1.split(): 
    if w in dic.keys(): 
     dic[w] = dic[w]+1 
    else: 
     dic[w] = 1 
DICT_ = (dic) 


for LINE_ in DICT_: 
    if ("while\\n',") in LINE_: 
     WHILE_ += 1 
    elif ('while\\n",') in LINE_: 
     WHILE_ += 1 
    elif ('while ') in LINE_: 
     WHILE_ += 1 

print ("while_loops {0:>12}".format((WHILE_)))

This is the input file I'm working from:

'''A trivial test of metrics 
Author: Angus McGurkinshaw 
Date: May 7 2013 
''' 

def silly_function(blah): 
    '''A silly docstring for a silly function''' 
    def nested(): 
     pass 
    print('Hello world', blah + 36 * 14) 
    tot = 0 # This isn't a for statement 
    for i in range(10): 
     tot = tot + i 
     if_im_done = false # Nor is this an if 
    print(tot) 

blah = 3 
while blah > 0: 
    silly_function(blah) 
    blah -= 1 
    while True: 
     if blah < 1000: 
      break

The output should be 2, but my current version of the code prints 0.


2013-05-29 user2101517

Why do you give your variables such bizarre and ugly names? – abarnert

They're just placeholders for now. – user2101517

The standard library includes [a module for parsing Python code](http://docs.python.org/3.3/library/ast.html). –

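This is an incredibly strange design. You call readlines to get a list of strings, then call str on that list, which turns the whole thing into one big string: the quoted repr of each line, joined by commas and surrounded by square brackets. Then you split the result on whitespace. I have no idea why you'd do that.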

Your bizarre variable names, extra useless lines of code like DICT_ = (dic), and so on only confuse things further.

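But I can explain why it doesn't work. Try printing out DICT_ after all that silliness, and you'll see that the only keys that include while are while and 'while. Since neither of those matches any of the patterns you're looking for, your count comes out as 0.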
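To see this concretely, here is a small sketch that replays the question's pipeline on three lines from the sample input (the list `lines` is an assumption standing in for what readlines() returns) and then applies the same three substring checks:

```python
# Three lines from the sample input, as readlines() would return them.
lines = ["blah = 3\n", "while blah > 0:\n", "    while True:\n"]

# The question's pipeline: str() of the list, then lowercase and split on whitespace.
mangled = str(lines).strip().lower()
dic = {}
for w in mangled.split():
    dic[w] = dic.get(w, 0) + 1

# Keys that contain "while" at all.
while_keys = [k for k in dic if "while" in k]
print(while_keys)  # ["'while", 'while']

# The same three patterns the question checks for.
patterns = ["while\\n',", 'while\\n",', "while "]
count = 0
for key in dic:
    if any(p in key for p in patterns):
        count += 1
print(count)  # 0
```

Because split() removes all whitespace, no key can ever contain "while ", and the escaped-newline patterns only match a line that ends right after the word while.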

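It's also worth noting that, because you loop over the dictionary's keys, you only ever add 1 to WHILE_ per matching key, even if that word occurred many times. That makes your whole counting dictionary useless.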
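A minimal sketch of that difference, using two lines from the sample input (the variable names are illustrative): counting matching keys collapses repeats, while reading the stored value keeps every occurrence.

```python
# Build the same word -> count dictionary the question builds.
words = "while blah > 0:\n    while True:\n".lower().split()
dic = {}
for w in words:
    dic[w] = dic.get(w, 0) + 1

# The question's approach: +1 per matching *key*, so repeats collapse to 1.
per_key = sum(1 for key in dic if key == "while")

# Reading the value stored under the key keeps every occurrence.
from_values = dic.get("while", 0)
print(per_key, from_values)  # 1 2
```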

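This would be a lot easier if you didn't mangle your strings, try to recover them, and then try to match against the incorrectly recovered versions. Just do it directly.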

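While I'm at it, I'm also going to fix some other problems, so your code is readable and simple, doesn't leak the file handle, and so on. Here's a complete implementation of the logic you were trying to hack together by hand: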

import collections 

filename = input("Enter file: ") 
counts = collections.Counter() 
with open(filename) as f: 
    for line in f: 
     counts.update(line.strip().lower().split()) 
print('while_loops {0:>12}'.format(counts['while']))

When you run this on the sample input, you correctly get 2. Extending it to handle if and for is trivial and obvious.
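One way that extension might look, as a sketch: the helper name `keyword_counts` and the short `snippet` input are hypothetical, and the counting itself is the same Counter-based word count as above.

```python
import collections

def keyword_counts(lines, keywords=("while", "for", "if")):
    """Count whitespace-separated words, then report only the requested keywords."""
    counts = collections.Counter()
    for line in lines:
        counts.update(line.strip().lower().split())
    return {kw: counts[kw] for kw in keywords}

# With a real file you would pass the open file object, since iterating it yields lines.
snippet = [
    "for i in range(10):\n",
    "    if i > 5:\n",
    "        break\n",
    "while True:\n",
    "    pass\n",
]
print(keyword_counts(snippet))  # {'while': 1, 'for': 1, 'if': 1}
```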

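But note that there's a serious problem with your logic: anything that looks like a keyword but sits in the middle of a comment or string will still get picked up. There's no way around that without writing some kind of code to strip out comments and strings. On the sample input, that means you'll overcount if and for: the comments add one spurious if and one spurious for, and the docstring adds another for. The obvious stripping approaches, line.partition('#')[0] and something similar for quotes, won't work. First, it's perfectly valid to have a string before an if keyword, as in "foo" if x else "bar". Second, you can't handle multiline strings this way.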
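A sketch of those failure modes, assuming a hypothetical helper `naive_counts` that applies an optional per-line stripping function before the word count. The commented lines are borrowed from the sample input; the ternary and docstring lines are made-up illustrations.

```python
import collections

def naive_counts(lines, preprocess=lambda line: line):
    """Word-count lines after passing each one through `preprocess`."""
    counts = collections.Counter()
    for line in lines:
        counts.update(preprocess(line).strip().lower().split())
    return counts

# Lines borrowed from the question's sample input.
commented = [
    "tot = 0 # This isn't a for statement\n",
    "for i in range(10):\n",
    "    if_im_done = false # Nor is this an if\n",
]
raw = naive_counts(commented)
print(raw["for"], raw["if"])  # 2 1  (the real code has 1 for and 0 ifs)

# Cutting each line at '#' fixes this particular case...
no_comments = naive_counts(commented, lambda line: line.partition('#')[0])
print(no_comments["for"], no_comments["if"])  # 1 0

# ...but the same trick for quotes drops a real keyword that follows a string.
ternary = ['y = "foo" if x else "bar"\n']
no_strings = naive_counts(ternary, lambda line: line.partition('"')[0])
print(no_strings["if"])  # 0, although the line contains a real if

# A line inside a multiline string has nothing on it to strip at all.
docstring = ['"""\n', "Loop for a while\n", '"""\n']
in_docstring = naive_counts(docstring, lambda line: line.partition('"')[0])
print(in_docstring["while"])  # 1, a false positive
```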

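These problems, and others like them, are why you almost certainly want a real parser. If you just want to parse Python code, the ast module in the standard library is the obvious way to do it. If you want to write quick and dirty parsers for all sorts of different languages, try pyparsing, which is very nice and comes with some good examples.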

Here's a simple example:

import ast 

filename = input("Enter file: ") 
with open(filename) as f: 
    tree = ast.parse(f.read()) 
while_loops = sum(1 for node in ast.walk(tree) if isinstance(node, ast.While)) 
print('while_loops {0:>12}'.format(while_loops))
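To check that the parser really ignores comments and strings, here is a sketch that feeds `ast.parse` a source string directly instead of a file (the `source` text is a made-up example in which for, if, and while also appear inside a docstring and a comment):

```python
import ast

source = '''
def f():
    """Loop for a while, if needed."""
    total = 0  # this isn't a for statement
    for i in range(10):
        total += i
    while total > 0:
        total -= 1
'''

tree = ast.parse(source)
while_loops = sum(1 for node in ast.walk(tree) if isinstance(node, ast.While))
for_loops = sum(1 for node in ast.walk(tree) if isinstance(node, ast.For))
if_statements = sum(1 for node in ast.walk(tree) if isinstance(node, ast.If))
print(while_loops, for_loops, if_statements)  # 1 1 0
```

The docstring becomes a string constant node and the comment is discarded by the parser, so neither contributes a While, For, or If node.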

Or, more flexibly:

import ast 
import collections 

filename = input("Enter file: ") 
with open(filename) as f: 
    tree = ast.parse(f.read()) 
counts = collections.Counter(type(node).__name__ for node in ast.walk(tree))  
print('while_loops {0:>12}'.format(counts['While'])) 
print('for_loops {0:>14}'.format(counts['For'])) 
print('if_statements {0:>10}'.format(counts['If']))
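Counting by node type name means you count exactly what the grammar calls an `If`. A small sketch (the `source` string is a made-up example) shows how that plays out: an `elif` is parsed as a second `If` node, the `"foo" if x else "bar"` expression from earlier is an `IfExp`, and the `for`/`if` inside a list comprehension produce a `comprehension` node rather than `For` or `If`.

```python
import ast
import collections

source = """
if a:
    pass
elif b:
    pass
y = "foo" if x else "bar"
evens = [n for n in range(10) if n % 2 == 0]
"""

counts = collections.Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))
print(counts["If"], counts["IfExp"], counts["For"], counts["comprehension"])  # 2 1 0 1
```

Whether an `elif` or a conditional expression should count as an "if statement" is a design decision for your metric; the node names let you choose explicitly.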


2013-05-29 01:18:39 abarnert

Great answer, and a nice example of using the 'ast' module. –

@JonClements: Well, all I did with the AST was 'walk' and 'type(node)', so it doesn't really show off the real fun you can have (e.g., [MacroPy](https://github.com/lihaoyi/macropy)). – abarnert
