2015-02-09 70 views
0

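I want to go through a text file and build a dictionary of keywords with the number of times each one shows up. I'd like it to look something like this, as a defaultdict of ints: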

defaultdict(<type 'int'>, {'keyword1': 1, 'keyword2': 0, 'keyword3': 3, 'keyword4': 9}) 

What I get right now looks like this:

defaultdict(<type 'int'>, {'keyword1': 1}) 

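I can print each keyword in my list and the loop does iterate over all of them, so I know it is trying something. I also know that more of these keywords should show up, because there are instances of them in the text file. My code: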

find_it=['keyword1', 'keyword2', 'keyword3', 'keyword4'] 

with open('inputfile.txt', 'r') as f:
    out = defaultdict(int)

    for key in find_it:
        counter=0
        for line in f:
            if key in line:
                out[key] += 1

my_keys=dict(**out) 

What am I missing here?

Answers

1

Joran is right: a Counter is a better fit than a defaultdict for what you are doing. Here is an alternative solution:

inputfile.txt

The Zen of Python, by Tim Peters 

Beautiful is better than ugly. 
Explicit is better than implicit. 
Simple is better than complex. 
Complex is better than complicated. 
Flat is better than nested. 
Sparse is better than dense. 
Readability counts. 
Special cases aren't special enough to break the rules. 
Although practicality beats purity. 
Errors should never pass silently. 
Unless explicitly silenced. 
In the face of ambiguity, refuse the temptation to guess. 
There should be one-- and preferably only one --obvious way to do it. 
Although that way may not be obvious at first unless you're Dutch. 
Now is better than never. 
Although never is often better than *right* now. 
If the implementation is hard to explain, it's a bad idea. 
If the implementation is easy to explain, it may be a good idea. 
Namespaces are one honking great idea -- let's do more of those! 

count.py

from collections import Counter 

find_it = {"be", "do", "of", "the", "to"} 

keys = Counter() 

with open("inputfile.txt") as f:
    for line in f:
        matches = Counter(w for w in line.split() if w in find_it)
        keys += matches

print(keys)

$ python count.py
Counter({'the': 5, 'to': 5, 'be': 3, 'of': 3, 'do': 2})

This counts the matches against find_it on each line and adds them to the running counter keys as it goes.
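To see the accumulation step in isolation, here is a minimal sketch that uses an in-memory list of lines instead of inputfile.txt. The sample lines and the name `running` are made up for illustration and are not part of the answer's code.

```python
from collections import Counter

find_it = {"be", "do", "of", "the", "to"}

# Hypothetical sample lines standing in for the file contents.
lines = [
    "to be or not to be",
    "the rest of the story",
]

running = Counter()
for line in lines:
    # Count every matching word on this line, repeats included.
    matches = Counter(w for w in line.split() if w in find_it)
    # In-place addition sums the counts key by key.
    running += matches

print(running["to"], running["be"], running["the"], running["of"], running["do"])
```

Looking up a keyword that never appeared (here `"do"`) returns 0 without raising a KeyError, which is one reason a Counter is convenient for this task.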

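Edit: As Blckknght pointed out in the comments, the previous solution missed the case where a keyword appears more than once on a line. The edited version of the code uses a slightly different approach to fix that.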

+0

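It's worth noting that this counts the number of lines each word appears on (which is what the question's code would also count, if it worked). That may or may not be what is wanted. Joran Beasley's code, by contrast, counts every occurrence of each word regardless of which line it is on (so a line like 'keyword1 keyword2 keyword1' would increase the count for 'keyword1' by two). – Blckknght 2015-02-09 22:00:15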

+0

@Blckknght Good catch! Fixed now :-) – 2015-02-09 22:23:03
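The distinction raised in the comment above can be sketched side by side. This is only an illustration under assumed inputs: the sample lines and the names `per_line` and `per_occurrence` are hypothetical.

```python
from collections import Counter

find_it = ["keyword1", "keyword2"]

# Hypothetical lines; the first one repeats keyword1.
lines = ["keyword1 keyword2 keyword1", "keyword2 only"]

per_line = Counter()        # number of lines that contain each keyword
per_occurrence = Counter()  # total number of times each keyword appears

for line in lines:
    words = line.split()
    unique_words = set(words)
    # A line contributes at most 1 per keyword here.
    per_line.update(k for k in find_it if k in unique_words)
    # Every repeat on the line is counted here.
    per_occurrence.update(w for w in words if w in find_it)

print(per_line["keyword1"], per_occurrence["keyword1"])
print(per_line["keyword2"], per_occurrence["keyword2"])
```

For 'keyword1' the two tallies differ (one line, two occurrences), while for 'keyword2' they agree, so which one to use depends on what the count is supposed to mean.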

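3

from collections import Counter
my_current_count = Counter(open('inputfile.txt').read().split())

should do it ... and it is much simpler.

# my_list_of_keywords is your own list of keywords
for shared_key in set(my_current_count).intersection(my_list_of_keywords):
    print(my_current_count[shared_key])

Your original approach would need too many changes to make it work while still being recognizable in its current state.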
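If the goal is the shape shown in the question, where keywords that never appear still show up with a count of 0, one possible sketch builds on the same Counter idea. The `text` string below is a made-up stand-in for the file contents, and `my_keys` simply reuses the name from the question.

```python
from collections import Counter

find_it = ["keyword1", "keyword2", "keyword3", "keyword4"]

# Hypothetical text standing in for the contents of inputfile.txt.
text = "keyword1 foo keyword3 keyword3\nbar keyword4 keyword3"

# Count every whitespace-separated word once, in a single pass.
counts = Counter(text.split())

# A Counter returns 0 for missing keys, so absent keywords map to 0.
my_keys = {key: counts[key] for key in find_it}
print(my_keys)
```

Here 'keyword2' never occurs in the sample text, so it ends up in `my_keys` with the value 0 rather than being left out.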

+0

This looks interesting too, I'll look into this approach. Thanks – 2015-02-09 20:39:23

3

You have already read everything in the file during the first iteration of for key in find_it:, so for the next key there is nothing left to read.
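The exhaustion described above can be reproduced in a few lines. This sketch uses io.StringIO as a stand-in for the open file (an assumption made so the example runs without inputfile.txt); a real file object iterates line by line in the same way. The `seek(0)` call at the end is shown only to illustrate that the lines are still there once the position is reset; it is not what the answer recommends.

```python
import io

# io.StringIO stands in for the open file in this sketch.
f = io.StringIO("keyword1 here\nkeyword2 there\n")

first_pass = [line for line in f]   # consumes every line
second_pass = [line for line in f]  # nothing left: already at end of file

print(len(first_pass))
print(len(second_pass))

f.seek(0)  # rewind to the start of the stream
third_pass = [line for line in f]
print(len(third_pass))
```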

I suggest you swap those for loops.

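from collections import defaultdict

with open('inputfile.txt', 'r') as f:
    out = defaultdict(int)

    # Read each line once, then test every keyword against it.
    for line in f:
        for key in find_it:
            # Whole-word match; a line adds at most 1 per keyword.
            if key in line.strip().split(' '):
                out[key] += 1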

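By the way, I strongly recommend going with Joran Beasley's one-line solution, because it is easier to read and understand for anyone who looks at your code in the future.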

+0

Shouldn't it go through the array to find them? – 2015-02-09 20:29:57

+0

It goes through every word on each line and checks whether it is in 'find_it' – ozgur 2015-02-09 20:32:45

+0

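Oh, I see: once you have gone through a line you can't go back to it for the other keywords, which is what my version was trying to do! Thanks! – 2015-02-09 20:33:01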