Inverting a dictionary when some of the original values are identical

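Say I have a dictionary called word_counter_dictionary that counts how many times each word appears in a document, in the form {'word' : number}. For example, the word "secondly" appears once, so its key/value pair would be {'secondly' : 1}. I want to make an inverted dictionary so that the numbers become the keys and the words become the values for those keys, so that I can graph the top 25 most used words. I saw somewhere that the setdefault() function might come in handy, but regardless, I cannot use it because so far in my class we have only covered get().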

inverted_dictionary = {} 
for key in word_counter_dictionary: 
    new_key = word_counter_dictionary[key] 
    inverted_dictionary[new_key] = word_counter_dictionary.get(new_key, '') + str(key) 
    inverted_dictionary

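With the method above, it works fine until it reaches another word with the same value. For example, the word "saves" also appears once in the document, so Python adds the new key/value pair. But it replaces {1 : 'secondly'} with the new pair, so that only {1 : 'saves'} is left in the dictionary.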

So, bottom line, my goal is to get all of the words and their respective number of repetitions into this new dictionary called inverted_dictionary.

2013-11-24 UnworthyToast

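Your problem, as I think you know, is that a dictionary cannot have multiple values for one key, such as the number 1. It can, however, have a _collection_ of other values as its single value.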

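OK, you will want to check the dictionary's keys before each attempt to do this. If the word is already there, just increment its count. Rinse and repeat *ad infinitum*.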

If all you want to do is extract the keys with the 25 largest values, you don't have to create this inverted dictionary first. – keyser

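Python dictionaries do not allow duplicate keys, so you can't use a simple dictionary to store multiple elements under the same key (1 in your case). For your example, I would rather have a list as the value in your inverted dictionary, and store in that list the words that share the same number of appearances, like so: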

inverted_dictionary = {}
for key in word_counter_dictionary:
    new_key = word_counter_dictionary[key]
    if new_key in inverted_dictionary:
        inverted_dictionary[new_key].append(key)
    else:
        inverted_dictionary[new_key] = [key]

To get the 25 most repeated words, loop over the (sorted) keys of inverted_dictionary and store the words:

common_words = []
for key in sorted(inverted_dictionary.keys(), reverse=True):
    if len(common_words) < 25:
        common_words.extend(inverted_dictionary[key])
    else:
        break

common_words = common_words[:25] # In case there are more than 25 words
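
For readers limited to get(), as the question states, the same grouping can probably be written without the if/else branch. The following is a minimal sketch rather than the answerer's code; the helper names invert_counts and most_common_words and the sample data are made up for illustration. Note that the loop in the question calls get() on word_counter_dictionary, whose keys are words, so looking up a number there always returns the default ''; the lookup has to go against the inverted dictionary instead.

```python
def invert_counts(word_counts):
    """Group words by their count, using only dict.get()."""
    inverted = {}
    for word, count in word_counts.items():
        # Read the list collected so far from the *inverted* dict,
        # falling back to an empty list the first time a count is seen.
        inverted[count] = inverted.get(count, []) + [word]
    return inverted


def most_common_words(inverted, limit=25):
    """Walk the counts from highest to lowest and collect up to `limit` words."""
    words = []
    for count in sorted(inverted, reverse=True):
        if len(words) >= limit:
            break
        words.extend(inverted[count])
    return words[:limit]


# Made-up sample data, not taken from the question's document.
sample = {'secondly': 1, 'saves': 1, 'the': 7, 'of': 4}
inverted = invert_counts(sample)
print(inverted)
print(most_common_words(inverted, limit=3))
```

Building a new list with `+ [word]` on every iteration copies the existing list, which is fine for a word count of modest size but slower than append() for very large inputs.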

2013-11-24 20:52:23

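Thank you, that works perfectly! Now, how would I go about grabbing the 25 largest keys so I can graph them? I can't think of a way other than a slice operation, but obviously I can't do that with a dictionary :P – UnworthyToast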

Just edited my answer to get the 25 most common words.

What you can do is turn each value into a list of the words that share the same key:

word_counter_dictionary = {'first':1, 'second':2, 'third':3, 'fourth':2} 

inverted_dictionary = {} 
for key in word_counter_dictionary:
    new_key = word_counter_dictionary[key]
    if new_key in inverted_dictionary:
        inverted_dictionary[new_key].append(str(key))
    else:
        inverted_dictionary[new_key] = [str(key)]

print(inverted_dictionary)

>>> {1: ['first'], 2: ['second', 'fourth'], 3: ['third']}

2013-11-24 20:52:46 Christian

+1 for a working example!

Here's a version that doesn't "invert" the dictionary:

>>> import operator 
>>> A = {'a':10, 'b':843, 'c': 39, 'd': 10} 
>>> B = sorted(A.items(), key=operator.itemgetter(1), reverse=True)
>>> B 
[('b', 843), ('c', 39), ('a', 10), ('d', 10)]

Instead, it creates a list sorted by value, from highest to lowest.

To get the top 25, you simply slice it: B[:25].

And here's one way to get the keys and values separately (after having put them into a list of tuples):

>>> [x[0] for x in B] 
['b', 'c', 'a', 'd'] 
>>> [x[1] for x in B] 
[843, 39, 10, 10]

Or

>>> C, D = zip(*B) 
>>> C 
('b', 'c', 'a', 'd') 
>>> D 
(843, 39, 10, 10)

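Note that if you only want to extract the keys or the values (and not both), you should do so at an earlier stage. This is just an example of how to handle a list of tuples.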
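
If only the top entries are needed, the standard library's collections.Counter may be a convenient shortcut for the same sort-and-slice idea. This is a swapped-in alternative, not the approach shown above; it reuses the sample dictionary A from this answer, and n=2 stands in for 25.

```python
from collections import Counter

A = {'a': 10, 'b': 843, 'c': 39, 'd': 10}

# Counter accepts an existing mapping of item -> count.
counts = Counter(A)

# most_common(n) returns (key, count) pairs, highest count first.
top = counts.most_common(2)
print(top)
```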

2013-11-24 21:27:24 keyser

A defaultdict is perfect for this

word_counter_dictionary = {'first':1, 'second':2, 'third':3, 'fourth':2} 
from collections import defaultdict 

d = defaultdict(list) 
for key, value in word_counter_dictionary.items():
    d[value].append(key) 

print(d)

Output:

defaultdict(<type 'list'>, {1: ['first'], 2: ['second', 'fourth'], 3: ['third']})
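
The question mentions setdefault(); for comparison, the same grouping can be sketched with a plain dict and setdefault(). This is offered as an illustration alongside the defaultdict version, using the same sample data, and it relies on items() so that it should run on both Python 2 and Python 3.

```python
word_counter_dictionary = {'first': 1, 'second': 2, 'third': 3, 'fourth': 2}

inverted = {}
for word, count in word_counter_dictionary.items():
    # setdefault returns the list stored under `count`,
    # inserting an empty list first if the key is missing.
    inverted.setdefault(count, []).append(word)

print(inverted)
```

Unlike a defaultdict, the result is an ordinary dict, so looking up a count that never occurred raises KeyError instead of silently creating an empty list.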

2013-11-24 21:37:03 iruvar

For getting the largest elements of a data set, an inverted dictionary might not be the best data structure.

Either put the items into a sorted list (the example assumes you want the two most frequent words):

word_counter_dictionary = {'first':1, 'second':2, 'third':3, 'fourth':2} 
counter_word_list = sorted((count, word) for word, count in word_counter_dictionary.items())

Result:

>>> print(counter_word_list[-2:]) 
[(2, 'second'), (3, 'third')]

Or use the batteries that come included with Python (heapq.nlargest in this case):

import heapq, operator 
print(heapq.nlargest(2, word_counter_dictionary.items(), key=operator.itemgetter(1)))

Result:

[('third', 3), ('second', 2)]
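
Since the stated goal is to plot the most frequent words, the nlargest result can be split into labels and heights. The following is a sketch under that assumption; top_words_for_plot is a hypothetical helper name, and the plotting call itself is left out.

```python
import heapq
import operator


def top_words_for_plot(word_counts, n=25):
    """Return (words, counts) for the n most frequent words, highest first."""
    top = heapq.nlargest(n, word_counts.items(), key=operator.itemgetter(1))
    words = [word for word, _ in top]
    counts = [count for _, count in top]
    return words, counts


word_counter_dictionary = {'first': 1, 'second': 2, 'third': 3, 'fourth': 2}
words, counts = top_words_for_plot(word_counter_dictionary, n=2)
print(words)
print(counts)
```

The two lists line up index by index, so they could be handed to a bar chart as x labels and bar heights.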

2013-11-24 22:49:30 WolframH
