2016-08-21 89 views
1

How to create a dictionary from another dictionary if certain conditions are met

From this dictionary:

{0: (u'Donald', u'PERSON'), 1: (u'John', u'PERSON'), 2: (u'Trump', u'PERSON'), 14: (u'Barack', u'PERSON'), 15: (u'Obama', u'PERSON'), 17: (u'Michelle', u'PERSON'), 18: (u'Obama', u'PERSON'), 30: (u'Donald', u'PERSON'), 31: (u'John', u'PERSON'), 32: (u'Trump', u'PERSON')}

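I want to create another dictionary like this:

{u'Donald John Trump': 2, u'Barack Obama': 1, u'Michelle Obama': 1}

Here the keys 0, 1, 2 and 30, 31, 32 each increase by 1, and the name they spell out occurs twice. The runs 14, 15 and 17, 18 each occur once. Is there a way to create such a dictionary?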

+0

`(u'Obama', u'PERSON')` appears twice in your dictionary, but it isn't included in the result?

+1

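@BurhanKhalid I think the persons are grouped by runs of consecutive keys, so 'Obama' appears twice but is used once for 'Barack Obama' and once for 'Michelle Obama'. – Delgan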

+0

But the dictionary keys 14 & 15 and 17 & 18 need to be merged first, because they increase by 1. – KevinOelen

Answers

3

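I think the main problem you need to solve is identifying each person by grouping the keys into runs of consecutive integers, as you described.

Fortunately, Python has a recipe for this: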

from itertools import groupby 
from operator import itemgetter 
from collections import defaultdict 

dct = { 
    0: ('Donald', 'PERSON'), 
    1: ('John', 'PERSON'), 
    2: ('Trump', 'PERSON'), 
    14: ('Barack', 'PERSON'), 
    15: ('Obama', 'PERSON'), 
    17: ('Michelle', 'PERSON'), 
    18: ('Obama', 'PERSON'), 
    30: ('Donald', 'PERSON'), 
    31: ('John', 'PERSON'), 
    32: ('Trump', 'PERSON') 
} 

persons = defaultdict(int) # Used for convenience: missing names start at 0
keys = sorted(dct.keys()) # So groupby() can recognize sequences 

for k, g in groupby(enumerate(keys), lambda d: d[0] - d[1]): 
    ids = map(itemgetter(1), g)    # [0, 1, 2], [14, 15], etc. 
    person = ' '.join(dct[i][0] for i in ids) # "Donald John Trump", "Barack Obama", etc 
    persons[person] += 1 

print(persons) 
# defaultdict(<class 'int'>, 
#  {'Barack Obama': 1, 
#   'Donald John Trump': 2, 
#   'Michelle Obama': 1}) 
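
To see why grouping on `index - key` works, here is a minimal sketch that only prints the runs. The names `runs`, `offset` and `pair` are illustrative and not part of the answer above. Within a run of consecutive keys, both the position and the key grow by 1 at each step, so their difference stays constant, and groupby() starts a new group whenever that difference changes.

```python
from itertools import groupby

keys = [0, 1, 2, 14, 15, 17, 18, 30, 31, 32]  # sorted keys from the question

runs = []
# enumerate() pairs each key with its position in the sorted list;
# position - key is the same for every member of a consecutive run.
for offset, group in groupby(enumerate(keys), lambda pair: pair[0] - pair[1]):
    runs.append((offset, [key for _, key in group]))

for offset, run in runs:
    print(offset, run)
# 0 [0, 1, 2]
# -11 [14, 15]
# -12 [17, 18]
# -23 [30, 31, 32]
```

This only works on sorted keys, which is why the code above calls sorted() first.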
+0

Amazing! Thank you very much – KevinOelen

2
def add_name(d, consecutive_keys, result):
    # Join the first tuple element of every key in the run, e.g. "Barack Obama"
    result_key = ' '.join(d[k][0] for k in consecutive_keys)
    if result_key in result:
        result[result_key] += 1
    else:
        result[result_key] = 1

d = {0: (u'Donald', u'PERSON'), 1: (u'John', u'PERSON'), 2: (u'Trump', u'PERSON'), 
    14: (u'Barack', u'PERSON'), 15: (u'Obama', u'PERSON'), 
    17: (u'Michelle', u'PERSON'), 18: (u'Obama', u'PERSON'), 
    30: (u'Donald', u'PERSON'), 31: (u'John', u'PERSON'), 32: (u'Trump', u'PERSON')} 

sorted_keys = sorted(d.keys()) 
last_key = sorted_keys[0] 
consecutive_keys = [last_key] 
result = {} 
for i in sorted_keys[1:]:
    if i == last_key + 1:
        consecutive_keys.append(i)
    else:
        add_name(d, consecutive_keys, result)
        consecutive_keys = [i]
    last_key = i
add_name(d, consecutive_keys, result) 

print(result) 

Output

{'Donald John Trump': 2, 'Barack Obama': 1, 'Michelle Obama': 1} 
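
One thing to note about the loop above: `sorted_keys[0]` raises an IndexError when the dictionary is empty. As a rough sketch, the same idea can be wrapped in a function that tolerates empty input. The function name `count_consecutive_names` is made up for illustration, and using collections.Counter instead of the if/else is a swapped-in convenience, not what the code above does.

```python
from collections import Counter

def count_consecutive_names(d):
    """Join the first tuple element over each run of consecutive keys and count the joined names."""
    counts = Counter()
    run = []  # keys of the run currently being collected
    for key in sorted(d):
        if run and key != run[-1] + 1:
            # Gap found: the current run is complete, so record its name.
            counts[' '.join(d[k][0] for k in run)] += 1
            run = []
        run.append(key)
    if run:  # record the last run; skipped entirely for an empty dictionary
        counts[' '.join(d[k][0] for k in run)] += 1
    return dict(counts)

d = {0: ('Donald', 'PERSON'), 1: ('John', 'PERSON'), 2: ('Trump', 'PERSON'),
     14: ('Barack', 'PERSON'), 15: ('Obama', 'PERSON'),
     17: ('Michelle', 'PERSON'), 18: ('Obama', 'PERSON'),
     30: ('Donald', 'PERSON'), 31: ('John', 'PERSON'), 32: ('Trump', 'PERSON')}

print(count_consecutive_names(d))
print(count_consecutive_names({}))
```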
+0

This works too! Thanks! – KevinOelen
