2016-12-04 42 views

Check whether a dict key is a substring of any other key in a Python dictionary?

I have a dictionary ngram_list as follows:

ngram_list = dict_items([ 
    ('back to back breeding', {'wordcount': 4, 'count': 3}), 
    ('back breeding', {'wordcount': 2, 'count': 5}), 
    ('several consecutive heats', {'wordcount': 3, 'count': 2}), 
    ('how often should', {'wordcount': 3, 'count': 2}), 
    ('often when breeding', {'wordcount': 3, 'count': 1}) 
]) 

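I want to sort the dictionary by word count, from smallest to largest, then loop through the list and delete any item whose key is a substring of any other item's key.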

Expected output (items):

ngram_list = dict_items([ 
    ('several consecutive heats', {'wordcount': 3, 'count': 2}), 
    ('how often should', {'wordcount': 3, 'count': 2}), 
    ('often when breeding', {'wordcount': 3, 'count': 1}), 
    ('back to back breeding', {'wordcount': 4, 'count': 3}) 
]) 

What is your final expected output dictionary? – Skycc

@Skycc Updated, sorry. – Lazhar

So do you want your output as a dictionary, or as a list of tuples like dict.items() returns? You need 'OrderedDict' to keep the items in sorted order. – Skycc

Answer

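First filter the input dictionary to get rid of the unwanted items, then sort the remaining items with the sorted function, using the word count as the key, and finally build the result with OrderedDict.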

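This uses a simple in check, which only tests whether one key is a substring of another. You may need a regex if you want the match to respect exact whole-word boundaries.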

from collections import OrderedDict 
ngram_dict = { 
    'back to back breeding': {'wordcount': 4, 'count': 3}, 
    'back breeding': {'wordcount': 2, 'count': 5}, 
    'several consecutive heats': {'wordcount': 3, 'count': 2}, 
    'how often should': {'wordcount': 3, 'count': 2}, 
    'often when breeding': {'wordcount': 3, 'count': 1} 
} 

# keep only the items whose key is not a proper substring of another key
ngram_filter = [i for i in ngram_dict.items() if not any(i[0] in k and i[0] != k for k in ngram_dict.keys())] 
# sort the survivors by word count (ascending) and preserve that order
final_dict = OrderedDict(sorted(ngram_filter, key=lambda x: x[1].get('wordcount'))) 

# final_dict = OrderedDict([('several consecutive heats', {'count': 2, 'wordcount': 3}), ('how often should', {'count': 2, 'wordcount': 3}), ('often when breeding', {'count': 1, 'wordcount': 3}), ('back to back breeding', {'count': 3, 'wordcount': 4})]) 

All of this can be fitted into a one-liner as follows:

from collections import OrderedDict 
final_dict = OrderedDict( 
    sorted((i for i in ngram_dict.items() if not any(i[0] in k and i[0] != k for k in ngram_dict.keys())), 
           key=lambda x: x[1].get('wordcount'))) 
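
For the whole-word caveat mentioned earlier, here is a rough sketch of how a regex-based check might look. The helper names contains_phrase and drop_contained_keys, and the extra 'eat' entry, are made up for illustration and are not part of the original question. It assumes every key starts and ends with a word character, since \b only behaves as a word boundary in that case.

```python
import re
from collections import OrderedDict

ngram_dict = {
    'back to back breeding': {'wordcount': 4, 'count': 3},
    'back breeding': {'wordcount': 2, 'count': 5},
    'several consecutive heats': {'wordcount': 3, 'count': 2},
    'how often should': {'wordcount': 3, 'count': 2},
    'often when breeding': {'wordcount': 3, 'count': 1},
    # Hypothetical extra entry: 'eat' is a substring of 'heats',
    # but not a whole word in any other key.
    'eat': {'wordcount': 1, 'count': 7},
}


def contains_phrase(phrase, text):
    """Return True if phrase occurs in text on whole-word boundaries."""
    pattern = r'\b' + re.escape(phrase) + r'\b'
    return re.search(pattern, text) is not None


def drop_contained_keys(d, contains):
    """Drop items whose key is contained in another key, then sort by wordcount."""
    kept = [
        (key, value) for key, value in d.items()
        if not any(key != other and contains(key, other) for other in d)
    ]
    return OrderedDict(sorted(kept, key=lambda item: item[1]['wordcount']))


# Plain substring check, as in the answer above: 'eat' is dropped.
plain = drop_contained_keys(ngram_dict, lambda key, other: key in other)

# Whole-word check: 'eat' is kept, 'back breeding' is still dropped.
whole = drop_contained_keys(ngram_dict, contains_phrase)

print(list(plain))
print(list(whole))
```

The two results differ only in the 'eat' entry: the plain in check treats it as part of 'heats', while the word-boundary pattern does not. Which behaviour is wanted depends on the data.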