2012-12-28 20 views
10

Python: removing partially matching dictionaries from a list of dictionaries

my_list = [
{'oranges':'big','apples':'green'}, 
{'oranges':'big','apples':'green','bananas':'fresh'}, 
{'oranges':'big','apples':'red'}, 
{'oranges':'big','apples':'green','bananas':'rotten'} 
] 

I want to create a new list with the partial duplicates removed.

In my case, this dictionary must be removed:

{'oranges':'big','apples':'green'} 

because it is repeated in these dictionaries:

{'oranges':'big','apples':'green','bananas':'fresh'} 
{'oranges':'big','apples':'green','bananas':'rotten'} 

So the desired result is:

[ 
{'oranges':'big','apples':'green','bananas':'fresh'}, 
{'oranges':'big','apples':'red'}, 
{'oranges':'big','apples':'green','bananas':'rotten'} 
] 

How can I do this? Thanks a lot!

+1

Do you mean that if a shorter dictionary is a subset of a longer one, it should be filtered out? Is that right? –

+0

The first step is to decide what counts as a partial duplicate. Is it just that a key/value pair occurs more than once? –

+0

@Shawn: Yes, sir. Exactly right! –
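Since the thread settles on subset semantics, the test can be stated directly. A minimal sketch, assuming Python 3: dict `items()` views support set-style comparisons, so `<=` checks whether every key/value pair of one dict also appears in another. The helper name `is_partial_duplicate` is hypothetical, introduced only for this illustration.

```python
def is_partial_duplicate(small, big):
    # Hypothetical helper: True when every (key, value) pair of `small`
    # also occurs in `big`; items() views compare like sets
    return small.items() <= big.items()

shorter = {'oranges': 'big', 'apples': 'green'}
longer = {'oranges': 'big', 'apples': 'green', 'bananas': 'fresh'}
other = {'oranges': 'big', 'apples': 'red'}

print(is_partial_duplicate(shorter, longer))  # True
print(is_partial_duplicate(other, longer))    # False: 'apples' maps to a different value
```

Matching on pairs rather than on keys alone is what keeps `{'oranges':'big','apples':'red'}` in the desired result.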

Answers

3

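Try the following implementation.

Note that in my implementation I pre-sort the list and take only 2-element combinations, to reduce the number of iterations. This ensures that `key` is always less than or equal in size to `hay`.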

my_list = [
    {'oranges':'big','apples':'green'},
    {'oranges':'big','apples':'green','bananas':'fresh'},
    {'oranges':'big','apples':'red'},
    {'oranges':'big','apples':'green','bananas':'rotten'}
]

#Create a function remove_dup, name it anything you want
def remove_dup(lst):
    #import combinations from itertools, mainly to avoid multiple nested loops
    from itertools import combinations
    #Create a generator function dup_gen, name it anything you want
    def dup_gen(lst):
        #Now read the dict pairs; key is never longer than hay because lst is sorted by length
        for key, hay in combinations(lst, 2):
            #if key is in hay then set(key) - set(hay) = empty set
            if not set(key) - set(hay):
                #and if key is in hay, yield it
                yield key
    #sort the list of dicts by length after converting each dict to a tuple of item pairs
    #The set() handles duplicate elements, thanks to DSM for pointing out this boundary case:
    #without it, remove_dup([{1:2}, {1:2}]) == []
    lst = sorted(set(tuple(e.items()) for e in lst), key = len)
    #Now recreate the dictionaries from the set difference of
    #the original list and the elements generated by dup_gen
    #Elements generated by dup_gen are the duplicates that need to be removed
    return [dict(e) for e in set(lst) - set(dup_gen(lst))]

>>> remove_dup(my_list)
[{'apples': 'green', 'oranges': 'big', 'bananas': 'fresh'}, {'apples': 'green', 'oranges': 'big', 'bananas': 'rotten'}, {'apples': 'red', 'oranges': 'big'}]

>>> remove_dup([{1:2}, {1:2}])
[{1: 2}]

>>> remove_dup([{1:2}])
[{1: 2}]

>>> remove_dup([])
[]

>>> remove_dup([{1:2}, {1:3}])
[{1: 2}, {1: 3}]
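The pre-sorting note relies on `itertools.combinations` emitting pairs in input order: once the list is sorted by length, the first element of every pair is never longer than the second, so only the `key`-in-`hay` direction has to be checked. A small sketch of that property on the question's data, using the same preprocessing as `remove_dup`:

```python
from itertools import combinations

my_list = [
    {'oranges': 'big', 'apples': 'green'},
    {'oranges': 'big', 'apples': 'green', 'bananas': 'fresh'},
    {'oranges': 'big', 'apples': 'red'},
    {'oranges': 'big', 'apples': 'green', 'bananas': 'rotten'},
]

# Same preprocessing as remove_dup: item tuples, deduplicated, sorted by length
items = sorted(set(tuple(d.items()) for d in my_list), key=len)

pairs = list(combinations(items, 2))
# combinations keeps input order, so each pair is (no longer, no shorter)
assert all(len(key) <= len(hay) for key, hay in pairs)
print(len(pairs))  # 6: four items give C(4, 2) pairs
```

Checking each unordered pair once, instead of both orderings, halves the number of subset tests compared with a plain double loop.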

A faster implementation

from itertools import combinations

def remove_dup(lst):
    #sort the list of dicts by length after converting each dict to a tuple of item pairs
    #The set() handles duplicate elements, thanks to DSM for pointing out this boundary case:
    #without it, remove_dup([{1:2}, {1:2}]) == []
    lst = sorted(set(tuple(e.items()) for e in lst), key = len)
    #Generate all the duplicates lazily
    dups = (key for key, hay in combinations(lst, 2) if not set(key).difference(hay))
    #Now recreate the dictionaries from the set difference of
    #the original list and the duplicate elements
    return [dict(e) for e in set(lst).difference(dups)]
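One difference in the faster version is that `set.difference` accepts any iterable, while the `-` operator needs sets on both sides. So `set(key).difference(hay)` can take the `hay` tuple directly, and `set(lst).difference(dups)` consumes the generator without first building a set from it. A small illustration with item tuples taken from the question's data:

```python
key = (('oranges', 'big'), ('apples', 'green'))
hay = (('oranges', 'big'), ('apples', 'green'), ('bananas', 'fresh'))
other = (('oranges', 'big'), ('apples', 'red'))

# Operator form: both operands must already be sets
print(set(key) - set(hay))         # set()
# Method form: the argument may be any iterable, here the tuple itself
print(set(key).difference(hay))    # set()
print(set(other).difference(hay))  # {('apples', 'red')}

# A generator is also accepted, as with the `dups` expression above
print({1, 2, 3}.difference(x for x in [2, 2, 5]))  # {1, 3}
```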
+1

@MostafaR: {'a':'b','a':'b'} is actually {'a':'b'}, and in set theory every set is a subset of itself – Abhijit

+1

@MostafaR: {'a':'b','a':'b'} == {'a':'b'}. – Blender
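Both comments can be checked directly: a dict literal with a repeated key keeps only one entry, and `<=` on sets is the non-strict subset test, so every set passes it against itself while the strict `<` does not.

```python
d = {'a': 'b', 'a': 'b'}  # the repeated key collapses into a single entry
print(d)                  # {'a': 'b'}
print(d == {'a': 'b'})    # True

items = set(d.items())
# <= includes equality (non-strict subset); < requires a proper subset
print(items <= items)     # True
print(items < items)      # False
```

This is why `remove_dup` treats two identical dicts as duplicates of each other and keeps exactly one copy.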

+0

Thanks a lot, works great! –

2

Here is an implementation you can use:

>>> my_list = [
...     {'oranges':'big','apples':'green'},
...     {'oranges':'big','apples':'green','bananas':'fresh'},
...     {'oranges':'big','apples':'red'},
...     {'oranges':'big','apples':'green','bananas':'rotten'}
... ]
>>> def is_subset(d1, d2):
...     return all(item in d2.items() for item in d1.items())
...     # or
...     # return set(d1.items()).issubset(set(d2.items()))
...
>>> [d for d in my_list if not any(is_subset(d, d1) for d1 in my_list if d1 != d)] 
[{'apples': 'green', 'oranges': 'big', 'bananas': 'fresh'}, 
{'apples': 'red', 'oranges': 'big'}, 
{'apples': 'green', 'oranges': 'big', 'bananas': 'rotten'}] 

For each dict `d` in `my_list`:

any(is_subset(d, d1) for d1 in my_list if d1 != d) 

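checks whether it is a subset of any other dict in `my_list`. If this returns True, there is at least one other dict that contains `d` as a subset, so we take its `not` to exclude `d` from the list.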
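To make the `any(...)` / `not` logic concrete, the sketch below walks the question's list and records, for each dict, whether some other dict contains all of its pairs. The names `containers` and `decisions` are hypothetical and exist only for this illustration.

```python
my_list = [
    {'oranges': 'big', 'apples': 'green'},
    {'oranges': 'big', 'apples': 'green', 'bananas': 'fresh'},
    {'oranges': 'big', 'apples': 'red'},
    {'oranges': 'big', 'apples': 'green', 'bananas': 'rotten'},
]

def is_subset(d1, d2):
    return all(item in d2.items() for item in d1.items())

decisions = []  # hypothetical: True means the dict is kept
for d in my_list:
    # Other dicts that contain every key/value pair of d
    containers = [d1 for d1 in my_list if d1 != d and is_subset(d, d1)]
    keep = not containers  # same truth value as `not any(...)`
    decisions.append(keep)
    print(d, 'kept' if keep else 'excluded')
```

Only the first dict has containers (the 'fresh' and 'rotten' dicts), so it is the only one excluded.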

+0

Thanks a lot, works great! –

1

Short answer

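def is_subset(d1, d2):
    # Check if d1 is a subset of d2
    return all(item in d2.items() for item in d1.items())

# list() around filter so this also works on Python 3, where filter returns an iterator
list(filter(lambda x: len(list(filter(lambda y: is_subset(x, y), my_list))) == 1, my_list))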
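The one-liner counts, for each `x`, how many dicts in `my_list` contain it. Since `x` always contains itself, a count of exactly 1 means no other dict contains it. An equivalent list-comprehension spelling, assuming Python 3; `keep_maximal` is a hypothetical name for this sketch:

```python
my_list = [
    {'oranges': 'big', 'apples': 'green'},
    {'oranges': 'big', 'apples': 'green', 'bananas': 'fresh'},
    {'oranges': 'big', 'apples': 'red'},
    {'oranges': 'big', 'apples': 'green', 'bananas': 'rotten'},
]

def is_subset(d1, d2):
    # Check if d1 is a subset of d2
    return all(item in d2.items() for item in d1.items())

def keep_maximal(dicts):
    # Hypothetical helper: count the dicts containing x (x itself included);
    # a count of 1 means no *other* dict contains it
    return [x for x in dicts
            if sum(1 for y in dicts if is_subset(x, y)) == 1]

print(keep_maximal(my_list))
print(keep_maximal([{1: 2}, {1: 2}]))  # []: identical copies contain each other
```

One consequence of the counting rule: exact duplicates each reach a count of 2, so both copies are dropped rather than one being kept.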
+0

This is really clever, how in the world did you come up with it? – george

+0

Your answer is not much different from Rohit's, except that you obscured it with multiple filters – Abhijit

5

The first [well, second, after some edits..] thing that came to mind was this:

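def get_superdicts(dictlist):
    superdicts = []
    # Longest dicts first, so every superset is seen before its subsets
    for d in sorted(dictlist, key=len, reverse=True):
        fd = set(d.items())
        # Skip d if its items are already contained in a kept dict
        if not any(fd <= k for k in superdicts):
            superdicts.append(fd)
    new_dlist = list(map(dict, superdicts))
    return new_dlist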

This gives:

>>> a = [{'apples': 'green', 'oranges': 'big'}, {'apples': 'green', 'oranges': 'big', 'bananas': 'fresh'}, {'apples': 'red', 'oranges': 'big'}, {'apples': 'green', 'oranges': 'big', 'bananas': 'rotten'}] 
>>> 
>>> get_superdicts(a) 
[{'apples': 'red', 'oranges': 'big'}, 
{'apples': 'green', 'oranges': 'big', 'bananas': 'rotten'}, 
{'bananas': 'fresh', 'oranges': 'big', 'apples': 'green'}] 

[Originally I used frozenset here, thinking I could do some clever set operations, but apparently not, so this is what I ended up with.]
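For comparison, here is one way a `frozenset`-based version could look. It is a sketch, not the answerer's code, and `get_superdicts_fs` is a hypothetical name. Building a set of frozensets removes exact duplicates up front, and the strict-subset operator `<` then drops every set contained in a larger one. It assumes the dict values are hashable.

```python
def get_superdicts_fs(dictlist):
    # Hypothetical frozenset variant: a set of frozensets drops exact duplicates
    sets = {frozenset(d.items()) for d in dictlist}
    # Keep a set only if it is not a proper subset (<) of some other set
    return [dict(s) for s in sets if not any(s < t for t in sets)]

a = [{'apples': 'green', 'oranges': 'big'},
     {'apples': 'green', 'oranges': 'big', 'bananas': 'fresh'},
     {'apples': 'red', 'oranges': 'big'},
     {'apples': 'green', 'oranges': 'big', 'bananas': 'rotten'}]

result = get_superdicts_fs(a)
print(len(result))  # 3 (their order is arbitrary because it comes from a set)
print(get_superdicts_fs([{1: 2}, {1: 2}]))  # [{1: 2}]
```

Like `get_superdicts`, this still compares pairs of sets, so it is quadratic in the worst case; the frozensets only make deduplication simpler.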

+0

You can replace 'fd.issubset(k)' with 'fd <= k'. – Blender

+0

@Blender: Good point, edited. It still feels like there should be some slick set-based trick. – DSM

1

I think this one has a better time complexity:

def is_subset(a, b):
    return not set(a) - set(b)

def remove_extra(my_list):
    if not my_list:
        return []
    # Convert each dict to a list of (key, value) pairs and sort those lists
    my_list = [list(d.items()) for d in my_list]
    my_list.sort()

    result = []
    for i in range(len(my_list) - 1):
        # Keep an entry unless it is a subset of its neighbour in sorted order
        if not is_subset(my_list[i], my_list[i + 1]):
            result.append(dict(my_list[i]))
    result.append(dict(my_list[-1]))

    return result

print(remove_extra([
    {'oranges':'big','apples':'green'},
    {'oranges':'big','apples':'green','bananas':'fresh'},
    {'oranges':'big','apples':'red'},
    {'oranges':'big','apples':'green','bananas':'rotten'}
]))
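Comparing only neighbours after a sort is what gives this approach its speed (one sort plus a single linear pass instead of testing every pair). It does assume that a dict's superset always lands right next to it in sorted order. A quick sketch to test that assumption against a brute-force pairwise filter; `remove_extra` is re-declared here as a Python 3 port so the block runs on its own, and `brute_force` and `case` are hypothetical names, with `case` being a made-up input rather than data from the question:

```python
def is_subset(a, b):
    return not set(a) - set(b)

def remove_extra(my_list):
    # Python 3 port of the answer above: item lists, sorted, neighbours compared
    if not my_list:
        return []
    my_list = sorted(list(d.items()) for d in my_list)
    result = []
    for i in range(len(my_list) - 1):
        if not is_subset(my_list[i], my_list[i + 1]):
            result.append(dict(my_list[i]))
    result.append(dict(my_list[-1]))
    return result

def brute_force(dicts):
    # Hypothetical reference: drop a dict if some *other* entry contains all its pairs
    return [d for i, d in enumerate(dicts)
            if not any(i != j and d.items() <= e.items()
                       for j, e in enumerate(dicts))]

case = [{1: 1, 3: 3}, {1: 1, 2: 2}, {3: 3}]
print(remove_extra(case))  # [{1: 1, 2: 2}, {1: 1, 3: 3}, {3: 3}]
print(brute_force(case))   # [{1: 1, 3: 3}, {1: 1, 2: 2}]
```

Here `{3: 3}` sorts after `[(1, 1), (3, 3)]` but is separated from it by nothing that contains it, so the neighbour check keeps it even though it is a subset of `{1: 1, 3: 3}`. On the question's own data (with Python 3 insertion order) the neighbour check does return the expected three dicts.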