2015-04-30 37 views
4

Python: subsets of data with the most common entries

I am struggling with the following problem. Imagine I have a large amount of data like this:

one = {'A':'m','B':'n','C':'o'} 
two = {'A':'m','B':'n','C':'p'} 
three = {'A':'x','B':'n','C':'p'} 

and so on, not necessarily stored in dictionaries. How do I get the subsets of the data that share the most common entries?

In the example above I would like to get

one, two   with same A and B = m,n 
two, three  with same B and C = n,p 
one, two, three with same B  = n 
one, two   with same A  = m 

Answers

2

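One way, though not the most efficient for long dictionaries, is to use itertools.combinations to build the combinations of your dictionaries, then loop over the combinations and take the intersection of the item sets within each combination: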

# Python 2: uses the print statement, dict.viewitems() and the builtin reduce
from itertools import combinations

one = {'one': {'A': 'm', 'B': 'n', 'C': 'o'}}
two = {'two': {'A': 'm', 'B': 'n', 'C': 'p'}}
three = {'three': {'A': 'x', 'B': 'n', 'C': 'p'}}

dict_list = [one, two, three]
# In Python 2, items() returns a list of (key, value) pairs, so every
# entry of v_item is a one-element list such as [('one', {...})].
v_item = [i.items() for i in dict_list]

# All combinations of 2 and of 3 dictionaries.
l = [combinations(v_item, i) for i in range(2, 4)]

# Flatten the combinations: each element arrives wrapped in a list, e.g.
#   ([('one', {'A': 'm', 'C': 'o', 'B': 'n'})], [('two', {'A': 'm', 'C': 'p', 'B': 'n'})])
# and t[0] unwraps it to the plain (name, dict) tuple.
flat = [[[t[0] for t in k] for k in j] for j in l]

for comb in flat:
    for pair in comb:
        # Split the (name, dict) tuples into the names and the inner dicts.
        names, items = zip(*pair)
        items = [i.viewitems() for i in items]
        # Intersect the set-like item views of all dicts in this combination.
        print names, reduce(lambda x, y: x & y, items)

Result:

('one', 'two') set([('B', 'n'), ('A', 'm')]) 
('one', 'three') set([('B', 'n')]) 
('two', 'three') set([('B', 'n'), ('C', 'p')]) 
('one', 'two', 'three') set([('B', 'n')]) 

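About the following lines:

    items=[i.viewitems() for i in items] 
    print names,reduce(lambda x,y:x&y,items) 

You need to create a view object of your items, which behaves like a set object, so that you can compute the intersection of the items with the & operator. The reduce function applies that operator across all of the views in a combination.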
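For readers on Python 3, the same idea might look like the sketch below. It assumes Python 3, where dict.items() already returns a set-like view and reduce lives in functools; the helper name common_items is hypothetical and not part of the answer above.

```python
from functools import reduce

one = {'A': 'm', 'B': 'n', 'C': 'o'}
two = {'A': 'm', 'B': 'n', 'C': 'p'}
three = {'A': 'x', 'B': 'n', 'C': 'p'}


def common_items(*dicts):
    """Return the (key, value) pairs shared by all given dicts.

    Hypothetical helper for illustration. The values must be hashable,
    otherwise the & operator on the item views raises TypeError.
    """
    # In Python 3, d.items() is a set-like view, so & yields a set of pairs.
    return reduce(lambda x, y: x & y, (d.items() for d in dicts))


print(sorted(common_items(one, two)))         # [('A', 'm'), ('B', 'n')]
print(sorted(common_items(one, two, three)))  # [('B', 'n')]
```

The results are sorted only to make the printed order deterministic; the intersection itself is an unordered set.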

+0

Uff, you are right, this is brutally slow on large data sets, and on top of that it runs out of memory :O :/

0

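Thanks Kasra, that gave me the final hint :).
I changed a few things and converted it to Python 3 (I forgot to mention that...).
But, like your code, it is brutally slow and runs out of memory on large data sets (which I do have). So I have to look for another approach :/.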

Here is my final code:

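from itertools import combinations
from functools import reduce


class Piece:
    def __init__(self, tag, A, B, C):
        self._tag = tag
        self.A = A
        self.B = B
        self.C = C

        # Set of (attribute, value) pairs. It also contains ('_tag', tag),
        # which never shows up in an intersection as long as tags are unique.
        self._dict = set(self.__dict__.items())


pieces = []
pieces.append(Piece('one', 'm', 'n', 'o'))
pieces.append(Piece('two', 'm', 'n', 'p'))
pieces.append(Piece('three', 'x', 'n', 'p'))

# All combinations of 2 and of 3 pieces, collected into one flat list.
l = [combinations(pieces, i) for i in range(2, 4)]
flat = []
for i in l:
    for k in i:
        flat.append(k)

for f in flat:
    print('-' * 25)
    print([j._tag for j in f])
    dicts = (i._dict for i in f)
    # Intersect the attribute sets of every piece in this combination.
    matches = reduce(lambda x, y: x & y, dicts)
    print(matches)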
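
Because enumerating every combination of records is what makes the code above slow and memory-hungry, one possible alternative is to turn the problem around: enumerate the subsets of attributes of each record and group the records by the values they hold for that subset. The sketch below is only one way this could be done, not something proposed in the thread. The names shared_groups, records and min_size are hypothetical, and it assumes Python 3.7+ and hashable attribute values.

```python
from collections import defaultdict
from itertools import combinations


def shared_groups(records, min_size=2):
    """Map each (keys, values) signature to the names of records sharing it.

    `records` maps a record name to its attribute dict (hypothetical layout).
    """
    groups = defaultdict(list)
    for name, attrs in records.items():
        keys = sorted(attrs)
        # Enumerate every non-empty subset of this record's attributes.
        for r in range(1, len(keys) + 1):
            for subset in combinations(keys, r):
                signature = (subset, tuple(attrs[k] for k in subset))
                groups[signature].append(name)
    # Keep only the signatures shared by at least `min_size` records.
    return {sig: names for sig, names in groups.items() if len(names) >= min_size}


records = {
    'one': {'A': 'm', 'B': 'n', 'C': 'o'},
    'two': {'A': 'm', 'B': 'n', 'C': 'p'},
    'three': {'A': 'x', 'B': 'n', 'C': 'p'},
}

for (keys, values), names in sorted(shared_groups(records).items()):
    print(names, 'share', dict(zip(keys, values)))
```

The work here grows linearly with the number of records but exponentially with the number of attributes per record, so it probably only helps when each record has few attributes, as in the A, B, C example. Unlike the list in the question, it also reports groups that are implied by a larger one, such as two and three sharing C = p.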