2015-05-17 128 views
3

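How do I remove duplicate dictionaries from a list, ignoring one dictionary key?

I have a list of dictionaries. Each dictionary has several key-value pairs plus one arbitrary (but important) key-value pair. For example: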

thelist = [ 
    {"key" : "value1", "k2" : "va1", "ignore_key" : "arb1"}, 
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb11"}, 
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb113"} 
] 

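I want to remove duplicate dictionaries, where two dictionaries count as duplicates if every key-value pair other than "ignore_key" matches. I have seen a related question, but it only considers dictionaries that are exactly identical. Is there a way to remove the near-duplicates so that the data above becomes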

thelist = [ 
    {"key" : "value1", "k2" : "va1", "ignore_key" : "arb1"}, 
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb11"} 
] 

It doesn't matter which of the duplicates is dropped. How can I do this?

+0

Are your values always hashable? – DSM

+0

Thanks for the replies. Sorry, the example wasn't clear. There are multiple key-value pairs, and only one key should be ignored. – user4467853

+0

@DSM Yes, the values are always hashable (text and datetime objects). – user4467853

Answers

0

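Instead of a list of dictionaries, you could use a dictionary of dictionaries. The value stored under "key" in each of your dictionaries becomes the key in the main dictionary.

Like this: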

thedict = {} 

thedict["value1"] = {"ignore_key" : "arb1", ...} 
thedict["value2"] = {"ignore_key" : "arb11", ...} 

Because a dictionary cannot hold duplicate keys, the problem disappears.
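A minimal sketch of that conversion, assuming the value under "key" alone identifies a record (the question's data also has "k2", which this version does not compare). The name `thedict` follows the snippet above; keeping the first dictionary seen for each key via `setdefault` is an assumption, not something the answer specifies.

```python
thelist = [
    {"key": "value1", "k2": "va1", "ignore_key": "arb1"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb113"},
]

thedict = {}
for d in thelist:
    # Everything except "key" becomes the inner dictionary.
    rest = {k: v for k, v in d.items() if k != "key"}
    # setdefault only stores the value if the key is not present yet,
    # so the first dictionary seen for each key wins.
    thedict.setdefault(d["key"], rest)

print(thedict)
```

Replacing `setdefault` with a plain assignment (`thedict[d["key"]] = rest`) would keep the last dictionary for each key instead.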

1

Starting with your original list:

thelist = [ 
    {"key" : "value1", "ignore_key" : "arb1"}, 
    {"key" : "value2", "ignore_key" : "arb11"}, 
    {"key" : "value2", "ignore_key" : "arb113"} 
] 

Create a set and fill it while filtering the list.

uniques, theNewList = set(), []
for d in thelist:
    cur = d["key"]  # Avoid multiple lookups of the same thing
    if cur not in uniques:
        theNewList.append(d)
        uniques.add(cur)

Finally, rebind the original name to the filtered list:

thelist = theNewList
5

Keep a set of the key and k2 values seen so far, and remove any dictionary whose values have already been seen:

st = set()

for d in thelist[:]:
    vals = d["key"], d["k2"]
    if vals in st:
        thelist.remove(d)
    st.add(vals)
print(thelist) 

[{'k2': 'va1', 'ignore_key': 'arb1', 'key': 'value1'}, 
{'k2': 'va2', 'ignore_key': 'arb11', 'key': 'value2'}] 

If the duplicates are always grouped together, you can use groupby on the key and k2 values and take the first dict of each group:

from itertools import groupby
from operator import itemgetter
thelist[:] = [next(v) for _, v in groupby(thelist, itemgetter("key", "k2"))]
print(thelist)
[{'key': 'value1', 'k2': 'va1', 'ignore_key': 'arb1'}, 
{'key': 'value2', 'k2': 'va2', 'ignore_key': 'arb11'}] 
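Note that groupby only merges runs of adjacent items. If the duplicates may be scattered through the list, one option is to sort by the same key first. The sketch below assumes the compared values can be ordered against each other (true for the text values here); the names `keyfunc` and `deduped` and the reordered sample data are illustrative only.

```python
from itertools import groupby
from operator import itemgetter

# Same data as the question, but with the duplicates no longer adjacent.
thelist = [
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"},
    {"key": "value1", "k2": "va1", "ignore_key": "arb1"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb113"},
]

keyfunc = itemgetter("key", "k2")

# sorted() is stable, so within each group the dict that appeared first
# in the original list is still first and is the one that is kept.
deduped = [next(group) for _, group in groupby(sorted(thelist, key=keyfunc), keyfunc)]

print(deduped)
```

The result comes back in sorted order rather than the original order, so the set-based approaches are the better fit when the original order matters.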

Or use a generator, similar to DSM's answer, to modify the original list without copying it:

def filt(l):
    st = set()
    for d in l:
        vals = d["key"], d["k2"]
        if vals not in st:
            yield d
        st.add(vals)


thelist[:] = filt(thelist) 

print(thelist) 

[{'k2': 'va1', 'ignore_key': 'arb1', 'key': 'value1'}, 
{'k2': 'va2', 'ignore_key': 'arb11', 'key': 'value2'}] 

If you don't care which dupe is removed, just use reversed:

st = set()

for d in reversed(thelist):
    vals = d["key"], d["k2"]
    if vals in st:
        thelist.remove(d)
    st.add(vals)
print(thelist) 

To compare on every key except ignore_key, use groupby (as before, this only merges duplicates that are adjacent):

from itertools import groupby 

thelist[:] = [next(v) for _, v in groupby(thelist, lambda d: 
       [val for k, val in d.items() if k != "ignore_key"])] 
print(thelist) 
[{'key': 'value1', 'k2': 'va1', 'ignore_key': 'arb1'}, 
{'key': 'value2', 'k2': 'va2', 'ignore_key': 'arb11'}] 
2

You could cram things into a line or two, but I think it's cleaner just to write a function:

def f(seq, ignore_keys):
    seen = set()
    for elem in seq:
        index = frozenset((k, v) for k, v in elem.items() if k not in ignore_keys)
        if index not in seen:
            yield elem
            seen.add(index)

This gives

>>> list(f(thelist, ["ignore_key"])) 
[{'ignore_key': 'arb1', 'k2': 'va1', 'key': 'value1'}, 
{'ignore_key': 'arb11', 'k2': 'va2', 'key': 'value2'}] 

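This assumes your values are hashable. (If they aren't, the same approach works with seen = [] and seen.append(index), as long as index is also built from something that does not need hashing, such as a plain dict instead of a frozenset, although it will perform poorly on long lists.)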
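For values that are not hashable, the list-based variant mentioned above might look like the following sketch. The function name `dedupe_unhashable` and the sample `rows` are hypothetical; the comparison key is built as a plain dict here because a frozenset would itself require hashable values.

```python
def dedupe_unhashable(seq, ignore_keys):
    seen = []  # a list instead of a set: each membership test is O(n)
    for elem in seq:
        # Dicts compare by equality, so unhashable values are fine here.
        index = {k: v for k, v in elem.items() if k not in ignore_keys}
        if index not in seen:
            seen.append(index)
            yield elem


# Hypothetical data whose "key" values are lists, which cannot be hashed.
rows = [
    {"key": ["a"], "k2": "va1", "ignore_key": "arb1"},
    {"key": ["b"], "k2": "va2", "ignore_key": "arb11"},
    {"key": ["b"], "k2": "va2", "ignore_key": "arb113"},
]

result = list(dedupe_unhashable(rows, ["ignore_key"]))
print(result)
```

Because every element is compared against everything seen so far, this is quadratic in the length of the list.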

0

Without modifying thelist:

result = []
seen = set()
thelist = [
    {"key" : "value1", "ignore_key" : "arb1"},
    {"key" : "value2", "ignore_key" : "arb11"},
    {"key" : "value2", "ignore_key" : "arb113"}
]

for item in thelist:
    if item['key'] not in seen:
        result.append(item)
        seen.add(item['key'])

print(result) 
0

Create a set of the unique values and check against (and update) it:

values = {d['key'] for d in thelist}
newlist = []

for d in thelist:
    if d['key'] in values:
        newlist.append(d)
        values -= {d['key']}

thelist = newlist 
0

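You can remove the duplicates by adapting the accepted answer to the linked question to use a dictionary instead of a set.

The following first builds a temporary dictionary whose keys are tuples of the items in each dictionary in thelist, except for the ignored one, which is saved as the value associated with each of those keys. Doing this eliminates duplicates because they map to the same key, while still preserving the ignored key and its value (the last one seen, or the only one).

The second step recreates thelist by building dictionaries from each key combined with its associated value in the temporary dictionary.

You could combine these two steps into a completely unreadable one-liner if you wanted...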

thelist = [ 
    {"key" : "value1", "k2" : "va1", "ignore_key" : "arb1"}, 
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb11"}, 
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb113"} 
] 

IGNORED = "ignore_key"
# Map the non-ignored items of each dict to its (ignored key, value) pair;
# later duplicates overwrite earlier ones.
temp = dict((tuple(item for item in d.items() if item[0] != IGNORED),
             (IGNORED, d.get(IGNORED))) for d in thelist)
thelist = [dict(key + (value,)) for key, value in temp.items()]

for item in thelist:
    print(item)

Output:

{'key': 'value1', 'k2': 'va1', 'ignore_key': 'arb1'}
{'key': 'value2', 'k2': 'va2', 'ignore_key': 'arb113'}
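If you would rather keep the first duplicate than the last, a variation on the same idea is sketched below. It is not part of the answer above: it assumes Python 3.7+ (where dicts preserve insertion order), uses a frozenset key so that the order of keys inside each dictionary does not matter, and the name `deduped` is illustrative.

```python
thelist = [
    {"key": "value1", "k2": "va1", "ignore_key": "arb1"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb113"},
]

IGNORED = "ignore_key"

temp = {}
for d in thelist:
    # The non-ignored items identify the record; values must be hashable.
    key = frozenset(item for item in d.items() if item[0] != IGNORED)
    # setdefault stores d only if this key has not been seen before.
    temp.setdefault(key, d)

deduped = list(temp.values())

for item in deduped:
    print(item)
```

Here the temporary dictionary stores the original dict objects, so no second step is needed to rebuild them.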