2017-06-13 153 views
1

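Python difference between two lists that may contain duplicate values

I have two lists, each containing non-unique numbers, meaning the same value can appear more than once in each list.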

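I need to find the difference between the two, taking into account that the same value may appear multiple times (so I cannot simply take the difference of the sets). In other words, I need to check whether a value appears more times in the first list than in the second.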

The lists are:

l1 = [1, 2, 5, 3, 3, 4, 9, 8, 2] 
l2 = [1, 1, 3, 2, 4, 8, 9] 

# Sorted and justified 
l1 = [1, 2, 2, 3, 3, 4, 5, 8, 9] 
l2 = [1, 1, 2, 3, 4, 8, 9] 

The elements of the lists can be strings, integers, or floats. So the resulting lists should be:

difference(l1, l2) == [3, 5, 2] 
# There is an extra 2 and 3 in l1 that is not in l2, and a 5 in l1 but not l2. 

difference(l2, l1) == [1] 
# The extra 1 is the only value in l2 but not in l1. 

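I have tried the list comprehension [x for x in l1 if x not in l2], which does not work because it does not account for values that are duplicated in both lists.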
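For illustration, here is a quick check of what that comprehension returns for the lists above. A plain set difference is included for comparison; both lose the duplicate counts:

```python
l1 = [1, 2, 5, 3, 3, 4, 9, 8, 2]
l2 = [1, 1, 3, 2, 4, 8, 9]

# Every value of l1 except 5 occurs at least once in l2,
# so the extra 2 and the extra 3 are filtered out as well.
naive = [x for x in l1 if x not in l2]
print(naive)  # [5]

# A set difference collapses duplicates in the same way.
set_diff = set(l1) - set(l2)
print(set_diff)  # {5}
```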

+0

What have you tried? – depperm

+0

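I tried a list comprehension, which is the only thing I could think of for this case short of writing a loop function. [x for x in l1 if x not in l2] does not work – clg4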

+0

Are the values integers, or do you need a more general solution? –

Answers

4

If order doesn't matter, you can use a Counter (see the collections module in the standard library):

from collections import Counter 

l1 = [1,2,5,3,3,4,9,8,2] 
l2 = [1,1,3,2,4,8,9] 

c1 = Counter(l1) # Counter({2: 2, 3: 2, 1: 1, 5: 1, 4: 1, 9: 1, 8: 1}) 
c2 = Counter(l2) # Counter({1: 2, 3: 1, 2: 1, 4: 1, 8: 1, 9: 1}) 

diff1 = list((c1-c2).keys()) # [2, 5, 3] 
diff2 = list((c2-c1).keys()) # [1] 

This is fairly general and works with strings, too:

... 
l1 = ['foo', 'foo', 'bar'] 
l2 = ['foo', 'bar', 'bar', 'baz'] 
... 
# diff1 == ['foo'] 
# diff2 == ['bar', 'baz'] 
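
One thing to keep in mind is that .keys() lists each surplus value only once, however many extra copies there are. If the multiplicity matters, a minimal sketch along these lines (the input lists and variable names are only illustrative) uses Counter.elements() to expand the counts back into a list:

```python
from collections import Counter

l1 = [1, 1, 1, 2]
l2 = [1]

# Counter subtraction keeps only the positive counts.
surplus = Counter(l1) - Counter(l2)  # Counter({1: 2, 2: 1})

distinct = list(surplus.keys())      # each surplus value once: [1, 2]
expanded = list(surplus.elements())  # each value repeated by its count: [1, 1, 2]

print(distinct)
print(expanded)
```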
2

I have a feeling that many people will come here looking for a multiset difference (e.g. [1, 1, 1, 2, 2, 2, 3, 3] - [1, 2, 2] == [1, 1, 2, 3, 3]), so I'll post that answer here as well:

import collections 

def multiset_difference(a, b): 
    """Compute a - b of two multisets a and b""" 
    a = collections.Counter(a) 
    b = collections.Counter(b) 

    difference = a - b 
    return difference # Remove this line if you want it as a list 

    as_list = [] 
    for item, count in difference.items(): 
        as_list.extend([item] * count)
    return as_list 

def ordered_multiset_difference(a, b): 
    """As above, but preserves order and is O(ab) worst case""" 
    difference = list(a) 
    depleted = set() # Values that aren't in difference to prevent searching the list again 
    for i in b: 
        if i not in depleted:
            try:
                difference.remove(i)
            except ValueError:
                depleted.add(i)
    return difference 
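
If the quadratic worst case is a concern and the items are hashable (an assumption; the list.remove version above only needs equality), the order-preserving difference can also be sketched in a single pass with a Counter. The name ordered_difference_linear is hypothetical and this is one possible approach, not part of the answer above:

```python
from collections import Counter

def ordered_difference_linear(a, b):
    """Order-preserving a - b for hashable items (hypothetical helper)."""
    remaining = Counter(b)  # copies of each value in b not yet cancelled
    result = []
    for item in a:
        if remaining[item] > 0:
            # Cancel this occurrence against one copy from b.
            remaining[item] -= 1
        else:
            result.append(item)
    return result

print(ordered_difference_linear([1, 1, 1, 2, 2, 2, 3, 3], [1, 2, 2]))
# [1, 1, 2, 3, 3]
```

Like the list.remove version, this cancels the earliest occurrences in a first, so the two should agree on the examples in this thread.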
0

Using Counter is probably the better option, but to roll your own:

def diff(a, b): 
    result = [] 
    cpy = b[:] 
    for ele in a: 
        if ele in cpy:
            cpy.remove(ele)
        else:
            result.append(ele)
    return result 

Or as an abusive one-liner:

def diff(a, b): 
    return [ele for ele in a if ele not in b or b.remove(ele)] 

The one-liner destroys b while computing the difference, so you may want to pass it a copy: diff(l1, l2[:]), or use:

def diff(a, b): 
    cpy = b[:] 
    return [ele for ele in a if ele not in cpy or cpy.remove(ele)] 
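
To make the side effect concrete, here is a small sketch using the question's lists. The name diff_in_place is only illustrative; its body is the same one-liner as above:

```python
def diff_in_place(a, b):
    # Same one-liner as above; b.remove() mutates the caller's list.
    return [ele for ele in a if ele not in b or b.remove(ele)]

l1 = [1, 2, 5, 3, 3, 4, 9, 8, 2]
l2 = [1, 1, 3, 2, 4, 8, 9]

result = diff_in_place(l1, l2)
print(result)  # [5, 3, 2]
print(l2)      # [1], only the unmatched leftovers of l2 remain

# Passing a copy leaves the original list untouched.
l2 = [1, 1, 3, 2, 4, 8, 9]
safe_result = diff_in_place(l1, l2[:])
print(l2)      # [1, 1, 3, 2, 4, 8, 9]
```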
+1

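'if ele in cpy: cpy.remove(ele)' scans the whole list twice. The one-liner does not handle falsy values correctly (e.g. 'diff([0, 0, 1], [1])' -> '[]'). If you want a simple one-liner, just use a list comprehension without it: '[ele for ele in a if ele in cpy or cpy.remove(ele)]'. But then again, this scans the list twice every time it needs to remove something. – Artyer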

+0

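@Artyer I fixed the filter... I'm curious whether you have tried your list comprehension change, because it gives me a 'ValueError'. And, as I mentioned, Counter is probably the better option. – TemporalWolf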

+1

Oops. I forgot a 'not'. '[ele for ele in a if ele not in cpy or cpy.remove(ele)]' – Artyer