2012-10-23 42 views
0

Algorithm to compare two lists and get the common elements in Python. I have two lists that share some common elements:

p = [('link1/d/b/c', 'target1/d/b/c'), ('link2/a/g/c', 'target2/a/g/c'), ..., ('linkn/b/b/f', 'targetn/b/b/f')] 

q = [['target1/d/b/c', 'target1', 123, 334], ['targetn/b/b/f', 'targetn', 23, 64], ... ,['targetx/f/f/f', 'targetx', 999, 888]] 

I am trying to compare them, find the common elements, and then do some work with the result:

do_job('target1/d/b/c', 'target1', 123, 334, 'link1/d/b/c') 

Right now I am using a simple but very slow algorithm:

for item in p: 
    link = item[0] 
    target = item[1] 
    for item2 in q: 
        target2 = item2[0] 
        if target2 == target: 
            do_some_job(...) 

I think I need to compare these two lists and build a single list that contains all the matched elements, like:

pq = [['target1/d/b/c', 'target1', 123, 334, 'link1/d/b/c'], ..., ['targetn/b/b/f', 'targetn', 23, 64, 'linkn/b/b/f']] 

and then call do_some_job(pq) once, instead of calling it every time I find a matching element.

How can I get that?

Regards

+0

That is not a valid Python list. What is link1/d/b/c supposed to mean? – 2012-10-23 10:15:00

+0

Use quotes around strings such as 'target1/d/b/c'. –

Answers

5

Use chain() to flatten both lists, then use set() and intersection() to get the common elements.

In [78]: from itertools import chain 

In [79]: p 
Out[79]: 
[('link1/d/b/c', 'target1/d/b/c'), 
('link2/a/g/c', 'target2/a/g/c'), 
('linkn/b/b/f', 'targetn/b/b/f')] 

In [80]: q 
Out[80]: 
[['target1/d/b/c', 'target1', 123, 334], 
['targetn/b/b/f', 'targetn', 23, 64], 
['targetx/f/f/f', 'targetx', 999, 888]] 

In [81]: set(chain(*p)).intersection(set(chain(*q))) 
Out[81]: set(['target1/d/b/c', 'targetn/b/b/f']) 
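The session above is Python 2 (note the set([...]) repr). A minimal sketch of the same idea for Python 3 follows, using the sample data from the transcript. The names common and common_targets are chosen here for illustration. The second form is an assumption about the intent: it compares only the target column instead of every flattened value.

```python
from itertools import chain

p = [('link1/d/b/c', 'target1/d/b/c'),
     ('link2/a/g/c', 'target2/a/g/c'),
     ('linkn/b/b/f', 'targetn/b/b/f')]
q = [['target1/d/b/c', 'target1', 123, 334],
     ['targetn/b/b/f', 'targetn', 23, 64],
     ['targetx/f/f/f', 'targetx', 999, 888]]

# Flatten both lists and intersect every value they contain.
common = set(chain.from_iterable(p)) & set(chain.from_iterable(q))

# Narrower variant: intersect only the target of each pair in p
# with the first field of each row in q.
common_targets = {target for _, target in p} & {row[0] for row in q}

print(sorted(common))          # ['target1/d/b/c', 'targetn/b/b/f']
print(sorted(common_targets))  # ['target1/d/b/c', 'targetn/b/b/f']
```

Sets are unordered, so the results are sorted before printing to get a stable display.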

Or use a list comprehension with short-circuiting:

In [86]: [j for i in p for j in i if j in (z for y in q for z in y)] 
Out[86]: ['target1/d/b/c', 'targetn/b/b/f'] 

Or use any():

In [87]: [j for i in p for j in i if any (j==z for y in q for z in y)] 
Out[87]: ['target1/d/b/c', 'targetn/b/b/f'] 
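Both comprehensions rebuild and rescan the flattened q for every value in p. One possible middle ground, sketched here as a suggestion rather than as part of the original answer, is to flatten q into a set once and keep the comprehension, which preserves the order of p. The names q_values and common_in_order are illustrative.

```python
from itertools import chain

p = [('link1/d/b/c', 'target1/d/b/c'),
     ('link2/a/g/c', 'target2/a/g/c'),
     ('linkn/b/b/f', 'targetn/b/b/f')]
q = [['target1/d/b/c', 'target1', 123, 334],
     ['targetn/b/b/f', 'targetn', 23, 64],
     ['targetx/f/f/f', 'targetx', 999, 888]]

# Build the lookup set once; each membership test is then a hash lookup.
q_values = set(chain.from_iterable(q))

# Walk p in order and keep the values that also occur somewhere in q.
common_in_order = [value for pair in p for value in pair if value in q_values]

print(common_in_order)  # ['target1/d/b/c', 'targetn/b/b/f']
```

This relies on every value in q being hashable, which holds for the strings and integers in the sample data.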

timeit

In [93]: %timeit set(chain(*p)).intersection(set(chain(*q))) 
100000 loops, best of 3: 7.38 us per loop      ## winner 

In [94]: %timeit [j for i in p for j in i if j in (z for y in q for z in y)] 
10000 loops, best of 3: 24.9 us per loop 

In [95]: %timeit [j for i in p for j in i if any (j==z for y in q for z in y)] 
10000 loops, best of 3: 27.4 us per loop 

In [97]: %timeit [x for x in chain(*p) if x in chain(*q)] 
10000 loops, best of 3: 12.6 us per loop 
1

You should probably use a dictionary:

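# p holds (link, target) pairs, so swap them to map target -> link.
target_to_link = dict((v, k) for (k, v) in p) 
for item in q: 
    if item[0] not in target_to_link: 
        continue  # this row of q has no matching target in p 
    args = item + [target_to_link[item[0]]] 
    do_some_job(*args) 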

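The target_to_link dictionary lets you look up the corresponding link from a target. Just make sure that no two links share the same target: the target is the dictionary key, so a later pair would overwrite an earlier one.

The for loop builds a temporary list, args, that combines your item (for example, ['target1/d/b/c', 'target1', 123, 334]) with the corresponding link, and the function(*args) syntax passes each element of that list as a separate argument.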
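The question asks for a combined list pq rather than one call per match. A minimal sketch of that, assuming Python 3 and the sample data from the first answer, could look like this. Rows of q whose target does not appear in p are skipped, which is an assumption about the desired behavior.

```python
p = [('link1/d/b/c', 'target1/d/b/c'),
     ('link2/a/g/c', 'target2/a/g/c'),
     ('linkn/b/b/f', 'targetn/b/b/f')]
q = [['target1/d/b/c', 'target1', 123, 334],
     ['targetn/b/b/f', 'targetn', 23, 64],
     ['targetx/f/f/f', 'targetx', 999, 888]]

# Index p by target so each lookup is a dictionary access, not a scan of p.
target_to_link = {target: link for link, target in p}

# Append the matching link to every row of q that has one.
pq = [row + [target_to_link[row[0]]]
      for row in q
      if row[0] in target_to_link]

for row in pq:
    print(row)
```

Building the dictionary is one pass over p and building pq is one pass over q, instead of the nested loop in the question that compares every pair.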


If you need to loop over p instead, you can build a dictionary like

target_to_args = dict((k[0],k[1:]) for k in q) 

and then do something like

for (link, target) in p: 
    if target in target_to_args:  # skip targets that do not appear in q 
        args = [target] + target_to_args[target] + [link] 
        do_some_job(*args) 
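A plain dictionary keeps only one value per key, so if several links can point at the same target, one option is to group the links per target. The sketch below assumes that situation. The data is hypothetical ('link9/d/b/c' is invented for illustration), and target_to_links is a name chosen here.

```python
from collections import defaultdict

# Hypothetical data: two links point at the same target.
p = [('link1/d/b/c', 'target1/d/b/c'),
     ('link9/d/b/c', 'target1/d/b/c'),
     ('link2/a/g/c', 'target2/a/g/c')]
q = [['target1/d/b/c', 'target1', 123, 334],
     ['targetx/f/f/f', 'targetx', 999, 888]]

# Collect every link for each target instead of overwriting earlier ones.
target_to_links = defaultdict(list)
for link, target in p:
    target_to_links[target].append(link)

# One output row per (row of q, matching link) combination.
# .get() is used so that missing targets do not add empty entries.
pq = [row + [link]
      for row in q
      for link in target_to_links.get(row[0], [])]

for row in pq:
    print(row)
```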
0

A list comprehension with chain should work:

[x for x in chain(*p) if x in chain(*q)] 
+0

If you mean itertools.chain, it returns an iterator, so I am not sure "in" will work? In any case, the set-based approach in ashwini's solution is probably faster – iruvar

+0

@cravoori 'in' works fine on iterables; it behaves like 'any()' and does short-circuit. See http://pastebin.com/scfnXTyY –

+0

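@AshwiniChaudhary, it works in this example only because a new iterator is created each time "x in chain(*q)" is evaluated, which is obviously expensive. An iterator is exhausted after a single pass, which means iterators are not suited to containment checks. Here is an example, http://pastebin.com/209fFHUn – iruvar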
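The point about exhausted iterators can be illustrated with a short sketch. It is not the content of the linked paste; it is a separate example written here, and the variable names are illustrative.

```python
from itertools import chain

q = [['target1/d/b/c', 'target1', 123, 334],
     ['targetn/b/b/f', 'targetn', 23, 64]]

flat = chain(*q)  # a single iterator, created once and then reused

# 'in' advances the iterator until it finds a match (or runs out).
first = 'target1' in flat         # consumes 'target1/d/b/c' and 'target1'
second = 'target1/d/b/c' in flat  # that item is already gone; scans to the end
third = 'targetn' in flat         # the iterator is now exhausted

# A set can be tested any number of times.
values = set(chain(*q))
stable = ('target1/d/b/c' in values, 'target1/d/b/c' in values)

print(first, second, third)  # True False False
print(stable)                # (True, True)
```

The comprehension in the answer avoids this only because it calls chain(*q) again for every x, which is the repeated cost the comment describes.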
