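How can I write the following code in a more efficient and Pythonic way?

I have a list of URLs, file_url_list, which prints as:

www.latimes.com, www.facebook.com, affinitweet.com, ...

And another list of the top 1M URLs, top_url_list, which prints as:

[1, google.com], [2, www.google.com], [3, microsoft.com], ...

I want to find how many URLs in file_url_list are also in top_url_list. I wrote the code below, but I know it is neither the fastest nor the most Pythonic way to do it.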

# Find the common occurrences
found = []
for file_item in file_url_list:
    for top_item in top_url_list:
        if file_item == top_item[1]:
            # When you find an occurrence, put it in a list
            found.append(top_item)

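How can I write this in a more efficient and Pythonic way?

Source

2017-04-27 Aventinus

Why do you store a counter as the first element of each list? That really complicates things. Is there a reason for it?

If the goal is to "find how many URLs are in top_url_list", why aren't you counting anything? Is there a particular reason you append them to a list?

[Find intersection of two lists?](http://stackoverflow.com/questions/642763/find-intersection-of-two-lists) – fafl

A set intersection should help. You can also use a generator expression to extract only the URL from each entry in top_url_list.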

file_url_list = ['www.latimes.com', 'www.facebook.com', 'affinitweet.com'] 
top_url_list = [[1, 'google.com'], [2, 'www.google.com'], [3, 'microsoft.com']] 

common_urls = set(file_url_list) & set(url for (index, url) in top_url_list)

Or, equivalently, with a set comprehension (thanks to Jean-François Fabre):

common_urls = set(file_url_list) & {url for (index, url) in top_url_list}
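A minimal sketch of how the intersection gives the count the question asks for, and how the original [rank, url] entries could be recovered if they are needed. It assumes every entry in top_url_list is a two-item [rank, url] pair. The names rank_by_url and found are illustrative only, and one overlapping URL was added to the sample data so the result is not empty.

```python
file_url_list = ['www.latimes.com', 'www.facebook.com', 'affinitweet.com', 'google.com']
top_url_list = [[1, 'google.com'], [2, 'www.google.com'], [3, 'microsoft.com']]

# Set intersection: the URLs that appear in both lists.
common_urls = set(file_url_list) & {url for (index, url) in top_url_list}
print(len(common_urls))  # number of distinct URLs in common

# Hypothetical lookup table (URL -> rank), used to rebuild [rank, url]
# entries similar to what the original nested loop collected.
rank_by_url = {url: rank for (rank, url) in top_url_list}
found = [[rank_by_url[url], url] for url in file_url_list if url in rank_by_url]
print(found)
```

Note that a set drops duplicates, so len(common_urls) counts distinct URLs; the found list keeps one entry per matching item of file_url_list.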

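Source

2017-04-27 08:36:27 Kos

Use a set comprehension: 'set(url for (index, url) in top_url_list)' => '{url for (index, url) in top_url_list}'

Thanks! Added. – Kos

Very fast and elegant. Thank you. – Aventinus

You can extract the URLs from the second list and then use a set, as Kos showed in his answer, or you can use a lambda with filter.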

top_url_list_flat = [item[1] for item in top_url_list] 
print filter(lambda url: url in file_url_list, top_url_list_flat)

In Python 3, filter returns an iterator rather than a list, so you have to iterate over it:

for common in (filter(lambda url: url in file_url_list, top_url_list_flat)): 
    print (common)
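One thing to watch with this approach: url in file_url_list scans the whole list on every test, which may be slow when the other list has around 1M entries. A sketch of the same filter idea with a set used for the membership test, assuming Python 3; file_url_set is an illustrative name, and the sample data includes one overlapping URL so there is something to print.

```python
file_url_list = ['www.latimes.com', 'www.facebook.com', 'google.com']
top_url_list = [[1, 'google.com'], [2, 'www.google.com'], [3, 'microsoft.com']]

# Keep only the URL from each [rank, url] entry.
top_url_list_flat = [item[1] for item in top_url_list]

# A set gives average constant-time membership tests, unlike a list.
file_url_set = set(file_url_list)

# list() consumes the iterator that filter returns in Python 3.
common = list(filter(lambda url: url in file_url_set, top_url_list_flat))
print(common)
```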


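Source

2017-04-27 08:34:51

I guess you know that it doesn't work, too.

This works once you remove the counters from 'top_url_list' – fafl

But that isn't mentioned anywhere in the answer.

You say you want to know how many of the URLs from the file are in the top 1M list, not which ones they are. Build a set from the larger list (I assume that is the 1M one), then iterate over the other list and count how many of its items are in the set: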

top_urls = {url for (index, url) in top_url_list} 
total = sum(url in top_urls for url in file_url_list)

If the file list is the larger one, build the set from that instead:

file_urls = set(file_url_list) 
total = sum(url in file_urls for index, url in top_url_list)

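sum adds numbers together. url in top_urls evaluates to a bool, either True or False, which count as the integers 1 and 0 respectively. So the generator expression url in top_urls for url in file_url_list effectively feeds sum a sequence of 1s and 0s.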
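To make the bool-to-integer behaviour concrete, here is a small sketch with made-up sample data (the flags name is illustrative). It materializes the membership tests as a list so they can be inspected before summing.

```python
top_urls = {'google.com', 'www.google.com', 'microsoft.com'}
file_url_list = ['www.latimes.com', 'google.com', 'microsoft.com']

# Each membership test yields a bool.
flags = [url in top_urls for url in file_url_list]
print(flags)       # [False, True, True]

# bool is a subclass of int, so True adds as 1 and False as 0.
print(sum(flags))  # 2
```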

It might be more efficient (I would have to test it) to filter and sum a 1 only when url in top_urls is true:

total = sum(1 for url in file_url_list if url in top_urls)
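One way to check which variant is faster on a given machine is the standard-library timeit module. The sketch below uses synthetic, made-up data (not the URLs from the question), and the function names count_bools and count_ones are illustrative. No timing result is claimed here; the numbers depend on the Python version and the data.

```python
import timeit

# Synthetic data purely for timing purposes.
top_urls = {'site{}.example'.format(i) for i in range(100000)}
file_url_list = ['site{}.example'.format(i) for i in range(0, 200000, 20)]

def count_bools():
    # Sum the bool results directly.
    return sum(url in top_urls for url in file_url_list)

def count_ones():
    # Filter first, then sum a 1 for each match.
    return sum(1 for url in file_url_list if url in top_urls)

# Both variants must agree before their timings are worth comparing.
assert count_bools() == count_ones()

print('sum of bools:', timeit.timeit(count_bools, number=100))
print('sum of ones: ', timeit.timeit(count_ones, number=100))
```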

Source

2017-04-27 11:03:36
