Comparing two sets of data using intersection in Python

When I compare the two sets following_id and follower_id, the result seems to split everything apart into individual characters.

import re 
id1 = '[User(ID=1234567890, ScreenName=RandomNameHere), User(ID=233323490,  ScreenName=AnotherRandomName), User(ID=4459284, ScreenName=YetAnotherName)]' 
id2 = '[User(ID=1234467890, ScreenName=sdf), User(ID=233323490, ScreenName=AnotherRandomName), User(ID=342, ScreenName=443)]' 

following_id = ', '.join(re.findall(r'ID=(\d+)', id1)) 
follower_id = ', '.join(re.findall(r'ID=(\d+)', id2)) 

a = list(set(following_id).intersection(follower_id)) 
print(a)

This gives me [' ', ',', '1', '0', '3', '2', '5', '4', '7', '6', '9', '8']

I want the result to be ['233323490'], the ID that appears in both sets.

By contrast, the following works for me:

list1 = [1234567890, 233323490, 4459284, 230, 200, 234, 200, 2] 
list2 = [1234467890, 233323490, 342, 101, 234] 
a = list(set(list1).intersection(list2)) 
print(a)

which gives [233323490, 234].

Does this have something to do with the data types of following_id and follower_id?
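Type matters in one more way here. The following is a hedged sketch, not from the original thread: the working example uses integers, while re.findall returns strings, and Python never treats 233323490 and '233323490' as equal, so a set of ints and a set of digit strings have an empty intersection. The variable names are illustrative; the values come from the question's data.

```python
# IDs as integers, like list1/list2 in the working example
following_ints = {1234567890, 233323490, 4459284}
# IDs as strings, which is what re.findall returns
follower_strs = {'1234467890', '233323490', '342'}

# An int never compares equal to a str, so nothing overlaps
print(following_ints & follower_strs)  # set()

# Converting the strings to int makes the types agree again
common = following_ints & {int(s) for s in follower_strs}
print(common)  # {233323490}
```

In practice, keep both sides as strings (or convert both to int) before intersecting.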

2016-11-08 New

', '.join() returns a single string. Maybe remove it from the definitions of following_id and follower_id? After all, you need two lists to find the intersection, and re.findall() already returns lists. – TuanDT

@Tuan333 That makes sense, thanks for the quick reply. – New

This happens because you are building strings with .join, not lists:

following_id = ', '.join(re.findall(r'ID=(\d+)', id1)) 
follower_id = ', '.join(re.findall(r'ID=(\d+)', id2)) 
print(following_id) # 1234567890, 233323490, 4459284 (a single str) 
print(follower_id) # 1234467890, 233323490, 342 (a single str)
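To make the difference concrete, here is a small sketch (not part of the original answer) that checks the type of the joined value and shows that str.split(', ') undoes the join. It reuses the question's id1 string; the names joined and ids are illustrative.

```python
import re

id1 = '[User(ID=1234567890, ScreenName=RandomNameHere), User(ID=233323490,  ScreenName=AnotherRandomName), User(ID=4459284, ScreenName=YetAnotherName)]'

# ', '.join glues the list of matches into ONE string
joined = ', '.join(re.findall(r'ID=(\d+)', id1))
print(type(joined).__name__)  # str
print(joined)                 # 1234567890, 233323490, 4459284

# Iterating a string yields characters, which is what set() consumed
print(list(joined)[:4])       # ['1', '2', '3', '4']

# If you already have the joined string, splitting on the same
# separator recovers the list of ID strings
ids = joined.split(', ')
print(ids)                    # ['1234567890', '233323490', '4459284']
```

Splitting only works cleanly because the separator never occurs inside an ID; skipping the join altogether is simpler.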

You only need to use:

following_id = re.findall(r'ID=(\d+)', id1) 
follower_id = re.findall(r'ID=(\d+)', id2)

since re.findall already returns a list of the matches.
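Putting the fix together, a minimal corrected version of the question's script might look like this (a sketch using the question's data, written with print() so it also runs on Python 3):

```python
import re

id1 = '[User(ID=1234567890, ScreenName=RandomNameHere), User(ID=233323490,  ScreenName=AnotherRandomName), User(ID=4459284, ScreenName=YetAnotherName)]'
id2 = '[User(ID=1234467890, ScreenName=sdf), User(ID=233323490, ScreenName=AnotherRandomName), User(ID=342, ScreenName=443)]'

# With one capture group, re.findall returns a list of the captured strings
following_id = re.findall(r'ID=(\d+)', id1)
follower_id = re.findall(r'ID=(\d+)', id2)

# Each set element is now a whole ID string, not a single character
a = list(set(following_id).intersection(follower_id))
print(a)  # ['233323490']
```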

2016-11-08 04:14:30 Darkstarone

Quick response! That makes sense. Thanks. – New

following_id and follower_id are strings. When you convert a string into a set, you get a set containing each of its characters:

>>> set('hello, there') 
{' ', 'o', 't', 'e', 'r', 'h', ',', 'l'}

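When building the set, Python doesn't care about the commas or spaces in your string... it just iterates over the characters and treats each one as an item in the new set.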
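A quick way to see this is to build a set from the joined string and another from the equivalent list, then test membership. This is an illustrative sketch; the two sample values are a shortened version of the question's IDs.

```python
joined = '1234567890, 233323490'       # what ', '.join(...) produced
as_list = ['1234567890', '233323490']  # what re.findall(...) returns

chars = set(joined)   # one element per distinct character
ids = set(as_list)    # one element per ID string

print(len(chars))            # 12: the digits 0-9, ',' and ' '
print('233323490' in chars)  # False: every element is a single character
print('233323490' in ids)    # True
```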

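You're looking for a set of strings, so you need to pass in something that contains strings and then turn that into a set. re.findall already gives you a list of strings. If you don't join them together, you can take the intersection and get what you're looking for: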

following_id = re.findall(r'ID=(\d+)', id1) 
follower_id = re.findall(r'ID=(\d+)', id2) 

a = list(set(following_id).intersection(follower_id))
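Sets are unordered, so the list built from the intersection has no guaranteed order. If you want the common IDs in the order they appear in following_id, one option (an assumption about what you might need, not something from the thread) is a list comprehension that uses a set only for fast lookups. The & operator shown in the assert is the operator form of set.intersection.

```python
import re

id1 = '[User(ID=1234567890, ScreenName=RandomNameHere), User(ID=233323490,  ScreenName=AnotherRandomName), User(ID=4459284, ScreenName=YetAnotherName)]'
id2 = '[User(ID=1234467890, ScreenName=sdf), User(ID=233323490, ScreenName=AnotherRandomName), User(ID=342, ScreenName=443)]'

following_id = re.findall(r'ID=(\d+)', id1)
follower_id = re.findall(r'ID=(\d+)', id2)

# & on two sets gives the same result as set.intersection
follower_set = set(follower_id)
assert set(following_id) & follower_set == set(following_id).intersection(follower_id)

# Keep the order of following_id; the set makes each lookup O(1) on average
common_in_order = [i for i in following_id if i in follower_set]
print(common_in_order)  # ['233323490']
```

Unlike the set version, the comprehension keeps duplicates if following_id contains the same ID more than once.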

2016-11-08 04:18:17

Great explanation! – New
