2012-06-13 24 views
1

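Python: getting the difference between two arrays

I have the following two arrays. I am trying to check whether each element of invalid_id_arr exists in valid_id_arr; if it does not, I add it to a diff array. With the code below, however, diff comes out as ['id123', 'id124', 'id125', 'id126', 'id789', 'id666'], whereas I expected ["id789","id666"]. What am I doing wrong here?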

tag_file= {} 
tag_file['invalid_id_arr']=["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"] 
tag_file['valid_id_arr']=["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"] 
diff = [ele.split('-')[0] for ele in tag_file['invalid_id_arr'] if str(ele.split('-')[0]) not in tag_file['valid_id_arr']] 

Current output:

['id123', 'id124', 'id125', 'id126', 'id789', 'id666'] 

Expected output:

["id789","id666"] 
+0

Do you only need to check the value just after 'id'?

+1

Check out sets: if you clean up your data, you can do set(a).difference(set(b)). – monkut

Answers

4

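Using a set is more efficient, but your main problem is that you are not stripping the second part (everything after the '-') from the elements of valid_id_arr.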

invalid_id_arr=["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"] 
valid_id_arr=["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"] 
valid_id_set = set(ele.split('-')[0] for ele in valid_id_arr) 
diff = [ele for ele in invalid_id_arr if ele.split('-')[0] not in valid_id_set] 
print(diff)

Output:

['id789-123', 'id666'] 

http://ideone.com/Q9JBw

+0

I thought the Python interpreter would handle it.. id1 in valid_arr >> True – Rajeev

+0

No, (x in arr) is True if and only if x == arr[i] for some i. See the first row of the table in this section: http://docs.python.org/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange

+0

a = ["abc", 1, 2, 3, 4, 5]; 1 in a is True .... – Rajeev
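The exchange above turns on how `in` behaves for lists. The short sketch below (with shortened sample data and variable names chosen only for illustration) shows that list membership compares whole elements with `==`, so `1` is found in a list containing `1`, while `"id123"` is not found in a list containing `"id123-12345"`. Substring matching only happens when the right-hand side is itself a string.

```python
# Shortened sample data, assumed for illustration only.
valid_id_arr = ["id123-12345", "id124-1122", "id1new"]
numbers = ["abc", 1, 2, 3, 4, 5]

# List membership compares against whole elements with ==.
whole_match = 1 in numbers                    # an element equals 1
prefix_match = "id123" in valid_id_arr        # no element equals "id123"

# str-in-str is a substring test, which is a different operation.
substring_match = "id123" in "id123-12345"

# To match on the prefix, the prefixes have to be compared explicitly.
prefix_found = any(v.split("-")[0] == "id123" for v in valid_id_arr)

print(whole_match, prefix_match, substring_match, prefix_found)
# True False True True
```

This is why the original list comprehension keeps every id: none of the bare prefixes equals a full "prefix-suffix" string in valid_id_arr.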

3

Try sets:

invalid_id_arr = ["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"] 
valid_id_arr = ["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"] 

set_invalid = set(x.split('-')[0] for x in invalid_id_arr) 
print(set_invalid.difference(x.split('-')[0] for x in valid_id_arr))
0
>>> a = ["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"]
>>> b = ["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"]
>>> c = [s.split('-')[0] for s in b]  # a list, so it can be searched repeatedly; a generator would be exhausted by the "in" tests
>>> [ele.split('-')[0] for ele in a if ele.split('-')[0] not in c]
['id789', 'id666']
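A caveat when the prefixes are built with a generator expression: each `in` test advances the generator, so items it has already passed are not available to later tests, and the result then depends on the order of the inputs. The sketch below uses reordered, shortened data and a hypothetical helper `prefix` (neither is from the thread) to show the difference between searching a generator and searching a set.

```python
# Reordered, shortened sample data, assumed for illustration only.
a = ["id789-123", "id123-3431", "id666"]
b = ["id123-12345", "id124-1122"]

def prefix(s):
    """Hypothetical helper: the part of an id before the first '-'."""
    return s.split("-")[0]

# Generator: the first failed lookup ("id789") consumes every item,
# so the later lookup for "id123" finds nothing left to compare against.
gen = (prefix(s) for s in b)
diff_with_generator = [prefix(e) for e in a if prefix(e) not in gen]

# Set: can be searched any number of times, in any order.
prefixes = set(prefix(s) for s in b)
diff_with_set = [prefix(e) for e in a if prefix(e) not in prefixes]

print(diff_with_generator)  # ['id789', 'id123', 'id666']
print(diff_with_set)        # ['id789', 'id666']
```

With the data from the question the two forms happen to agree, because the matching ids appear in the same order in both lists; materializing the prefixes as a list or set removes that dependence.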
+0

I was trying to avoid two for loops – Rajeev

+0

O(n * log(N)) using sets, O(n * n) using your approach, but your solution works fine! – kaitian521