2017-05-22 50 views
0

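Python: create output that shows the shift of the same row entry between two columns

The goal is to see the difference between two columns that each contain the same row entries, only on different rows: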

[ 
["ENSCAFG00000008901","ENSCAFG00000013762"], 
["ENSCAFG00000029470","ENSCAFG00000003029"], 
["ENSCAFG00000013782","ENSCAFG00000007249"], 
["ENSCAFG00000000806","ENSCAFG00000012468"], 
["ENSCAFG00000013341","ENSCAFG00000018167"], 
["ENSCAFG00000003376","ENSCAFG00000003376"], 
["ENSCAFG00000000812","ENSCAFG00000018164"], 
["ENSCAFG00000012468","ENSCAFG00000001591"], 
["ENSCAFG00000031786","ENSCAFG00000013782"], 
["ENSCAFG00000000803","ENSCAFG00000030793"], 
["ENSCAFG00000003029","ENSCAFG00000015177"], 
["ENSCAFG00000011565","ENSCAFG00000005750"] 
] 

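The real list has many more rows, so I can't do this by hand. For example, ENSCAFG00000003029 appears in row 11 of the first column and in row 2 of the second column. I would like to create some output that shows the shift in rows between the first and second column, relative to the first column. So in the case of ENSCAFG00000003029, I would want +9 (or something similar) displayed next to the row in which ENSCAFG00000003029 appears in the first column.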

I hope my goal is clear and that this is the right place to ask.

Answers

0

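I think you have to loop twice. For example, if you make the simplifying assumption that the entries in the first column are all unique, you can do this: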

cols = [['ENSCAFG00000008901', 'ENSCAFG00000013762'], 
     ['ENSCAFG00000029470', 'ENSCAFG00000003029'], 
     ['ENSCAFG00000013782', 'ENSCAFG00000007249'], 
     ['ENSCAFG00000000806', 'ENSCAFG00000012468'], 
     ['ENSCAFG00000013341', 'ENSCAFG00000018167'], 
     ['ENSCAFG00000003376', 'ENSCAFG00000003376'], 
     ['ENSCAFG00000000812', 'ENSCAFG00000018164'], 
     ['ENSCAFG00000012468', 'ENSCAFG00000001591'], 
     ['ENSCAFG00000031786', 'ENSCAFG00000013782'], 
     ['ENSCAFG00000000803', 'ENSCAFG00000030793'], 
     ['ENSCAFG00000003029', 'ENSCAFG00000015177'], 
     ['ENSCAFG00000011565', 'ENSCAFG00000005750']] 
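# Map each first-column entry to its row index (assumes the entries are unique).
positions = {x[0]: i for i, x in enumerate(cols)}
# For every second-column entry that also occurs in the first column:
# (row in column 1) - (row in column 2).
print([positions[x[1]] - i for i, x in enumerate(cols) if x[1] in positions])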

Output:

[9, 4, 0, -6] 
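
The bare list of shifts no longer says which identifier each value belongs to. A minimal sketch that keeps that association, under the same assumption that first-column entries are unique (and additionally that second-column entries are unique, since they become dictionary keys). The helper name `shifts_by_id` and the short sample values are made up for illustration:

```python
def shifts_by_id(rows):
    """Map each ID found in both columns to (row in column 1) - (row in column 2)."""
    # Row index of every first-column entry (assumes no duplicates in column 1).
    positions = {row[0]: i for i, row in enumerate(rows)}
    return {
        row[1]: positions[row[1]] - i
        for i, row in enumerate(rows)
        if row[1] in positions
    }


# Hypothetical shortened data, not the real gene IDs.
sample = [["a", "x"], ["b", "c"], ["c", "b"], ["d", "d"]]
print(shifts_by_id(sample))  # {'c': 1, 'b': -1, 'd': 0}
```

Calling `shifts_by_id(cols)` on the full list should pair each of the four shifts above with its identifier.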
0
l1 = []
l2 = []
for item in cols:
    l1.append(item[0])  # first column
    l2.append(item[1])  # second column
for item in l1:
    if item in l2:
        # Row index in the second column, followed by the entry itself.
        print(str(l2.index(item)) + ':' + str(item))
1

It won't be very efficient for long lists, but this should work:

lst = [ 
    ["ENSCAFG00000008901","ENSCAFG00000013762"], 
    ["ENSCAFG00000029470","ENSCAFG00000003029"], 
    ["ENSCAFG00000013782","ENSCAFG00000007249"], 
    ["ENSCAFG00000000806","ENSCAFG00000012468"], 
    ["ENSCAFG00000013341","ENSCAFG00000018167"], 
    ["ENSCAFG00000003376","ENSCAFG00000003376"], 
    ["ENSCAFG00000000812","ENSCAFG00000018164"], 
    ["ENSCAFG00000012468","ENSCAFG00000001591"], 
    ["ENSCAFG00000031786","ENSCAFG00000013782"], 
    ["ENSCAFG00000000803","ENSCAFG00000030793"], 
    ["ENSCAFG00000003029","ENSCAFG00000015177"], 
    ["ENSCAFG00000011565","ENSCAFG00000005750"] 
    ] 

col_1 = [x[0] for x in lst] 
col_2 = [x[1] for x in lst] 

idx_offset = [None] * len(col_1) 
for col_1_idx, val_1 in enumerate(col_1):
    try:
        col_2_idx = col_2.index(val_1)
    except ValueError:
        continue
    idx_offset[col_1_idx] = col_2_idx - col_1_idx

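In short: for each value in the first column, find the index of the same value in the second column (if it exists). Take that index and subtract the index of the value in the first column from it; that is your output. The output (idx_offset) will be None for elements of the first column that do not appear in the second column. In this case, the output becomes: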

[None, None, 6, None, None, 0, None, -4, None, None, -9, None] 
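
For long lists, the repeated `col_2.index(...)` lookups make the loop above quadratic. A sketch of a dictionary-based alternative that should produce the same kind of offset list in a single pass over each column; the function name `index_offsets` and the sample values are hypothetical, and duplicated second-column values resolve to their first occurrence, as with `list.index`:

```python
def index_offsets(rows):
    """For each row, return (index in column 2) - (index in column 1), or None."""
    first_seen = {}
    for i, row in enumerate(rows):
        # setdefault keeps the earliest index, mirroring list.index().
        first_seen.setdefault(row[1], i)
    return [
        first_seen[row[0]] - i if row[0] in first_seen else None
        for i, row in enumerate(rows)
    ]


# Hypothetical shortened data, not the real gene IDs.
sample = [["a", "x"], ["b", "c"], ["c", "b"], ["d", "d"]]
print(index_offsets(sample))  # [None, 1, -1, 0]
```

Note the sign convention: ENSCAFG00000003029 gets -9 here, whereas the question writes +9. Negate the offsets if the other direction is wanted.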
+0

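This is the output I want; however, when I enter this, I don't get the output you describe in your post (or any output at all). Also, is there an easy way to attach the output to the rows it belongs to? – Cheeseburgler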

+0

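The above indeed doesn't produce any output. You can easily create a new copy of the original list with the index offsets appended: 'outp = [[val_1, val_2, idx] for (val_1, val_2, idx) in zip(col_1, col_2, idx_offset)]' – acdr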
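
Spelled out as a runnable sketch, with made-up short values standing in for `col_1`, `col_2` and `idx_offset`; the tab-separated printing is only one possible way to display the result:

```python
# Hypothetical stand-ins for the lists built in the answer above.
col_1 = ["a", "b", "c", "d"]
col_2 = ["x", "c", "b", "d"]
idx_offset = [None, 1, -1, 0]

# One [first, second, offset] triple per original row.
outp = [[val_1, val_2, idx] for val_1, val_2, idx in zip(col_1, col_2, idx_offset)]

for val_1, val_2, idx in outp:
    # Show a signed offset, or a dash when the value is absent from column 2.
    shown = "-" if idx is None else format(idx, "+d")
    print(val_1, val_2, shown, sep="\t")
```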
