Python - how to get rid of rows with overlapping boundary ranges
I have a file whose contents look like this:
1 33725 36725 ENHANCER0002
1 711760 714760 ENHANCER0003
1 724150 727150 ENHANCER0004
1 725455 728455 ENHANCER0005
1 871280 874410 ENHANCER0006
1 874180 877180 ENHANCER0007
1 900540 903540 ENHANCER0008
1 901475 904475 ENHANCER0009
1 910260 913260 ENHANCER00010
1 933355 936355 ENHANCER00011
1 947660 950660 ENHANCER00012
1 1013530 1016530 ENHANCER00013
. . .
1 2477030 2480030 ENHANCER00043
1 2478160 2481160 ENHANCER00044
1 2478845 2481845 ENHANCER00045
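The two middle columns are my lower and upper bounds. In some places, such as rows 3-4 or rows 5-6, the boundaries overlap. I need to reshape the file so that, whenever boundaries overlap, only the lowest lower bound and the highest upper bound are printed. I tried to solve this in Python, and this is my code: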
def write_line(chr_no,tmp_l,tmp_h,cnt,filename):
    filename.write(str(chr_no)+"\t"+str(tmp_l)+"\t"+str(tmp_h)+"\t"+"ENHANCER000"+str(cnt)+"\n")

inf = open("/home/firat/Desktop/Onder_Lab/Kenan/enhancers_bj.bed","r")
outf = open("/home/firat/Desktop/deneme_v3.bed","w")
cnt = 0
tmp_l=0
tmp_h=0
tmp_list = []
for line in inf:
    cnt += 1
    line = line.split(' ')
    current_low = line[1]
    current_high = line[2]
    previous_low = tmp_l
    previous_high = tmp_h
    if (int(current_low) <= int(previous_high)):
        tmp_list.append(int(current_low))
        tmp_list.append(int(current_high))
        tmp_list.append(int(previous_low))
        tmp_list.append(int(previous_high))
        write_line(line[0],min(tmp_list),max(tmp_list),cnt,outf)
        tmp_l = min(tmp_list)
        tmp_h = max(tmp_list)
        tmp_list = []
    else:
        write_line(line[0], previous_low, previous_high, cnt, outf)
        tmp_l= current_low
        tmp_h= current_high
My solution seems to mostly work, but the output looks like this:
1 27460 30460 ENHANCER0002
1 33725 36725 ENHANCER0003
1 711760 714760 ENHANCER0004
1 724150 728455 ENHANCER0005
1 724150 728455 ENHANCER0006
1 871280 877180 ENHANCER0007
1 871280 877180 ENHANCER0008
1 900540 904475 ENHANCER0009
1 900540 904475 ENHANCER00010
1 910260 913260 ENHANCER00011
1 933355 936355 ENHANCER00012
1 947660 950660 ENHANCER00013
1 1013530 1016530 ENHANCER00014
. . .
1 2477030 2481160 ENHANCER00044
1 2477030 2481845 ENHANCER00045
1 2477030 2481845 ENHANCER00046
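As you can see, the merged row is printed twice whenever boundaries overlap. There are also cases where three rows overlap, as at the bottom of the file. The expected result should look like this: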
1 27460 30460 ENHANCER0002
1 33725 36725 ENHANCER0003
1 711760 714760 ENHANCER0004
1 724150 728455 ENHANCER0005
1 871280 877180 ENHANCER0006
1 900540 904475 ENHANCER0007
1 910260 913260 ENHANCER0008
. . .
1 2477030 2481845 ENHANCER00046
What is wrong with my code, and how can I improve it so that it also works when more than two rows overlap?
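One likely cause of the duplicates is that the loop writes a row on every iteration: once when it merges two ranges, and again when the next non-overlapping row arrives and the (already written) previous range is flushed. A possible way around this, sketched below under some assumptions, is to collect merged intervals in a list and only write them once the whole file has been read. The function names `merge_overlapping` and `merge_bed` are hypothetical, the sketch targets Python 3, and it assumes the rows are already sorted by chromosome and then by lower bound, as in the sample. The numbering in the output simply counts merged rows with the same `ENHANCER000` prefix as the original code, which may or may not be the numbering you want.

```python
def merge_overlapping(rows):
    """Merge (chrom, low, high) rows whose ranges overlap.

    Assumes rows are sorted by chromosome, then by lower bound.
    Returns a list of [chrom, low, high] lists.
    """
    merged = []
    for chrom, low, high in rows:
        if merged and merged[-1][0] == chrom and low <= merged[-1][2]:
            # Overlaps the interval currently being built: extend its
            # upper bound. This handles chains of 2, 3, or more rows.
            merged[-1][2] = max(merged[-1][2], high)
        else:
            # No overlap: start a new interval.
            merged.append([chrom, low, high])
    return merged


def merge_bed(in_path, out_path):
    """Read a whitespace-separated file, merge overlaps, write the result."""
    rows = []
    with open(in_path) as inf:
        for line in inf:
            fields = line.split()  # splits on any run of spaces or tabs
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            rows.append((fields[0], int(fields[1]), int(fields[2])))

    with open(out_path, "w") as outf:
        # Each merged interval is written exactly once, after merging.
        for cnt, (chrom, low, high) in enumerate(merge_overlapping(rows), start=1):
            outf.write("{}\t{}\t{}\tENHANCER000{}\n".format(chrom, low, high, cnt))


# Small check using rows taken from the sample data in the question.
sample = [
    ("1", 724150, 727150),
    ("1", 725455, 728455),
    ("1", 910260, 913260),
    ("1", 2477030, 2480030),
    ("1", 2478160, 2481160),
    ("1", 2478845, 2481845),
]
for row in merge_overlapping(sample):
    print(row)
```

Because each incoming row is compared against the upper bound of the last merged interval rather than the last raw row, a run of three overlapping rows such as the one at the bottom of the file collapses into a single interval.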