I have a file like this, and I need to count the allele forms and remove the homozygous lines:
bob NULL 0 A A G G G G G
tom NULL 0 A A A A A A A
sara NULL 0 C C C C T T T
jane NULL 0 failed failed failed failed failed failed failed
I need to count A/C, C/A, A/T, T/A, A/G, G/A, C/G, G/C, C/T, T/C, T/G and G/T, and remove all homozygous lines, so my expected output looks like this:
bob NULL 0 A A G G G G G G/A
sara NULL 0 C C C C T T T C/T
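Read from the sample rows, the rule seems to be: put the more frequent base first, the less frequent one second, and drop any row that has fewer than two distinct bases. The following is only a sketch of that reading, not a confirmed specification. The names `allele_label` and `BASES` are made up for illustration, and taking the two most common bases when a row has three or more is an assumption.

```python
from collections import Counter

BASES = {"A", "C", "G", "T"}  # anything else (e.g. "failed") is ignored


def allele_label(calls):
    """Return 'major/minor' for a heterozygous row, or None otherwise."""
    counts = Counter(c for c in calls if c in BASES)
    if len(counts) < 2:
        return None  # homozygous row, or every call failed
    # Assumption: with 3+ distinct bases, keep only the two most common.
    (major, _), (minor, _) = counts.most_common(2)
    return major + "/" + minor


print(allele_label("A A G G G G G".split()))  # G/A
print(allele_label("A A A A A A A".split()))  # None
```

Rows for which this returns `None` are the ones to leave out of the output.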
Here is my attempt:
import re

fileA = open("myfile.txt", 'r')
#fileA.next()
lines = fileA.readlines()
for line in lines:
    new_list = re.split(r'\t+', line.strip())
    snp_name = new_list[0]
    allele = new_list[3:]
    failed_count = allele.count('failed')
    A_count = allele.count('A')
    C_count = allele.count('C')
    G_count = allele.count('G')
    T_count = allele.count('T')
    #A/C OR C/A count
    if A_count > 0:
        if C_count > 0:
            if A_count > C_count:
                new_list.append('A/C')
            else:
                new_list.append('C/A')
        #A/T OR T/A count
        if T_count > 0:
            if A_count > T_count:
                new_list.append('A/T')
            else:
                new_list.append('T/A')
        #A/G OR G/A count
        if G_count > 0:
            if A_count > G_count:
                new_list.append('A/G')
            else:
                new_list.append('G/A')
    #C/G OR G/C count
    if C_count > 0:
        if G_count > 0:
            if C_count > G_count:
                new_list.append('C/G')
            else:
                new_list.append('G/C')
        #C/T OR T/C count
        if T_count > 0:
            if C_count > T_count:
                new_list.append('C/T')
            else:
                new_list.append('T/C')
    #T/G OR G/T count
    if T_count > 0:
        if G_count > 0:
            if T_count > G_count:
                new_list.append('T/G')
            else:
                new_list.append('G/T')
    r = open('allele_counts.txt', 'a')
    x = '\t'.join(new_list)
    x = x + '\n'
    r.writelines(x)
fileA.close()
r.close()
Could you tell me how to improve this code and remove all the homozygous lines?
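Why do you do 'str(A_count)' and then 'A_count > '0''? – MaTh
Yes, you are right. I will edit it. – user3224522
You could also use 'elif' instead of separate 'if's, because once you have done the 'append' you don't need to check every remaining condition (if I understood correctly?) – MaTh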
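One way to act on the last comment is to loop over the six base pairs and stop at the first pair that is present, which behaves like an if/elif chain, and to write a row only when a pair was found. This is a minimal sketch, not a tested drop-in replacement. `first_pair_label`, `heterozygous_lines` and `filter_file` are hypothetical names, and splitting on any whitespace (rather than on tabs only) is an assumption about the input file.

```python
from itertools import combinations


def first_pair_label(calls):
    """Label the first base pair that both occur, more frequent base first."""
    for x, y in combinations("ACGT", 2):  # AC, AG, AT, CG, CT, GT
        nx, ny = calls.count(x), calls.count(y)
        if nx > 0 and ny > 0:
            # Stop at the first match, as an if/elif chain would.
            # On a tie the second base goes first, as in the attempt above.
            return x + "/" + y if nx > ny else y + "/" + x
    return None  # homozygous row, or only failed calls


def heterozygous_lines(lines):
    """Yield tab-joined rows with the label appended; skip all other rows."""
    for line in lines:
        fields = line.split()  # assumption: whitespace-separated columns
        label = first_pair_label(fields[3:])
        if label is not None:
            yield "\t".join(fields + [label])


def filter_file(in_path, out_path):
    """Read in_path and write only the heterozygous rows to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for row in heterozygous_lines(src):
            dst.write(row + "\n")
```

Calling `filter_file("myfile.txt", "allele_counts.txt")` would then open the output file once instead of once per line, and the `with` block closes both files even if an error occurs.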