Python: How to encode a DNA sequence with binary values?
I want to convert a file containing several DNA sequences into binary values, as follows:
A=1000
C=0100
G=0010
T=0001
FileA.txt
CCGAT
GCTTA
Desired output:
01000100001010000001
00100100000100011000
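The table is a one-hot code (exactly one bit set per base), so each base can be looked up in a dict and the four-character strings joined. A minimal sketch; the names `ONE_HOT` and `encode` are illustrative, not from the question:

```python
# One-hot code for each nucleotide, taken from the table above.
ONE_HOT = {"A": "1000", "C": "0100", "G": "0010", "T": "0001"}


def encode(seq: str) -> str:
    """Return the 0/1 text encoding of seq.

    Raises KeyError for any character outside ACGT (e.g. an 'N').
    """
    return "".join(ONE_HOT[base] for base in seq.strip().upper())


print(encode("CCGAT"))  # 01000100001010000001
print(encode("GCTTA"))  # 00100100000100011000
```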
I tried to solve this with the code below, but the .bin output file does not contain the result I want. Can anyone help?
Code:
import sys

if len(sys.argv) != 2:
    sys.stderr.write('Usage: {} <nucleotide file>\n'.format(sys.argv[0]))
    sys.exit()

# assumes the file only contains dna and newlines
sequence = ''
for line in open(sys.argv[1]):
    # every line is appended to the same string
    sequence += line.strip().upper()

# replace each base with the single character whose code point is 8, 4, 2 or 1
sequence = sequence.replace('A', chr(0b1000))
sequence = sequence.replace('C', chr(0b0100))
sequence = sequence.replace('G', chr(0b0010))
sequence = sequence.replace('T', chr(0b0001))

# write those characters out as raw bytes
outfile = open(sys.argv[1] + '.bin', 'wb')
outfile.write(bytearray(sequence, encoding = 'utf-8'))
outfile.close()
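Two things keep that script from producing the desired output: `chr(0b1000)` is the single control character `'\x08'`, not the four text characters `'1000'`, so the .bin file ends up holding one raw byte per base; and the loop joins every line into one string, so the sequences are no longer separated. A sketch of a text-output variant, assuming the goal is one line of 0/1 characters per input sequence (`convert_lines` and `convert_file` are hypothetical names):

```python
# Map each base to its 4-character code; str.translate accepts string values.
ONE_HOT = str.maketrans({"A": "1000", "C": "0100", "G": "0010", "T": "0001"})


def convert_lines(lines):
    """Yield one encoded string per non-blank input line."""
    for line in lines:
        seq = line.strip().upper()
        if seq:  # skip blank lines
            yield seq.translate(ONE_HOT)


def convert_file(in_path, out_path):
    """Write one encoded line per sequence, keeping the input's line structure."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for encoded in convert_lines(src):
            dst.write(encoded + "\n")
```

In the original script this could replace everything after the argument check, e.g. `convert_file(sys.argv[1], sys.argv[1] + '.txt')`. Note that `str.translate` leaves characters missing from the table (such as `N`) unchanged, whereas a dict lookup would raise an error; which behaviour is preferable depends on the data.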
Do you want an actual binary file, or a file containing the string representation '1000', '0100', ...? – wwii
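For the first option (an actual binary file), one possible layout, assumed here rather than taken from the thread, is two 4-bit codes per byte. An odd-length sequence can be padded with a `0000` nibble, which cannot be mistaken for a base because every code has exactly one bit set. `pack_one_hot` and `unpack_one_hot` are illustrative names:

```python
# 4-bit one-hot codes from the question's table.
ONE_HOT = {"A": 0b1000, "C": 0b0100, "G": 0b0010, "T": 0b0001}
DECODE = {code: base for base, code in ONE_HOT.items()}


def pack_one_hot(seq: str) -> bytes:
    """Pack two bases per byte, first base in the high nibble.

    A trailing odd base is paired with a 0000 padding nibble.
    """
    seq = seq.strip().upper()
    out = bytearray()
    for i in range(0, len(seq), 2):
        high = ONE_HOT[seq[i]]
        low = ONE_HOT[seq[i + 1]] if i + 1 < len(seq) else 0
        out.append((high << 4) | low)
    return bytes(out)


def unpack_one_hot(data: bytes) -> str:
    """Inverse of pack_one_hot; 0000 nibbles are skipped as padding."""
    bases = []
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            if nibble:
                bases.append(DECODE[nibble])
    return "".join(bases)


packed = pack_one_hot("CCGAT")
print(packed.hex())            # 442810
print(unpack_one_hot(packed))  # CCGAT
```

The bytes can then be written with `open(path, 'wb')`. Storing several sequences in one such file would additionally need a separator or length prefix, which this sketch does not define.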
You could halve the length of your encoded strings by using A = "00"; C = "01"; G = "10"; T = "11". – PaulMcG
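A sketch of that 2-bit suggestion, both as 0/1 text and packed four bases per byte. The packing layout and the function names are assumptions for illustration. Because `00` is a valid base (A), padding is ambiguous, so the original sequence length must be stored alongside the bytes:

```python
# 2-bit codes from the comment: a base's code is its index in "ACGT".
BASES = "ACGT"
TWO_BIT = {base: code for code, base in enumerate(BASES)}
TWO_BIT_TEXT = {base: format(code, "02b") for base, code in TWO_BIT.items()}


def encode_text(seq: str) -> str:
    """0/1 text form, two characters per base instead of four."""
    return "".join(TWO_BIT_TEXT[b] for b in seq.strip().upper())


def pack_2bit(seq: str) -> bytes:
    """Pack four bases per byte, first base in the two most significant bits.

    A short final chunk is left-aligned and zero-filled (which reads as 'A'),
    so the caller must keep len(seq) to decode correctly.
    """
    seq = seq.strip().upper()
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | TWO_BIT[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def unpack_2bit(data: bytes, length: int) -> str:
    """Inverse of pack_2bit; `length` discards the padding bases."""
    bases = [BASES[(byte >> shift) & 0b11]
             for byte in data for shift in (6, 4, 2, 0)]
    return "".join(bases[:length])


print(encode_text("CCGAT"))    # 0101100011
packed = pack_2bit("CCGAT")
print(packed.hex())            # 58c0
print(unpack_2bit(packed, 5))  # CCGAT
```

The trade-off compared with the one-hot layout is density (four bases per byte instead of two) at the cost of having to track the length explicitly.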