2017-09-13 593 views

How to convert a two-column text file to FASTA format

I am confused by this piece of code. My testfile.txt looks like this:

Sclsc1_3349_SS1G_09805T0  TTGCGATCTATGCCGACGTTCCA 
Sclsc1_8695_SS1G_14118T0  ATGGTTTCGGC 
Sclsc1_12154_SS1G_05183T0  ATGGTTTCGGC 
Sclsc1_317_SS1G_00317T0   ATGGTTTCGGC 
Sclsc1_10094_SS1G_03122T0  ATGGTTTCGGC 

I want to convert this file to FASTA format, like this:

>Sclsc1_3349_SS1G_09805T0 
TTGCGATCTATGCCGACGTTCCA 
>Sclsc1_8695_SS1G_14118T0 
ATGGTTTCGGC 
>Sclsc1_12154_SS1G_05183T0 
ATGGTTTCGGC 
>Sclsc1_317_SS1G_00317T0 
ATGGTTTCGGC 
>Sclsc1_10094_SS1G_03122T0 
ATGGTTTCGGC 

Here is my Python code (run as: python mycode.py testfile.txt outputfile.txt), but it does not produce the output I want. Can someone help me fix this code? Thanks!

import sys 

#File input 
fileInput = open(sys.argv[1], "r") 

#File output 
fileOutput = open(sys.argv[2], "w") 

#Seq count 
count = 1 ; 

#Loop through each line in the input file 
print "Converting to FASTA..." 
for strLine in fileInput: 

    #Strip the endline character from each input line 
    strLine = strLine.rstrip("\n") 

    #Output the header 
    fileOutput.write("> " + str(count) + "\n") 
    fileOutput.write(strLine + "\n") 

    count = count + 1 
print ("Done.") 

#Close the input and output file 
fileInput.close() 
fileOutput.close() 

Are you on Linux? – RomanPerekhrest

@RomanPerekhrest Yes – MAPK

How about a short command-line one-liner? – RomanPerekhrest

1 Answer

Since you are on Linux, here is a short and fast awk one-liner:

awk '{ printf ">%s\n%s\n",$1,$2 }' testfile.txt > outputfile.txt 

Contents of outputfile.txt:

>Sclsc1_3349_SS1G_09805T0 
TTGCGATCTATGCCGACGTTCCA 
>Sclsc1_8695_SS1G_14118T0 
ATGGTTTCGGC 
>Sclsc1_12154_SS1G_05183T0 
ATGGTTTCGGC 
>Sclsc1_317_SS1G_00317T0 
ATGGTTTCGGC 
>Sclsc1_10094_SS1G_03122T0 
ATGGTTTCGGC 
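
If you would rather stay in Python, the same idea can be sketched as below. The script in the question writes a running counter as the header and the whole input line as the sequence, which is why its output does not match. This version splits each line on whitespace instead, as awk does with $1 and $2. It is only a minimal sketch: the function names two_column_to_fasta and convert are made up for illustration, and it assumes every data line holds an ID followed by a sequence, with nothing else that matters after them.

```python
import sys


def two_column_to_fasta(lines):
    """Yield FASTA lines from whitespace-separated 'id sequence' rows.

    Hypothetical helper for illustration; not part of the original post.
    """
    for line in lines:
        fields = line.split()  # splits on any run of spaces or tabs
        if len(fields) < 2:
            continue  # assumption: blank or malformed lines are skipped
        seq_id, sequence = fields[0], fields[1]
        yield ">" + seq_id
        yield sequence


def convert(in_path, out_path):
    """Read the two-column file at in_path and write FASTA to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for out_line in two_column_to_fasta(src):
            dst.write(out_line + "\n")


# Only runs when called as: python mycode.py testfile.txt outputfile.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    convert(sys.argv[1], sys.argv[2])
```

It is run the same way as the script in the question: python mycode.py testfile.txt outputfile.txt. Using with closes both files automatically, and split() with no arguments also takes care of the trailing spaces in the input.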