2014-02-24 77 views
1

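Replacing spaces in a FASTA sequence file with Linux

I have an output file that I would like to convert into a file from which I can extract whole sequences.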

The output looks like this (but without the " in front of the >, and with a line break after each line):

">comp2_c0_seq1 len=265 path=[1:0-264] 
GTTTGAATGGTTTGTGGTTCTGCCTTTGACAAACTGATCATAGTGGAATAATAAGGGAAC 
ATGAAGAAATTCCAAGCCCATTGATTTTCTCTTGAGACCAATTAGGTAAAGTCACTCAAA 
ATTTTTGAGAGTGGATGCTCAGAGGTAACACTTTGGCATAGAATTGTTAAATAGCATGCA 
CTTTAATGGAAGAATAGAATCATTAAAGATTGGTTGATAACAAGTCACAGTGTATTTAAC 
CATCATCACAGCAGATGTAGACAGA 

">comp2_c1_seq1 len=203 path=[2794:0-202] 
CAGCAGATGTAGACAGAAATGGCACCACTGCTTATGAAGGAAACTGGAACCCAGAAGCAC 
ACCCTCCCCTCATTCACCATGAGCATCATGAGGAAGAAGAGACCCCACATTCTACAAGCA 
CAAGTAAGCAAGATGGCGGTCGGCAGTTCTGGGTTAGATGAATTAGTAAAGACATTCCAG 
CAATAGGGAAGATTTTGTTTAGA 

">comp6_c0_seq1 len=424 path=[1744:0-423] 
CCAGCTCCTACTCACCAGTCTCTCCGCATGGAGAAGTGGCCGTCATGGTCGACCTGTTCC 
CAAGGGTGGCCTTGTGAGTGCAGGCTCTCCTCACCAGAGCTGAGGGCTTTGTGAACCTCT 
GATGTCAATAGATGCCCCTCATCTTCCAGGAGGACAAAACAGGGCAAAGCAAGACATGGG 
GTGAGAACAGGAGTGCATCAGTGGGGTTCCCCAAGCCTGTGTCAGGTCCGGATCTGGGTG 
GGAGTTCCCTTCTGCGTCATCCAGGCCAGGCGAGTGGGCATCCTCCCTGAGCACCTGTGC 
TTGGGGCTTTGCCTGTGTCAGTCAGGAAGACAGAGTACACGGAAGAGTTACCATTGCTTT 
CAGAGCAAACCTTCCTTTGACATGCATTTAACACAGCACGGAGTGATTGACATGTGTCCT 
TGTG 

">comp7_c0_seq1 len=208 path=[22:0-207] 
GGAAGGACAGCATGTTTTCCATCTCAAAGACAGGAAAGAGTTATCTCTTCCTCTGGGATC 
CATCAGCATCCTGCCTACTCCTGCGTCACAGCACAGATCCTAACTGGCAAAATTATTAAT 
CTCTCTTCCACTGAAATAGATACATCAGACAGATTCCTTTCTGACTGAAACTGTTCTGCT 
GTGAAAGACTAACAACAAAGCAGATGCT 

">comp8_c0_seq1 len=537 path=[1925:0-536] 
TTAATAATTTAATTTTACTTTGAATATGTGTATATAAAATGCCTAATGTGATAAAAGTAG 
AATATGCCTGGTTGAAGGAAACATAGAAAATTGAATTGCCACTGATTTGGCCTTTCCTTC 
ATCTTTCATGGGGAGCCAGAGAGAATCTGGTTCAGAAGACAGACTCTAGAGTCAAGCAGC 
TGGGGTTCAAATCTTGGCAACATTTCAGGGTGATTTTAAAAATATTTAACAGCTGGTAAT 
GCTAGATGTCGACTTGTCAGAATGGATAAAGCCTGACATGACGTATATAGCCACACCAGC 
ATATAATCAGCCCTGTCTCCACCACTTACTAGTAGTGTCTTTATCTGTAAGATAAAGATA 
GCAATAGGCATTATCTCATAGGGGTTTTATGAGGATTAGGTGTAATAATATATATAAAGC 
ACTTATGACAATGTTTGGAAGAAAGTGTCATTCAACATTAGATATCATCATCATTGTCAT 
CATCGTGACTAATACTTGAGGAATTCCAGAATGTTATGGTTAGAATGGTAAAGTTCT 

What I would like to have is this:

> ">comp2_c0_seq1 len=265 path=[1:0-264] GTTTGAATGGTTTGTGGTTCTGCCTTTGACAAACTGATCATAGTGGAATAATAAGGGAACATGAAGAAATTCCAAGCCCATTGATTTTCTCTTGAGACCAATTAGGTAAAGTCACTCAAAATTTTTGAGAGTGGATGCTCAGAGGTAACACTTTGGCATAGAATTGTTAAATAGCATGCACTTTAATGGAAGAATAGAATCATTAAAGATTGGTTGATAACAAGTCACAGTGTATTTAACCATCATCACAGCAGATGTAGACAGA 

That is, each sequence on a single line, so that I can easily extract sequences with grep. I hope this is possible.

Thanks


Sorry, but that doesn't make sense. – suspectus

Answers

1

awk should do it:

awk '{printf (/comp/&&NR>1?"\n":"")"%s",$0}' file 
">comp2_c1_seq1 len=203 path=[2794:0-202]CAGCAGATGTAGACAGAAATGGCACCACTGCTTATGAAGGAAACTGGAACCCAGAAGCACACCCTCCCCTCATTCACCATGAGCATCATGAGGAAGAAGAGACCCCACATTCTACAAGCACAAGTAAGCAAGATGGCGGTCGGCAGTTCTGGGTTAGATGAATTAGTAAAGACATTCCAGCAATAGGGAAGATTTTGTTTAGA 
">comp6_c0_seq1 len=424 path=[1744:0-423]CCAGCTCCTACTCACCAGTCTCTCCGCATGGAGAAGTGGCCGTCATGGTCGACCTGTTCCCAAGGGTGGCCTTGTGAGTGCAGGCTCTCCTCACCAGAGCTGAGGGCTTTGTGAACCTCTGATGTCAATAGATGCCCCTCATCTTCCAGGAGGACAAAACAGGGCAAAGCAAGACATGGGGTGAGAACAGGAGTGCATCAGTGGGGTTCCCCAAGCCTGTGTCAGGTCCGGATCTGGGTGGGAGTTCCCTTCTGCGTCATCCAGGCCAGGCGAGTGGGCATCCTCCCTGAGCACCTGTGCTTGGGGCTTTGCCTGTGTCAGTCAGGAAGACAGAGTACACGGAAGAGTTACCATTGCTTTCAGAGCAAACCTTCCTTTGACATGCATTTAACACAGCACGGAGTGATTGACATGTGTCCTTGTG 
">comp7_c0_seq1 len=208 path=[22:0-207]GGAAGGACAGCATGTTTTCCATCTCAAAGACAGGAAAGAGTTATCTCTTCCTCTGGGATCCATCAGCATCCTGCCTACTCCTGCGTCACAGCACAGATCCTAACTGGCAAAATTATTAATCTCTCTTCCACTGAAATAGATACATCAGACAGATTCCTTTCTGACTGAAACTGTTCTGCTGTGAAAGACTAACAACAAAGCAGATGCT 
">comp8_c0_seq1 len=537 path=[1925:0-536]TTAATAATTTAATTTTACTTTGAATATGTGTATATAAAATGCCTAATGTGATAAAAGTAGAATATGCCTGGTTGAAGGAAACATAGAAAATTGAATTGCCACTGATTTGGCCTTTCCTTCATCTTTCATGGGGAGCCAGAGAGAATCTGGTTCAGAAGACAGACTCTAGAGTCAAGCAGCTGGGGTTCAAATCTTGGCAACATTTCAGGGTGATTTTAAAAATATTTAACAGCTGGTAATGCTAGATGTCGACTTGTCAGAATGGATAAAGCCTGACATGACGTATATAGCCACACCAGCATATAATCAGCCCTGTCTCCACCACTTACTAGTAGTGTCTTTATCTGTAAGATAAAGATAGCAATAGGCATTATCTCATAGGGGTTTTATGAGGATTAGGTGTAATAATATATATAAAGCACTTATGACAATGTTTGGAAGAAAGTGTCATTCAACATTAGATATCATCATCATTGTCATCATCGTGACTAATACTTGAGGAATTCCAGAATGTTATGGTTAGAATGGTAAAGTTCT 
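The one-liner above joins each header directly to its sequence and does not end the last record with a newline. A minimal sketch of a variant follows, assuming the real file's headers start with ">" (the question says the leading quote is not actually there) and that a single space between header and sequence is wanted, as in the desired output. The file names sample.fa and linear.txt and the shortened toy records are made up for illustration.

```shell
# Toy input in the layout the question describes for the real file:
# headers start with ">" (no leading quote), sequences wrap over several lines.
# The records are shortened, hypothetical stand-ins for the real ones.
tmp=$(mktemp -d)
cat > "$tmp/sample.fa" <<'EOF'
>comp2_c0_seq1 len=12 path=[1:0-11]
GTTTGA
ATGGTT

>comp2_c1_seq1 len=8 path=[2:0-7]
CAGC
AGAT
EOF

awk '
  # Header line: close the previous record (if any), then print the header
  # followed by one space and stay on the same output line.
  /^>/ { sub(/[ \t\r]+$/, ""); printf "%s%s ", (n++ ? "\n" : ""), $0; next }
  # Sequence or blank line: drop stray whitespace and append without a newline.
  { gsub(/[ \t\r]/, ""); printf "%s", $0 }
  # Terminate the last record so the file ends with a newline.
  END { if (n) print "" }
' "$tmp/sample.fa" > "$tmp/linear.txt"

cat "$tmp/linear.txt"
```

With the header and sequence separated by a space, the sequence is the last whitespace-separated field of each line, which may make later extraction simpler.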
0

You can try this sed:

sed '/">comp/{:loop; N; /\n">comp/{P;D}; s/\n//g; b loop;}' yourfile.txt 
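Note that the pattern above matches a literal " before >comp, as in the pasted sample. Since the question says the real file has no such quote, the address would probably need to be changed to match a bare ">". A sketch under that assumption, also assuming GNU sed (the semicolon after a label and the behaviour of N at end of input are GNU-specific), followed by the grep lookup the question has in mind. The file names and the shortened toy records are hypothetical.

```shell
# Toy input without the leading quote, as the question describes the real file.
tmp=$(mktemp -d)
cat > "$tmp/sample.fa" <<'EOF'
>comp2_c0_seq1 len=12 path=[1:0-11]
GTTTGA
ATGGTT

>comp2_c1_seq1 len=8 path=[2:0-7]
CAGC
AGAT
EOF

# Same loop as the command above, with the address changed from /">comp/ to /^>/:
# N appends the next line; if it is a new header, P prints the finished record
# and D restarts the cycle with that header; otherwise the newline is removed.
sed '/^>/{:loop; N; /\n>/{P;D}; s/\n//g; b loop;}' "$tmp/sample.fa" > "$tmp/linear.txt"

# Extract one record by its ID. -F treats the pattern as a fixed string, and the
# trailing space keeps comp2_c1_seq1 from also matching e.g. comp2_c1_seq10.
hit=$(grep -F '>comp2_c1_seq1 ' "$tmp/linear.txt")
echo "$hit"
```

As with the original command, the header and the sequence end up joined without a separating space, so the sequence starts right after the closing "]" of the path field.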