2013-02-11

I'm really sorry for asking the same old question again: modifying a text file with awk, sed, or both.

I want to convert a text file that contains many records like these:

>hg19_ct_UserTrack_3545_12513 range=chr1:52035541-52035716 5'pad=0 3'pad=0 strand=+ repeatMasking=none 
CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA 
TCTCCACTTTACAATCAGGAGGCTGAGTCATAAGGAGGTGAGTCACCTGC 
CTAGGGCCACATAGCTAGCAAGGAGCCAAGCTGGAATTTTAAGCCACGTT 
TGTCTGATTCTTTCTGCATACCATGC 
>hg19_ct_UserTrack_3545_13212 range=chr1:186122154-186122314 5'pad=0 3'pad=0 strand=+ repeatMasking=none 
ATCTTCAGGGACAAGTTTTTACAAACTCTCTTAATGGTTTTACCACCCTC 
CCTATCAGGACCAAGATCAAATACTTGATGTAAGGCATTTGTTTAATTTT 
CTTTAGACAAAGAGGATAGTAATTCTTGCATAAACGTTTTTGTGTATCAT 
CCATAAAATAT 

etc., etc.

into:

>range=chr1:52035541-52035716 5'pad=0 3'pad=0 strand=+ repeatMasking=none 
CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA 
TCTCCACTTTACAATCAGGAGGCTGAGTCATAAGGAGGTGAGTCACCTGC 
CTAGGGCCACATAGCTAGCAAGGAGCCAAGCTGGAATTTTAAGCCACGTT 
TGTCTGATTCTTTCTGCATACCATGC 
>range=chr1:186122154-186122314 5'pad=0 3'pad=0 strand=+ repeatMasking=none 
ATCTTCAGGGACAAGTTTTTACAAACTCTCTTAATGGTTTTACCACCCTC 
CCTATCAGGACCAAGATCAAATACTTGATGTAAGGCATTTGTTTAATTTT 
CTTTAGACAAAGAGGATAGTAATTCTTGCATAAACGTTTTTGTGTATCAT 
CCATAAAATAT 

I tried awk 'NR==1{sub(/^[^ ]* /,"")} 1' and sed -i '1s/\w\+ //', but neither of them worked.
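To see why both attempts fall short, here is a minimal sketch on two tiny records. The IDs "id1" and "id2" are hypothetical stand-ins for the real hg19_ct_UserTrack_... headers. Both commands are restricted to line 1, so every header after the first is left alone.

```shell
#!/bin/sh
# Two small records; "id1" and "id2" are hypothetical stand-ins for the real IDs.
input='>id1 range=chr1:1-10 strand=+
ACGT
>id2 range=chr1:20-30 strand=+
TTGA'

# awk attempt: NR==1 limits the substitution to line 1, so the second
# header keeps "id2 ". The regex also swallows the leading ">" on line 1.
printf '%s\n' "$input" | awk 'NR==1{sub(/^[^ ]* /,"")} 1'

# sed attempt: the address "1" imposes the same line-1 restriction.
# (\w and \+ are GNU sed extensions; other seds may not support them.)
printf '%s\n' "$input" | sed '1s/\w\+ //'
```

The awk version prints "range=chr1:1-10 strand=+" (no ">") as its first line and leaves ">id2 range=chr1:20-30 strand=+" untouched.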


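So, are these multi-line records? I thought they were just wrapped for the example. If so, what marks the start of a record, something like hg19_ct_UserTrack_3545_13212? Should the final result be just one long string, without those UserTrack fields? – mjuarez 2013-02-11 11:04:51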

Answers


I assume you want to remove the first word on lines that start with a greater-than sign (>). In that case, you can use awk like this:

awk '{sub(/^>[^ ]* /,">")} 1' 

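Remove the NR==1 restriction: it means the block that follows runs only on the first line. Also include > in both the pattern and the replacement, so that the > is kept and only header lines are changed.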

Output:

>range=chr1:52035541-52035716 5'pad=0 3'pad=0 strand=+ repeatMasking=none 
CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA 
TCTCCACTTTACAATCAGGAGGCTGAGTCATAAGGAGGTGAGTCACCTGC 
CTAGGGCCACATAGCTAGCAAGGAGCCAAGCTGGAATTTTAAGCCACGTT 
TGTCTGATTCTTTCTGCATACCATGC 
>range=chr1:186122154-186122314 5'pad=0 3'pad=0 strand=+ repeatMasking=none 
ATCTTCAGGGACAAGTTTTTACAAACTCTCTTAATGGTTTTACCACCCTC 
CCTATCAGGACCAAGATCAAATACTTGATGTAAGGCATTTGTTTAATTTT 
CTTTAGACAAAGAGGATAGTAATTCTTGCATAAACGTTTTTGTGTATCAT 
CCATAAAATAT 
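The command above prints to stdout. Since the question used sed -i, here is a minimal sketch of doing the same edit to a file "in place" with awk. POSIX awk has no in-place option (GNU awk 4.1 and later also offer gawk -i inplace), so this writes to a temporary file and copies it back. The function name strip_header_ids is hypothetical, and the sketch assumes mktemp is available.

```shell
#!/bin/sh
# Minimal sketch: apply the awk fix to a file "in place".
# strip_header_ids is a hypothetical helper name; assumes mktemp exists.
strip_header_ids() {
    file=$1
    tmp=$(mktemp) || return 1
    # Only lines that start with ">" match the anchored regex;
    # sequence lines pass through unchanged.
    if awk '{sub(/^>[^ ]* /,">")} 1' "$file" > "$tmp"; then
        # cat (rather than mv) keeps the original file's permissions.
        cat "$tmp" > "$file"
        rc=$?
    else
        rc=1
    fi
    rm -f "$tmp"
    return "$rc"
}

# Usage on a throwaway demo file built from the question's first record.
demo=$(mktemp)
printf '%s\n' '>hg19_ct_UserTrack_3545_12513 range=chr1:52035541-52035716 strand=+' \
    'CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA' > "$demo"
strip_header_ids "$demo"
cat "$demo"
rm -f "$demo"
```

Copying back with cat instead of mv is a design choice: the edited file keeps its original inode and permissions rather than inheriting those of the temporary file.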

It looks like you just want to remove the first field, up to the first space. You can do that like this:

cut -f2- -d ' ' 

Not really; check the line 1861223 ... and the last line... I don't know what the rule is... :) – Kent 2013-02-11 10:56:32
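A minimal sketch of why cut on its own is not enough here: it applies to every line, not only the > headers. The ID "id1" is hypothetical, and printf '%s \n' adds a trailing space to each line to mimic the question's data.

```shell
#!/bin/sh
# Minimal sketch of the cut pitfall; "id1" is a hypothetical ID.
# printf '%s \n' appends a trailing space to each line, as in the question's data.
printf '%s \n' '>id1 range=chr1:1-10 strand=+' 'ACGT' 'TTGA' | cut -f2- -d ' '
# Header: field 1 is ">id1", so the ">" is dropped together with the ID.
# Sequence lines: field 1 is the sequence and field 2 is empty, so they
# come out as blank lines. (A line with no space at all is printed
# unchanged, because cut passes through lines without the delimiter.)
```

Restricting the edit to lines that start with > and keeping the > in the replacement, as the awk and sed answers do, avoids both problems.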


Here's one way using sed:

sed '/^>/s/[^ ]* />/' file 

Result:

>range=chr1:52035541-52035716 5'pad=0 3'pad=0 strand=+ repeatMasking=none 
CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA 
TCTCCACTTTACAATCAGGAGGCTGAGTCATAAGGAGGTGAGTCACCTGC 
CTAGGGCCACATAGCTAGCAAGGAGCCAAGCTGGAATTTTAAGCCACGTT 
TGTCTGATTCTTTCTGCATACCATGC 
>range=chr1:186122154-186122314 5'pad=0 3'pad=0 strand=+ repeatMasking=none 
ATCTTCAGGGACAAGTTTTTACAAACTCTCTTAATGGTTTTACCACCCTC 
CCTATCAGGACCAAGATCAAATACTTGATGTAAGGCATTTGTTTAATTTT 
CTTTAGACAAAGAGGATAGTAATTCTTGCATAAACGTTTTTGTGTATCAT 
CCATAAAATAT
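To write the result back to the file, as the question's sed -i attempt intended, here is a minimal sketch. It assumes the suffix form -i.bak, which both GNU sed and BSD/macOS sed accept and which keeps the original as file.bak; plain -i with no suffix works with GNU sed but not with BSD/macOS sed. The demo file is a throwaway created with mktemp (assumed available).

```shell
#!/bin/sh
# Minimal sketch: the same sed substitution, applied in place.
# -i.bak (suffix attached) is accepted by GNU sed and BSD/macOS sed,
# and leaves the original content in "<file>.bak".
demo=$(mktemp)   # throwaway demo file; assumes mktemp is available
printf '%s\n' '>hg19_ct_UserTrack_3545_12513 range=chr1:52035541-52035716 strand=+' \
    'CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA' > "$demo"

# On lines starting with ">", the leftmost match of "[^ ]* " is the
# ">ID " prefix, which is replaced by a bare ">".
sed -i.bak '/^>/s/[^ ]* />/' "$demo"
cat "$demo"
rm -f "$demo" "$demo.bak"
```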