2014-11-05 12 views
1

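AWK: find a pattern in a line and remove it together with the upstream part

I have already filtered my text file so that it contains only the lines with an identified pattern (in this case "TCTGTACTATATTG"). Now, from the resulting file, I want to remove this pattern, together with all characters upstream of it, from every line that contains it. What is the best way to do this with AWK?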

This is my input:

@DGTKZQN1:384:C364AACXX:1:1109:19757:66886 2:N:0:GTGAAA 
AACAGTTTCTGTACTATATTGACTCATAAGAGTGGTTTAATACGAAGGGAGGAGAAGTTTCCTGGAAATAATCGATTTCCTAGCTTTTAGTTGCAATAAT 
+ 
[email protected]JJIEHGIEHHHEDFFFDEEEDDEDDCCDBDDDCDDD 
@DGTKZQN1:384:C364AACXX:1:1109:20360:66756 2:N:0:GTGAAA 
TTTCTGTACTATATTGGGTGTGAGAAGTAATGGTGCACTCCACAGACCTCCAGTGGCTGCTTGTTCGCCAGAACAGCAAATTTCTGCAGAAGCGCAAAAG 
+ 
@@CFFFFFHHHGHIIIJI;GCGGIIIJFHIIJGEDGGIJIICBDFIIIIJHIIGHIDHGEEHGHHIIJHGD?DDFEECEDDDDCDCCDDDCDDDDDDBC> 
@DGTKZQN1:384:C364AACXX:1:1109:21207:66784 2:N:0:GTGAAA 
AACAGTTTCTGTACTATATTGTACGTTGTGGATTATTAAAGGGAATAAAAGTGGTAGATTGTGCAGTTGAGGCAGGCTCTCAACTGTGAAACAGCGGTGG 
+ 
@@CFFBDDFHBDCGG<?:[email protected]<?<3C>[email protected][email protected]>?0909??DF>[email protected]=)8CEH9DHCB:AED>[email protected]>C;6>[email protected]= 
@DGTKZQN1:384:C364AACXX:1:1109:21026:66836 2:N:0:GTGAAA 
AGAACAGTTTCTGTACTATATTGTTATACTTCTGTTGTGGGTGTAGAGTTTTCTCCGGCGTTGGCTTCAATGGAATAAGGCACGAGATGAATCCGTGGAG 
+ 
@@@[email protected]:[email protected]@CEACEEEDDDCCCDDBDDDDDDDACDB??>BD 

The output should look like this:

@DGTKZQN1:384:C364AACXX:1:1109:19757:66886 2:N:0:GTGAAA 
ACTCATAAGAGTGGTTTAATACGAAGGGAGGAGAAGTTTCCTGGAAATAATCGATTTCCTAGCTTTTAGTTGCAATAAT 
+ 
[email protected]JJIEHGIEHHHEDFFFDEEEDDEDDCCDBDDDCDDD 
@DGTKZQN1:384:C364AACXX:1:1109:20360:66756 2:N:0:GTGAAA 
GGTGTGAGAAGTAATGGTGCACTCCACAGACCTCCAGTGGCTGCTTGTTCGCCAGAACAGCAAATTTCTGCAGAAGCGCAAAAG 
+ 
@@CFFFFFHHHGHIIIJI;GCGGIIIJFHIIJGEDGGIJIICBDFIIIIJHIIGHIDHGEEHGHHIIJHGD?DDFEECEDDDDCDCCDDDCDDDDDDBC> 
@DGTKZQN1:384:C364AACXX:1:1109:21207:66784 2:N:0:GTGAAA 
TACGTTGTGGATTATTAAAGGGAATAAAAGTGGTAGATTGTGCAGTTGAGGCAGGCTCTCAACTGTGAAACAGCGGTGG 
+ 
@@CFFBDDFHBDCGG<?:[email protected]<?<3C>[email protected][email protected]>?0909??DF>[email protected]=)8CEH9DHCB:AED>[email protected]>C;6>[email protected]= 
@DGTKZQN1:384:C364AACXX:1:1109:21026:66836 2:N:0:GTGAAA 
TTATACTTCTGTTGTGGGTGTAGAGTTTTCTCCGGCGTTGGCTTCAATGGAATAAGGCACGAGATGAATCCGTGGAG 
+ 
@@@[email protected]:[email protected]@CEACEEEDDDCCCDDBDDDDDDDACDB??>BD 

I have already tried awk with the split function, but I am struggling to use a string as the field separator.

+0

What result/output do you want? – Kent 2014-11-05 10:23:21

Answers

1

It looks like a simple sed should work for you:

sed -i.bak 's/^.*TCTGTACTATATTG//g' file 

With awk:

awk '{gsub(/^.*TCTGTACTATATTG/, "")} 1' file 

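But sed also gives you the benefit of in-place editing: with -i.bak it rewrites file itself, keeps the original as file.bak, and prints nothing to the terminal.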
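For the string-as-field-separator idea mentioned in the question, here is a minimal sketch. It assumes the pattern contains no regex metacharacters, because awk treats a multi-character -F value as a regular expression. The function name strip_upstream_fs and the sample record are made up for illustration.

```shell
# Hypothetical helper: use the pattern itself as awk's field separator and
# keep only the last field, i.e. the text after the last occurrence.
# $1 = pattern (assumed free of regex metacharacters), $2 = input file
strip_upstream_fs() {
    awk -F "$1" 'NF > 1 { print $NF; next } { print }' "$2"
}

# Demo on a made-up record (not the real data from the question).
tmp=$(mktemp)
printf '%s\n' '@read1' 'AACAGTTTCTGTACTATATTGACTCATAAG' '+' 'IIIIIIIIIIIIIIIIIIIIIIIIIIIIII' > "$tmp"
strip_upstream_fs TCTGTACTATATTG "$tmp"   # sequence line becomes ACTCATAAG
rm -f "$tmp"
```

Lines that do not contain the pattern have a single field (NF is 1, or 0 for an empty line) and are printed unchanged, so header, "+" and quality lines pass through as in the expected output.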

+1

sed doesn't seem to work, but awk gives me what I want – 2014-11-05 10:43:47

0
sed -i.bak 's/.*TCTGTACTATATTG//g' file
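A possible variant, sketched under the assumption that the file is strict FASTQ with exactly four lines per record: touch only the sequence line (line 2 of each record) and match the pattern as a literal string with index() instead of a regex. The function name strip_upstream_seq and the sample records are hypothetical.

```shell
# Hypothetical helper: trim only sequence lines, matching the pattern literally.
# $1 = literal pattern, $2 = FASTQ file (assumed: exactly 4 lines per record)
strip_upstream_seq() {
    awk -v pat="$1" '
        NR % 4 == 2 {
            # Drop everything through each match in turn, so the line ends up
            # starting right after the last non-overlapping occurrence.
            while ((i = index($0, pat)) > 0)
                $0 = substr($0, i + length(pat))
        }
        { print }
    ' "$2"
}

# Demo on made-up records (not the real data from the question).
tmp=$(mktemp)
printf '%s\n' '@r1' 'AACAGTTTCTGTACTATATTGACTCATAAG' '+' 'IIIIIIIIII' \
              '@r2' 'TTTCTGTACTATATTGGGTGTGAG' '+' 'JJJJJJJJJJ' > "$tmp"
strip_upstream_seq TCTGTACTATATTG "$tmp"
rm -f "$tmp"
```

Like the commands above, this leaves the quality line at its original length, which matches the expected output in the question.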