2016-03-15

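awk or sed: delete text in a file before one character and after another

I have a file in which I am trying to use awk to delete the text before the () while keeping the text inside the (). I am also trying to delete the whitespace and text after the _#, and then print the resulting line. Maybe sed is a better choice, but I'm not sure how.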

File

chr4 100009839 100009851 426_1201_128(ADH5)_1 0 - 
chr4 100006265 100006367 426_1202_128(ADH5)_2 0 - 
chr4 100003125 100003267 426_1203_128(ADH5)_3 0 - 

Desired output

chr4 100009839 100009851 ADH5_1 
chr4 100006265 100006367 ADH5_2 
chr4 100003125 100003267 ADH5_3 

AWK

awk -F'()_*' '{print $1,$2,$3,$4}' file 

Answers

Assuming the columns are tab-separated:

awk -F'[\t()]' 'BEGIN { OFS = "\t" } { print $1, $2, $3, $5 $6 }' file

Output:

chr4 100009839  100009851  ADH5_1 
chr4 100006265  100006367  ADH5_2 
chr4 100003125  100003267  ADH5_3 
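To try the awk answer without the original file, here is a sketch that re-creates the sample rows with tab separators (an assumption: the question does not say which delimiter the file uses, but -F'[\t()]' only splits correctly on tabs) and wraps the command in a hypothetical shell function, strip_to_gene_tab. OFS is set once in a BEGIN block, which behaves the same as setting it on every line.

```shell
#!/bin/sh
# Hypothetical wrapper around the awk answer; reads TAB-separated rows on stdin.
strip_to_gene_tab() {
  # Split on TAB, "(" and ")": for 426_1201_128(ADH5)_1 this gives
  # $4 = "426_1201_128", $5 = "ADH5", $6 = "_1". Later fields ("0", "-") are dropped.
  awk -F'[\t()]' 'BEGIN { OFS = "\t" } { print $1, $2, $3, $5 $6 }'
}

# Sample rows from the question, re-created with TAB separators (assumption).
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  chr4 100009839 100009851 '426_1201_128(ADH5)_1' 0 - \
  chr4 100006265 100006367 '426_1202_128(ADH5)_2' 0 - \
  chr4 100003125 100003267 '426_1203_128(ADH5)_3' 0 - |
  strip_to_gene_tab
```

Because the fourth column is cut at "(" and ")", $5 $6 (awk's string concatenation) glues the gene name back onto its _N suffix.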

Using sed with a substitution (this relies on the columns being separated by spaces):

$ sed 's/[^ ]*(\([^)]*\))\(_[^ ]*\).*$/\1\2/' infile 
chr4 100009839 100009851 ADH5_1 
chr4 100006265 100006367 ADH5_2 
chr4 100003125 100003267 ADH5_3 
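The sed pattern's [^ ] excludes only spaces. On a tab-separated line, [^ ]* can run from the start of the line across the tabs, and the whole line collapses to ADH5_1 followed by the remaining tab-separated columns. A minimal variant, which is my substitution and not the answerer's, swaps in the POSIX class [:space:] so either delimiter works; strip_to_gene is a hypothetical function name used only for this demo.

```shell
#!/bin/sh
# Variant of the sed answer (assumption, not the original): [^[:space:]]
# replaces [^ ] so the field before "(" cannot span TABs or spaces.
strip_to_gene() {
  # \1 = text inside the parentheses, \2 = "_" plus the following non-blanks;
  # everything from the start of that field to end of line becomes \1\2.
  sed 's/[^[:space:]]*(\([^)]*\))\(_[^[:space:]]*\).*$/\1\2/'
}

# Space-separated row, as shown in the question.
printf 'chr4 100009839 100009851 426_1201_128(ADH5)_1 0 -\n' | strip_to_gene
# Tab-separated row: the original [^ ]* pattern would reduce this whole
# line to "ADH5_1<TAB>0<TAB>-".
printf 'chr4\t100009839\t100009851\t426_1201_128(ADH5)_1\t0\t-\n' | strip_to_gene
```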

Breaking down the regex:

[^ ]*(  # Non-spaces up to and including opening parenthesis 
\(   # Start first capture group 
    [^)]* # Content between parentheses: everything but a closing parenthesis 
\)   # End of first capture group 
)   # Closing parenthesis, not captured 
\(   # Start second capture group 
    _[^ ]* # Underscore and non-spaces, '_1' etc. 
\)   # End of second capture group 
.*$   # Rest of line, not captured 
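The same pattern can be written in extended regex syntax, where bare parentheses form groups and the literal ones are put in brackets, so fewer backslashes are needed. This sketch assumes a sed that accepts -E (GNU and BSD/macOS sed both do), keeps the answer's space-only [^ ], and uses a hypothetical function name, strip_to_gene_ere.

```shell
#!/bin/sh
# ERE rewrite of the sed answer's BRE (same logic, same space-separated assumption).
#   [^ ]*[(]   non-spaces up to and including the literal "("
#   ([^)]*)    group 1: the text inside the parentheses
#   [)]        the literal ")", not captured
#   (_[^ ]*)   group 2: "_" and the non-spaces after it, e.g. "_2"
#   .*$        rest of the line, dropped
strip_to_gene_ere() {
  sed -E 's/[^ ]*[(]([^)]*)[)](_[^ ]*).*$/\1\2/'
}

printf 'chr4 100006265 100006367 426_1202_128(ADH5)_2 0 -\n' | strip_to_gene_ere
```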