Remove everything between the first tab and the last semicolon

I have a file whose lines look like this:

EF457507|S000834932  Root;Bacteria;"Acidobacteria";Acidobacteria_Gp4;Gp4 
EF457374|S000834799  Root;Bacteria;"Acidobacteria";Acidobacteria_Gp14;Gp14 
AJ133184|S000323093  Root;Bacteria;Cyanobacteria/Chloroplast;Cyanobacteria;Family I;GpI 
DQ490004|S000686022  Root;Bacteria;"Armatimonadetes";Armatimonadetes_gp7 
AF268998|S000340459  Root;Bacteria;TM7;TM7_genera_incertae_sedis

I want to delete everything between the first tab and the last semicolon, so that the output looks like this:

EF457507|S000834932  Gp4 
EF457374|S000834799  Gp14 
AJ133184|S000323093  GpI 
DQ490004|S000686022  Armatimonadetes_gp7 
AF268998|S000340459  TM7_genera_incertae_sedis

I tried using a regular expression, but it doesn't work. Is there a way to do this on Linux with awk or Perl?

Asked 2012-12-20 by Bioinfoguy

Of course there is. What have you tried, and what specifically didn't work? – mpe

You can use sed:

sed 's/\t.*;/\t/' file 

## This matches a tab character '\t', followed by any character '.' any number 
## of times '*', followed by a semicolon, and replaces all of this with a tab 
## character '\t'. Because '.*' is greedy, the match runs to the last semicolon 
## on the line.

sed 's/[^\t]*;//' file 

## Things inside square brackets become a character class. For example, '[0-9]' 
## is a character class. Obviously, this would match any digit between zero and 
## nine. However, when the first character in the character class is a '^', the 
## character class becomes negated. So '[^\t]*;' means match anything not a tab 
## character any number of times followed by a semicolon.
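
One caveat: the '\t' escape used in both commands is a GNU sed extension, and other sed implementations may not interpret it as a tab. A minimal sketch of a more portable variant follows, assuming a POSIX shell; the file name sample.txt and the variable name tab are made up for illustration.

```shell
# Build a two-line sample with real tab characters (printf expands \t).
printf 'EF457507|S000834932\tRoot;Bacteria;"Acidobacteria";Acidobacteria_Gp4;Gp4\n' > sample.txt
printf 'AF268998|S000340459\tRoot;Bacteria;TM7;TM7_genera_incertae_sedis\n' >> sample.txt

# Hold a literal tab in a shell variable so the sed script does not rely on '\t'.
tab=$(printf '\t')

# Same substitution as the first sed command, with the tab spliced in by the shell.
result=$(sed "s/${tab}.*;/${tab}/" sample.txt)
printf '%s\n' "$result"
```

The double quotes let the shell expand ${tab} before sed sees the script; nothing else in the expression is special to the shell inside double quotes.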

Or awk:

awk 'BEGIN { FS=OFS="\t" } { sub(/.*;/,"",$2) }1' file 

awk '{ sub(/[^\t]*;/,"") }1' file
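
Another way to think about the awk version is to split the second field on semicolons and keep only the last piece. This is an alternative sketch, not one of the original commands; the file name sample_awk.txt and the array name parts are illustrative.

```shell
# Two sample lines with real tabs between the ID and the taxonomy string.
printf 'AJ133184|S000323093\tRoot;Bacteria;Cyanobacteria/Chloroplast;Cyanobacteria;Family I;GpI\n' > sample_awk.txt
printf 'DQ490004|S000686022\tRoot;Bacteria;"Armatimonadetes";Armatimonadetes_gp7\n' >> sample_awk.txt

# split() returns the number of pieces, so parts[n] is the text after the last ';'.
result=$(awk 'BEGIN { FS = OFS = "\t" } { n = split($2, parts, ";"); print $1, parts[n] }' sample_awk.txt)
printf '%s\n' "$result"
```

Unlike the sub() versions, this prints only the first two tab-separated fields, so any further columns on a line would be dropped.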

Results:

EF457507|S000834932  Gp4 
EF457374|S000834799  Gp14 
AJ133184|S000323093  GpI 
DQ490004|S000686022  Armatimonadetes_gp7 
AF268998|S000340459  TM7_genera_incertae_sedis

As per the comment below, to "delete everything after the last semicolon", with sed:

sed 's/[^;]*$//' file 

## '[^;]*$' will match anything not a semicolon any number of times anchored to 
## the end of the line.

Or awk:

awk 'BEGIN { FS=OFS="\t" } { sub(/[^;]*$/,"",$2) }1' file 

awk '{ sub(/[^;]*$/,"") }1' file
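
One edge case worth knowing about: on a line with no semicolon at all, '[^;]*$' matches the whole line, so the line is emptied. The sketch below shows a guarded variant that only touches lines containing a semicolon; the second sample line and the file name sample_rev.txt are hypothetical.

```shell
# First line is from the question; the second is a made-up line without any ';'.
printf 'AF268998|S000340459\tRoot;Bacteria;TM7;TM7_genera_incertae_sedis\n' > sample_rev.txt
printf 'no_semicolon_here\tplain\n' >> sample_rev.txt

# Unguarded: the second line comes out empty.
sed 's/[^;]*$//' sample_rev.txt

# Guarded: the address '/;/' restricts the substitution to lines that contain a ';'.
result=$(sed '/;/s/[^;]*$//' sample_rev.txt)
printf '%s\n' "$result"
```

With the guard, the first line keeps everything up to and including its last semicolon and the second line passes through unchanged.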

Answered 2012-12-20 14:22:09 by Steve

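Thanks Steve, it works perfectly! – Bioinfoguy

@Bioinfoguy, don't forget to accept this answer. – RobEarl

What if I want to do the opposite, that is, delete everything after the last semicolon? – Bioinfoguy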
