Parsing a simple table

For each line of the input file, I want to print the field that contains the string 'locus_tag='. If no field matches, I want to print a dash.

Input file (tab-separated):

GeneID_2=7277058 location=890211..892127 locus_tag=HAPS_0907 orientation=+ 
GeneID_2=7278144 gene=rlmL location=complement(1992599..1994776) locus_tag=HAPS_2029 
GeneID_2=7278145 gene=rlmT location=complement(1992599..1994776) timetoparse

Desired output:

locus_tag=HAPS_0907 
locus_tag=HAPS_2029 
-

I tried this, but it does not work:

awk -F'\t' '{ for(i=1; i<=NF; i++) if($i ~/locus_tag=/) {print $i}; {for(i=1; i<=NF; i++) if($i !=/locus_tag=/) {print "-"}} }' SNP_annotations_ON_PROTEIN

2014-02-19 biotech

You were very close:

$ awk -F'\t' '{ for(i=1; i<=NF; i++) if($i ~/locus_tag=/) {print $i; next} {print "-"}}' a 
locus_tag=HAPS_0907
locus_tag=HAPS_2029
-

What you had:

{ for(i=1; i<=NF; i++) if($i ~/locus_tag=/) {print $i}; {for(i=1; i<=NF; i++) if($i !=/locus_tag=/) {print "-"}} }'

What I wrote:

{ for(i=1; i<=NF; i++) if($i ~/locus_tag=/) {print $i; next} {print "-"}}
                                                       ^^^^  ^^^^^^^^^^^
                        if found, print and go to next line       |
    if you arrive here, it is because you did not find the pattern, so print dash
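
To see why the original attempt misbehaves: a bare `/locus_tag=/` in an awk expression is matched against the whole record `$0` and evaluates to 1 or 0, so `$i != /locus_tag=/` compares each field with that number instead of testing the field itself. The result is one dash per field. A minimal sketch contrasting the two versions, assuming a throwaway tab-separated sample (the data and the variable names `sample`, `broken`, and `fixed` are illustrative, not from the question's file):

```shell
# Two-line tab-separated sample modelled on the question (illustrative data).
sample=$(printf 'GeneID_2=1\tlocus_tag=HAPS_0907\torientation=+\nGeneID_2=2\tgene=rlmT\ttimetoparse\n')

# Original attempt: the second loop prints a dash for every field, because
# /locus_tag=/ on its own is matched against $0 and yields 1 or 0.
broken=$(printf '%s\n' "$sample" | awk -F'\t' '{ for(i=1; i<=NF; i++) if($i ~/locus_tag=/) {print $i}; {for(i=1; i<=NF; i++) if($i !=/locus_tag=/) {print "-"}} }')

# Corrected version: print the matching field and jump to the next record;
# the dash is reached only when no field matched.
fixed=$(printf '%s\n' "$sample" | awk -F'\t' '{ for(i=1; i<=NF; i++) if($i ~/locus_tag=/) {print $i; next} {print "-"}}')

printf 'broken:\n%s\n\nfixed:\n%s\n' "$broken" "$fixed"
```

The `next` statement is what makes the trailing `{print "-"}` safe: it is only executed for records where the loop finished without a match.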

2014-02-19 16:39:51 fedorqui

With awk:

awk '/locus_tag/{for(x=1;x<=NF;x++) if($x~/^locus_tag=/) print $x;next}{print "-"}' file

2014-02-19 16:41:41

You can play with FS to make it easier:

awk -F'locus_tag=' 'NF>1{sub(/\s.*/,"",$2);print FS $2;next}$0="-"' f 
locus_tag=HAPS_0907 
locus_tag=HAPS_2029 
-
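
The idea is that with `-F'locus_tag='` a line containing the tag splits into more than one field, and the tag's value sits at the start of `$2`. Since `\s` inside a regex is a GNU awk extension, a sketch of the same approach using the POSIX class `[[:space:]]` might look like the following (the sample data and variable names are illustrative assumptions):

```shell
# Two-line tab-separated sample modelled on the question (illustrative data).
sample=$(printf 'GeneID_2=1\tlocus_tag=HAPS_0907\torientation=+\nGeneID_2=2\tgene=rlmT\ttimetoparse\n')

result=$(printf '%s\n' "$sample" | awk -F'locus_tag=' '
    NF > 1 {                          # the separator occurred, so the tag is present
        sub(/[[:space:]].*/, "", $2)  # drop everything after the tag value
        print FS $2                   # FS still holds the literal "locus_tag="
        next
    }
    { print "-" }                     # no separator on this line
')
printf '%s\n' "$result"
```

Writing the fallback as an explicit `{ print "-" }` block instead of `$0="-"` is only a readability choice; both print a dash for lines without the tag.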

2014-02-19 16:42:36 Kent

With perl:

perl -ne 'print /(locus_tag=.*?)\s/?"$1\n":"-\n"' file 
locus_tag=HAPS_0907 
locus_tag=HAPS_2029 
-

2014-02-19 16:45:37 MarcoS

Yes, sorry... I was just optimizing the little script... Now it looks like the neatest (or at least the shortest :-) of the suggested answers... :-) – MarcoS

perl -nE 'say m/(locus_tag=\S*)/ ? $1 : q/-/'

2014-02-19 16:55:50 Axeman

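'perl -M5.010 -ne' can be shortened to 'perl -nE'. The '-E' option enables all optional features (such as say), although it does not enable strict. – user49740

@user49740, thanks for the tip. I had not used '-E' before. – Axeman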

perl -lpe '($_)= (/(locus_tag=\S+)/, "-")' file

Output:

locus_tag=HAPS_0907 
locus_tag=HAPS_2029 
-

2014-02-19 16:58:38

+1 Beautiful!!!!! –

Really beautiful. But I prefer readability over beauty... :-) – MarcoS

$ awk '{print (match($0,/locus_tag=[^[:space:]]*/) ? substr($0,RSTART,RLENGTH) : "-")}' file 
locus_tag=HAPS_0907 
locus_tag=HAPS_2029 
-
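
This works because `match()` returns the position of the match (0 when there is none) and sets `RSTART` and `RLENGTH`, which `substr()` then uses to cut the tag out of the line. A small sketch that prints those values alongside the extracted tag, using an illustrative one-line input (the names `line` and `info` are assumptions for the example):

```shell
# Single illustrative record; whitespace-separated is enough for this demo.
line='GeneID_2=1 locus_tag=HAPS_0907 orientation=+'

info=$(printf '%s\n' "$line" | awk '{
    # match() sets RSTART (1-based start) and RLENGTH (length of the match)
    if (match($0, /locus_tag=[^[:space:]]*/))
        print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
    else
        print "-"
}')
printf '%s\n' "$info"
```

Because this version never splits the record into fields, it does not depend on whether the file is tab- or space-separated.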

2014-02-19 17:16:01
