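Multiple grep outputs on the same line
This seems like a very trivial question, but I don't have enough experience with grep and echo to answer it myself. I looked here and here without success.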
I have a file (a .gff file) of more than 1,000,000 lines that begins like this:
NW_007577731.1 RefSeq region 1 3345205 . + . ID=id0;Dbxref=taxon:144197;Name=Unknown;chromosome=Unknown;collection-date=16-Aug-2005;country=USA: Emerald Reef%2C Florida;gbkey=Src;genome=genomic;isolate=25-593;lat-lon=25.6748 N 80.0982 W;mol_type=genomic DNA;sex=male
NW_007577731.1 Gnomon gene 7982 24854 . - . ID=gene0;Dbxref=GeneID:103352799;Name=LOC103352799;gbkey=Gene;gene=LOC103352799;gene_biotype=protein_coding
NW_007577731.1 Gnomon mRNA 7982 24854 . - . ID=rna0;Parent=gene0;Dbxref=GeneID:103352799,Genbank:XM_008279367.1;Name=XM_008279367.1;gbkey=mRNA;gene=LOC103352799;model_evidence=Supporting evidence includes similarity to: 22 Proteins%2C and 73%25 coverage of the annotated genomic feature by RNAseq alignments;product=homer protein homolog 3-like;transcript_id=XM_008279367.1
NW_007577731.1 RefSeq region 1 3345205 . + . ID=id0;Dbxref=taxon:144197;Name=Unknown;chromosome=Unknown;collection-date=16-Aug-2005;country=USA: Emerald Reef%2C Florida;gbkey=Src;genome=genomic;isolate=25-593;lat-lon=25.6748 N 80.0982 W;mol_type=genomic DNA;sex=male
NW_007577731.1 Gnomon gene 7982 24854 . - . ID=gene0;Dbxref=GeneID:103352799;Name=LOC103352799;gbkey=Gene;gene=LOC103352799;gene_biotype=protein_coding
NW_007577731.1 Gnomon mRNA 7982 24854 . - . ID=rna0;Parent=gene0;Dbxref=GeneID:103352799,Genbank:XM_008279367.1;Name=XM_008279367.1;gbkey=mRNA;gene=LOC103352799;model_evidence=Supporting evidence includes similarity to: 22 Proteins%2C and 73%25 coverage of the annotated genomic feature by RNAseq alignments;product=homer protein homolog 3-like;transcript_id=XM_008279367.1
I would like to grep the lines that contain mRNA in the third column and get this tab-delimited output (the values of the gene=, product=, and transcript_id= fields):
LOC103352799 homer protein homolog 3-like XM_008279367.1
LOC103352799 homer protein homolog 3-like XM_008279367.1
Rather inelegantly, I can run these three commands separately
grep "mRNA\t" myfile.gff|sed s/gene=/@/|cut -f2 -d"@" |cut -f1 -d";"
grep "mRNA\t" myfile.gff|sed s/product=/@/|cut -f2 -d"@" |cut -f1 -d";"
grep "mRNA\t" myfile.gff|sed s/transcript_id=/@/|cut -f2 -d"@" |cut -f1 -d";"
to get the 3 columns, but how can I put the output of these 3 commands side by side on the same lines? I tried
echo -e "`grep "mRNA\t" myfile.gff|sed s/gene=/@/|cut -f2 -d"@" |cut -f1 -d";"`\t`grep "mRNA\t" myfile.gff|sed s/product=/@/|cut -f2 -d"@" |cut -f1 -d";"`\t`grep "mRNA\t" myfile.gff|sed s/transcript_id=/@/|cut -f2 -d"@" |cut -f1 -d";"`"
but here is the output:
LOC103352799
LOC103352799 homer protein homolog 3-like
homer protein homolog 3-like XM_008279367.1
XM_008279367.1
Thank you very much for your help!
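A note on the staggered output: each backtick substitution expands to a multi-line string, and echo joins the three whole blocks with a tab, so the last line of one block lands beside the first line of the next. One possible way to line the columns up row by row is paste, which merges corresponding lines of its input files. The following is only a sketch under some assumptions: the sample file is a hypothetical stand-in for myfile.gff (the second mRNA row and its values are invented for illustration), extract_field is a helper name introduced here, and the third column is filtered with awk instead of the grep pattern, since how grep treats `\t` can vary between implementations.

```shell
# Hypothetical sample standing in for myfile.gff; real GFF columns are tab-separated.
gff=$(mktemp)
printf 'NW_1\tGnomon\tgene\t7982\t24854\t.\t-\t.\tID=gene0;gene=LOC103352799\n' > "$gff"
printf 'NW_1\tGnomon\tmRNA\t7982\t24854\t.\t-\t.\tID=rna0;Parent=gene0;gene=LOC103352799;product=homer protein homolog 3-like;transcript_id=XM_008279367.1\n' >> "$gff"
printf 'NW_1\tGnomon\tmRNA\t7982\t24854\t.\t-\t.\tID=rna1;Parent=gene0;gene=LOC_EXAMPLE2;product=example protein;transcript_id=XM_EXAMPLE2.1\n' >> "$gff"

# extract_field NAME FILE (hypothetical helper): print the value of NAME=
# for every row whose third column is exactly mRNA, reusing the sed/cut chain.
extract_field() {
  awk -F'\t' '$3 == "mRNA"' "$2" | sed "s/$1=/@/" | cut -d'@' -f2 | cut -d';' -f1
}

# Write each column to its own temporary file, then merge them line by line.
tmp=$(mktemp -d)
extract_field gene "$gff" > "$tmp/gene"
extract_field product "$gff" > "$tmp/product"
extract_field transcript_id "$gff" > "$tmp/transcript"

# paste joins line N of every input with a tab by default.
result=$(paste "$tmp/gene" "$tmp/product" "$tmp/transcript")
printf '%s\n' "$result"

rm -rf "$tmp" "$gff"
```

In bash, the temporary files could be replaced by process substitution, for example paste <(extract_field gene myfile.gff) <(extract_field product myfile.gff) <(extract_field transcript_id myfile.gff). Either way the file is still read three times, which may matter at 1,000,000 lines.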
Try "echo -n" (do not append a newline) – netizen
I get the same output ;) – tlorin
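The unchanged result is consistent with what echo -n does: it only drops the single newline echo adds at the very end, while the newlines inside each substituted block stay in place. An alternative that avoids joining separate outputs at all is to read the file once and print the three values per row. The sketch below uses awk for that; it assumes the attributes sit in the ninth tab-separated column as key=value pairs separated by ";", and the sample file and its second mRNA row are hypothetical stand-ins for myfile.gff.

```shell
# Hypothetical sample standing in for myfile.gff; real GFF columns are tab-separated.
gff=$(mktemp)
printf 'NW_1\tGnomon\tgene\t7982\t24854\t.\t-\t.\tID=gene0;gene=LOC103352799\n' > "$gff"
printf 'NW_1\tGnomon\tmRNA\t7982\t24854\t.\t-\t.\tID=rna0;Parent=gene0;gene=LOC103352799;product=homer protein homolog 3-like;transcript_id=XM_008279367.1\n' >> "$gff"
printf 'NW_1\tGnomon\tmRNA\t7982\t24854\t.\t-\t.\tID=rna1;Parent=gene0;gene=LOC_EXAMPLE2;product=example protein;transcript_id=XM_EXAMPLE2.1\n' >> "$gff"

# Single pass: for each mRNA row, split column 9 on ";" and keep three keys.
result=$(awk -F'\t' -v OFS='\t' '
  $3 == "mRNA" {
    gene = product = tid = ""            # reset so values never leak between rows
    n = split($9, attr, ";")
    for (i = 1; i <= n; i++) {
      eq  = index(attr[i], "=")          # position of the first "=" in the pair
      key = substr(attr[i], 1, eq - 1)
      val = substr(attr[i], eq + 1)
      if (key == "gene") gene = val
      else if (key == "product") product = val
      else if (key == "transcript_id") tid = val
    }
    print gene, product, tid             # tab-separated because OFS is a tab
  }' "$gff")
printf '%s\n' "$result"

rm -f "$gff"
```

Matching on the exact key name rather than on a substring means that a value such as Parent=gene0 is never mistaken for the gene= field, and a row that lacks one of the keys simply gets an empty column.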