2014-07-16 35 views

AWK script to search in a FASTA file

I have a FASTA file like this:

>gnl|SRA|SRR035294.8571.2 FIHSSUW01ASCWS.2 length=224 
GAGATGAAATAGATCTTGGCATATATGTACATGCTTGATCTCAGTTTTGATTGGATTTTATCCATTTTAG 
CTATCTTAACTATTAATCTTGAAATGAAGCTTTAATTTATGTAGGAAGTTTATGAAATTTAGGAAAAAAA 
AAGAAAAAAACAAAACAATGTCGGCCGCCTCGGTCTCTACTGAGACACGCAACAGGGGATAGGCAAGGCA 
CACAGGGGATAGGN 
>gnl|SRA|SRR035294.8572.2 FIHSSUW01ETZME.2 length=254 
ACTAACCAGGTGGTAAACAACTACTACAGGCCAGATTTGAAGAAGGCTGCTCTTGCTAGATTGAGTGCAG 
TGAACAGAAGCCTTAAGGTTTCAAAGTCTGGTGTGAAGAAGAAGAACAGACAGGCAGTTAGGATCCATGG 
TAGGAAGTGAAGCTGTGATTTGCCTACCGTCTGATATTCATCGTATCACTTTCTAGCTGTTCCGTCTTGT 
TTGGCAAGTGTTTGGTTTTACGTGCGAGTAGTTATATGTTGCGC 
>gnl|SRA|SRR035294.8573.2 FIHSSUW01AZA99.2 length=230 
AAGCAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGGGATGTACCAATTCAAAAAGAAAACAGCAGTT 
GGGGGCAAAACAATTAAGTTGTAACGAATGCATATATATGATTAATCTTCTAACACATTATTTTTGTCTC 
AAAAAAAAAGAAAAAAAACAAAACATGTCGGCCGCCTCGGTCTCTACTGAGACACGCAACAGGGGATAGG 
CAAGGCACACAGGGGATAGG 
>gnl|SRA|SRR035294.8574.2 FIHSSUW01EHI3P.2 length=153 
TGCAAGTTTACAACTTAAAACAACTTTTCTCACAGTGAACAATAAATTTATCAATTCTCATGCAAAAAAA 
AAGAAAAAAACAAAAACATGTCGGCCGCCTCGGTCTCTACTGAGACACGCAACAGGGGATAGGCAAGGCA 
CACAGGGGATAGG 
>gnl|SRA|SRR035294.8575.2 FIHSSUW01EWK4S.2 length=287 
AACAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGGAGATTACAGGTATTGCAAGTTTCAAGCCTGTC 
ATAAAGACTCAAAGCCGCTTGTAATTTGTGTTTCCTAGTTGGGGAAGCTGTTTGTTCTTTATTGTGCTAT 
ATGTATTTATTTGAAAGTTTGGATGAACTCAATAAATAAAAGAAAATCTTCATTGTGGGTTACAATTTGG 
ACATGAACATGCATGAATAATGTACCAATTTAGCAAAAAAAAAGAAAAAAACAAAAAACAAATAGTCGGC 
CGGCCCG 
>gnl|SRA|SRR035294.8576.2 FIHSSUW01C911A.2 length=265 
TATTCTCAGGTACGAAATATGAGTTTGCTGATAAATTGATGGATTGGGAATCAGCCTGCATAATAAGATA 
TTCCCAATTAACTTTGCCCGTTAGTTCTTTTAGCTTTTCCTTTAAAGGCACGAGTCTTTCAACCAAAACA 
TTACAGCAAAGTCTAACTGCCTCACAGCTTGCTTCAGAAGTTGTACCCCCGGCCGTAATGGCCACTCTGC 
GTTGATACCACTGCTTCTGAGACACGCAACAGGGGATAGGCAAGGCACACAGGGG 

and I have written this script in bash:

STRING=$1 
FILE=$(pwd)"/"$2 

if [ -z "$STRING" ] 
then 
    echo "Usage: fastaFind.sh <query> <fasta file>" 
else 
    echo "" 
    awk 'BEGIN { RS = ">" } ; $0 ~ "'$STRING'" { print $0 }' "$FILE" 
fi 

I run this command:

fastaFind.sh "gnl|SRA|SRR035294.8573.2 FIHSSUW01AZA99.2 length=230" file.fasta 

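but it returns an "unterminated string" error. What I want is for the command to retrieve the specific sequence that matches the query. For example: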

>gnl|SRA|SRR035294.8573.2 FIHSSUW01AZA99.2 length=230 
AAGCAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGGGATGTACCAATTCAAAAAGAAAACAGCAGTT 
GGGGGCAAAACAATTAAGTTGTAACGAATGCATATATATGATTAATCTTCTAACACATTATTTTTGTCTC 
AAAAAAAAAGAAAAAAAACAAAACATGTCGGCCGCCTCGGTCTCTACTGAGACACGCAACAGGGGATAGG 

Answers

1

The awk command would be better written as:

awk 'BEGIN{ ORS = ""; RS = ">"; FS="\n" } $1 == "pattern" { print ">" $0 }' file 

Or:

awk -v p="pattern" 'BEGIN {ORS = ""; RS = ">"; FS = "\n" } $1 == p { print ">" $0 }' file 

And your shell script would be:

#!/bin/bash 

STRING=$1 
FILE=$2 

if [[ -z $STRING ]]; then 
    echo "Usage: fastaFind.sh <query> <fasta file>" 
else 
    awk -v p="$STRING" 'BEGIN{ ORS=""; RS=">"; FS="\n" } $1 == p { print ">" $0 }' "$FILE" 
fi 

Example:

bash temp.sh 'gnl|SRA|SRR035294.8575.2 FIHSSUW01EWK4S.2 length=287' temp.txt 

Output:

>gnl|SRA|SRR035294.8575.2 FIHSSUW01EWK4S.2 length=287 
AACAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGGAGATTACAGGTATTGCAAGTTTCAAGCCTGTC 
ATAAAGACTCAAAGCCGCTTGTAATTTGTGTTTCCTAGTTGGGGAAGCTGTTTGTTCTTTATTGTGCTAT 
ATGTATTTATTTGAAAGTTTGGATGAACTCAATAAATAAAAGAAAATCTTCATTGTGGGTTACAATTTGG 
ACATGAACATGCATGAATAATGTACCAATTTAGCAAAAAAAAAGAAAAAAACAAAAAACAAATAGTCGGC 
CGGCCCG 
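
To see why the `$1 == p` comparison works, it can help to look at how awk splits the file once RS and FS are redefined. The following is only a small sketch on a hypothetical two-record sample (the file contents and the `headers` variable are made up for illustration, not taken from the question's data):

```shell
#!/bin/bash
# Hypothetical two-record FASTA sample, written to a temporary file.
sample=$(mktemp)
printf '>seq1 length=8\nACGTACGT\n>seq2 length=4\nTTGA\n' > "$sample"

# With RS=">" every FASTA entry becomes one awk record.
# With FS="\n" the first field ($1) is the whole header line.
# The text before the first ">" is an empty record, which NR > 1 skips.
headers=$(awk 'BEGIN { RS = ">"; FS = "\n" } NR > 1 { print (NR - 1) ": " $1 }' "$sample")

echo "$headers"
rm -f "$sample"
```

Each printed line is one complete header, so comparing `$1` with the query using `==` is a plain string comparison against the full header line.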
1

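There are a few problems to fix.

  1. The quoting is wrong. $STRING is expanded outside any quotes, so the spaces in the query split the awk program into several arguments, which is where the unterminated-string error comes from. Because you are evaluating the shell's STRING variable inside the awk invocation, the whole awk program has to be enclosed in double quotes, and the double quotes (and $) inside the awk program then have to be escaped.
  2. The match operator ~ should not be used, because the pattern contains characters (such as | and .) that have a special meaning in regular expressions. So you need a way to compare part of the record literally; that is the reason for the comparison against $1 (made possible by redefining FS).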

STRING=$1 
FILE=$(pwd)"/"$2 

if [ -z "$STRING" ] 
then 
    echo "Usage: fastaFind.sh <query> <fasta file>" 
else 
    echo "" 
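    # The program is double-quoted so the shell expands $STRING; awk's own " and $ are escaped.
    # ORS is emptied and ">" is printed back so the record comes out as a FASTA entry.
    awk "BEGIN { RS = \">\" ; FS = \"\n\" ; ORS = \"\" } ; \$1 == \"$STRING\" { print \">\" \$0 }" "$FILE"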
fi 
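
The quoting problem can be reproduced without awk at all. The sketch below assumes the default IFS and uses a made-up query string; `count_args` is a helper invented for this illustration that only reports how many arguments it received:

```shell
#!/bin/bash
# Hypothetical query containing a space, like the headers in the question.
STRING='id.1 length=3'

# Helper invented for this sketch: prints the number of arguments it was given.
count_args() { echo "$#"; }

# Unquoted expansion between two single-quoted pieces:
# the space inside $STRING splits the awk program into two arguments.
unquoted=$(count_args '$0 ~ "'$STRING'" { print }')

# Double-quoting the expansion keeps the program in a single argument.
quoted=$(count_args '$0 ~ "'"$STRING"'" { print }')

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

In the unquoted case awk would receive only the first piece as its program, ending in the middle of a string literal. Quoting the expansion avoids the split, but the query is still pasted into the program text, so passing it with `-v` is generally the more robust choice.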
1

Or just something like this:

awk -v "RS=>" '/length=254/ { print $0; }' file
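
Keep in mind that a regex pattern matches anywhere in the record and treats characters such as | and . as operators. A minimal sketch, on a hypothetical sample whose IDs only imitate the style of the question's headers, contrasting a regex match with a literal substring test via awk's `index()` function:

```shell
#!/bin/bash
# Hypothetical sample: the second header shares only the "SRA" part with the query.
sample=$(mktemp)
printf '>gnl|SRA|read.1 length=3\nAAA\n>SRA|read.2 length=3\nCCC\n' > "$sample"
query='gnl|SRA|read.1'

# As a regex, "|" is alternation, so the query means "gnl" OR "SRA" OR "read.1"
# and both headers match.
regex_hits=$(awk -v p="$query" 'BEGIN { RS = ">"; FS = "\n" }
    NR > 1 && $1 ~ p { n++ } END { print n + 0 }' "$sample")

# index() looks for the query as a literal substring, so only the first header matches.
literal_hits=$(awk -v p="$query" 'BEGIN { RS = ">"; FS = "\n" }
    NR > 1 && index($1, p) > 0 { n++ } END { print n + 0 }' "$sample")

echo "regex: $regex_hits, literal: $literal_hits"
rm -f "$sample"
```

`index()` is useful when only part of the header is known; when the full header is available, the `$1 == p` comparison is stricter still.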