Iterate over files in a directory and create output files (Linux)

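I want to iterate over every file in a particular directory (called sequences) and run two commands on each file. I know the commands (the 'blastp' and 'cat' lines) work, because I can run them on a single file. Normally I would give a specific file name as the query, output, and so on, but I am trying to use a variable so that the loop can work through many files. I believe I am making a serious mistake in how I use the file name inside the commands. My code does run, but it creates a bunch of extra, unintended files. This is what I intend my script to do: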

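Line 1: Iterate over every file in my "sequences" directory. (All of them end in ".fa", if that helps.)

Line 3: Recognize the file name as a variable. (I know, I know, I think I did this horribly wrong.)

Line 4: Run the blastp command using the file name as the argument to the "query" flag, always using "database.faa" as the argument to the "db" flag, and write the results to a new file with the same name as the initial file but with ".txt" on the end.

Line 5: Write part of the line 4 output file to a new file with the same name as the initial file but with "_top_hits.txt" on the end.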

for sequence in ./sequences/{.,}*; 
    do 
      echo "$sequence"; 
      blastp -query $sequence -db database.faa -out ${sequence}.txt -evalue 1e-10 -outfmt 7 
      cat ${sequence}.txt | awk '/hits found/{getline;print}' | grep -v "#">${sequence}_top_hits.txt 
    done

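When I run this code, it gives me six new files derived from each file in the directory. (They all end up in the same directory. I would rather have each set in its own folder. How can I do that?) They are all empty. Their suffixes are ".txt", ".txt.txt", ".txt_top_hits.txt", "_top_hits.txt", "_top_hits.txt.txt" and "_top_hits.txt_top_hits.txt".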

If I can provide any further information to clarify anything, please let me know.
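The filtering step described for line 5 of the script can be tried in isolation. The sketch below is only an illustration: the sample text is made up to resemble blastp "-outfmt 7" output (it is not real BLAST output), and the variable names `sample` and `top_hits` are placeholders.

```shell
# Made-up text shaped like blastp "-outfmt 7" output (not real results)
sample=$(mktemp)
cat > "$sample" <<'EOF'
# BLASTP
# Query: seq1
# Database: database.faa
# Fields: query id, subject id, % identity
# 2 hits found
seq1 subjA 98.5
seq1 subjB 71.2
# BLASTP
# Query: seq2
# Database: database.faa
# 0 hits found
# BLAST processed 2 queries
EOF

# For each "hits found" line, awk prints the line that follows it;
# grep then drops any of those lines that are comments (contain "#"),
# which happens for queries with no hits.
top_hits=$(awk '/hits found/{getline; print}' "$sample" | grep -v "#")
echo "$top_hits"
rm "$sample"
```

With this sample, only the first hit of seq1 survives; the query with zero hits contributes nothing because the line after its "hits found" line is itself a comment.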

Source

2016-11-23 lynkyra

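At least one problem you seem to have is that you are running the same commands multiple times in the same directory. Each time you run the script, I believe your loop finds the new files generated by the previous run and tries to operate on them. As far as I can tell, you are not restricting your file search to files ending in '*.fa', which I would recommend doing. Otherwise you will keep processing the newly written .txt files and generating more erroneous output. – aardvarkk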
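The effect described in this comment can be reproduced without BLAST. The following is a simulation under stated assumptions, not the original script: `touch` stands in for both blastp and the awk/grep pipeline, the loop uses a plain `*` glob, and `run_unfiltered_loop`, `workdir` and `a.fa` are hypothetical names.

```shell
# Simulation only: "touch" stands in for blastp and the top-hits pipeline.
workdir=$(mktemp -d)
mkdir "$workdir/sequences"
touch "$workdir/sequences/a.fa"

run_unfiltered_loop() {
    # The glob is expanded once, when the loop starts, so files created
    # during a run are only picked up by the NEXT run.
    for sequence in "$workdir"/sequences/*; do
        touch "${sequence}.txt"            # stand-in for the blastp output
        touch "${sequence}_top_hits.txt"   # stand-in for the top-hits output
    done
}

run_unfiltered_loop   # first run: creates 2 files next to a.fa
run_unfiltered_loop   # second run: also processes the 2 outputs of the first run

ls "$workdir/sequences"

# Count the directory entries, then subtract the original a.fa
set -- "$workdir"/sequences/*
generated_count=$(($# - 1))
echo "generated files: $generated_count"
rm -r "$workdir"
```

After two runs the single input has six generated companions, with the same six suffixes listed in the question.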

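I agree, I do need to do that. I suppose another fix would be to write all the output files to a separate directory. How would I make it iterate only over files ending in *.fa? Do I put that in the first line? – lynkyra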

If you are only interested in *.fa files, I would restrict your input to only the files that match:

for sequence in sequences/*.fa; do
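One detail worth knowing with this pattern: if no file matches, the shell by default leaves `sequences/*.fa` as a literal string and the loop body runs once with that string. A minimal sketch of a guard follows, assuming a throwaway directory with hypothetical file names; the `matched` variable exists only to show which files were visited.

```shell
# Sketch: loop only over .fa files and skip the unmatched-pattern case.
workdir=$(mktemp -d)
mkdir "$workdir/sequences"
touch "$workdir/sequences/a.fa" "$workdir/sequences/b.fa" \
      "$workdir/sequences/a.fa.txt"

matched=""
for sequence in "$workdir"/sequences/*.fa; do
    # If nothing matched, $sequence is the literal pattern and does not
    # exist as a file, so this test skips it.
    [ -e "$sequence" ] || continue
    matched="$matched $(basename "$sequence")"
done
echo "matched:$matched"
rm -r "$workdir"
```

Here `a.fa.txt` is ignored because it does not end in `.fa`, so leftover output files from earlier runs are no longer reprocessed.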

Source

2016-11-23 04:13:13 aardvarkk

I can suggest the following improvements:

for fasta_file in ./sequences/*.fa # ";" is not necessary if you already have a new line for your "do" 
do 
    # ${variable%something} is the part of $variable 
    # before the string "something" 
    # basename path/to/file is the name of the file 
    # without the full path 
    # $(some command) allows you to use the result of the command as a string 
    # Combining the above, we can form a string based on our fasta file 
    # This string can be useful to name stuff in a clean manner later 
    sequence_name=$(basename "${fasta_file%.fa}")
    echo "${sequence_name}"
    # Create a directory for the results for this sequence
    # -p option avoids a failure in case the directory already exists
    mkdir -p "${sequence_name}"
    # Define the name of the file for the results
    # (including our previously created directory in its path)
    blast_results=${sequence_name}/${sequence_name}_blast.txt
    blastp -query "${fasta_file}" -db database.faa \
     -out "${blast_results}" \
     -evalue 1e-10 -outfmt 7
    # Define a file name for the top hits
    top_hits=${sequence_name}/${sequence_name}_top_hits.txt
    # alternatively, using "%"
    #top_hits=${blast_results%_blast.txt}_top_hits.txt
    # No need to cat: awk can take a file as argument
    awk '/hits found/{getline;print}' "${blast_results}" \
     | grep -v "#" > "${top_hits}"
done

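I introduced more intermediate variables with (hopefully) meaningful names. I used \ to escape the line endings, which allows a command to be spread over several lines. I hope this improves the readability of the code.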
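The name-building steps can be checked on their own, without running blastp. This is a small sketch assuming a hypothetical input path `./sequences/protein1.fa`; `without_ext` and `alt_name` are illustrative names that do not appear in the script.

```shell
fasta_file=./sequences/protein1.fa          # hypothetical file name

# ${variable%pattern} removes the shortest match of pattern from the END
without_ext=${fasta_file%.fa}               # ./sequences/protein1
# basename drops the leading directories
sequence_name=$(basename "$without_ext")    # protein1
# basename can also strip a suffix itself when given a second argument
alt_name=$(basename "$fasta_file" .fa)      # protein1

blast_results=${sequence_name}/${sequence_name}_blast.txt
top_hits=${blast_results%_blast.txt}_top_hits.txt

echo "$sequence_name"
echo "$blast_results"
echo "$top_hits"
```

For this input the three lines printed are `protein1`, `protein1/protein1_blast.txt` and `protein1/protein1_top_hits.txt`, so each sequence gets its own folder holding both result files.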

I haven't tested this. There may be typos.

Source

2016-11-23 16:29:45 bli
