2016-03-04

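Rename files with a specific extension in a directory to match another file in bash

I have a specific set of downloaded files (all ending in .bam) in the directory /home/cmccabe/Desktop/NGS/API/2-15-2016. I am trying to rename them using the matches in name. To make things more complicated, the date in the folder name is unique, and that date appears as a header in name; the entries under that header are the matches to use. I don't know how to do this, or whether it is even possible. Thanks :).

Contents of /home/cmccabe/Desktop/NGS/API/2-15-2016: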

IonXpress_001.bam 
IonXpress_002.bam 
IonXpress_003.bam 
IonXpress_007.bam 
file1.gz 
file2.gz 

Contents of name (names.txt):

2-15-2016 
IonXpress_001.bam testname1_12345 
IonXpress_002.bam testname2_45678 
IonXpress_003.bam testname3_9012 
IonXpress_007.bam testname1_12345- 
2-19-2016 
IonXpress_001.bam testname5_00000 
IonXpress_002.bam testname6_11111 
IonXpress_003.bam testname7_1213 
IonXpress_007.bam testname8_78524 

Desired result:

testname1_12345.bam 
testname2_45678.bam 
testname3_9012.bam 
testname1_12345.bam 
file1.gz 
file2.gz 

Bash so far:

logfile=/home/cmccabe/Desktop/NGS/API/2-15-2016/process.log
for f in /home/cmccabe/Desktop/NGS/API/2-15-2016/*.bam ; do
    echo "patient identifier creation: $(date) - File: $f"
    bname=$(basename "$f")
    pref=${bname%%.bam}
    while read -r from to ; do
        for i in "$f"* ; do
            if [ "$i" != "${i/$from/$to}" ] ; then
                mv "$i" "${i/$from/$to}"
            fi
        done
    done < names.txt
    echo "End patient identifier creation: $(date) - File: $f"
done >> "$logfile"

Edit:

for f in /home/cmccabe/Desktop/NGS/API/2-12-2016/*.bam ; do 
    bname=$(basename $f) 
    cmd=$(sed -n "/$f/,/[0-9]{1,2}-[0-9]{1,2}-20[0-9]{2}/{s/\(.*\.bam\) \(.*\)/mv \1 \2/p}" /home/cmccabe/Desktop/NGS/panels/names.txt) 
    echo "$cmd" 
done 
sed: -e expression #1, char 4: extra characters after command 
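The error most likely comes from the sed address /$f/: in this loop $f is a full path such as /home/cmccabe/Desktop/NGS/API/2-12-2016/IonXpress_001.bam, so the first / of the path closes the address early. sed then reads // (an empty regex) followed by h (a real sed command) and stops at the o of home, which is character 4. The address needs the date header instead, and that can be taken from the folder name. A minimal sketch; the helper name date_of is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of the folder that contains a file,
# e.g. /home/cmccabe/Desktop/NGS/API/2-12-2016/IonXpress_001.bam -> 2-12-2016
date_of() {
    local dir=${1%/*}           # drop the file name, keep the folder path
    printf '%s\n' "${dir##*/}"  # keep only the last folder name
}

# Build the sed address from the date rather than from the full path.
f=/home/cmccabe/Desktop/NGS/API/2-12-2016/IonXpress_001.bam
dt=$(date_of "$f")
echo "sed address would be: /$dt/"
```

The date string contains no slashes, so it can be placed between the / delimiters of a sed address safely.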

The date at the end of the folder name ('/home/cmccabe/Desktop/NGS/API/2-15-2016') will match the header in 'name', and that is where the matches are. Thank you very much :). – Chris


Sorry for the typos; there should always be a match in the file. If there is no match, it should be treated as an error. Thanks :). – Chris

Answers


You can use this for loop with awk:

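cd /home/cmccabe/Desktop/NGS/

for file in API/*/*.bam; do
    f="${file##*/}"       # file name, e.g. IonXpress_001.bam
    path="${file%/*}"     # folder, e.g. API/2-15-2016
    dt="${path##*/}"      # date taken from the folder name
    # awk: a one-field line is a date header; p is set while inside the
    # section for dt, and $2 is printed for the line whose first field is f
    mv "$file" "$path/$(awk -v dt="$dt" -v f="$f" 'NF==1 {
       p=$0==dt ? 1 : 0; next} p && $1==f{print $2}' names.txt)"
done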
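When awk prints nothing (for example because names.txt has DOS line endings, or the file is not listed under that date), the target becomes "$path/" and mv tries to move the file onto itself, which fits the "are the same file" message reported in the comments. A slightly more defensive sketch of the same loop, wrapped in a hypothetical function rename_from_names that takes the NGS directory as its argument (an assumption; the answer uses cd). It strips a trailing carriage return from each line, compares the date header by its first field, skips files without an entry and reports them on stderr, appends .bam to match the desired result, and uses mv -n so an existing file is not overwritten:

```shell
#!/usr/bin/env bash
# Sketch: rename <base>/API/<date>/*.bam using <base>/names.txt.
# Assumption: $1 is the directory that contains API/ and names.txt.
rename_from_names() {
    local base=$1 file f path dt new
    for file in "$base"/API/*/*.bam; do
        [ -e "$file" ] || continue              # the glob matched nothing
        f=${file##*/}                           # e.g. IonXpress_001.bam
        path=${file%/*}                         # e.g. <base>/API/2-15-2016
        dt=${path##*/}                          # e.g. 2-15-2016
        new=$(awk -v dt="$dt" -v f="$f" '
            { sub(/\r$/, "") }                  # tolerate DOS line endings
            NF == 1 { p = ($1 == dt); next }    # date header: inside our section?
            p && $1 == f { print $2; exit }     # first entry for this file
        ' "$base/names.txt")
        if [ -z "$new" ]; then
            echo "no entry for $f under $dt" >&2
            continue
        fi
        mv -n -- "$file" "$path/${new%.bam}.bam"   # -n: never overwrite
    done
}
```

Usage would be rename_from_names /home/cmccabe/Desktop/NGS. Note that with the sample data two entries map to testname1_12345, so mv -n leaves the second file in place instead of overwriting the first.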

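The 'for' loop with 'awk' seems to work, but for each matching file I get "mv: 'API/2-12-2016/IonXpress_001_newheader.bam' and 'API/2-12-2016/IonXpress_001_newheader.bam' are the same file" instead of the file being renamed. Thank you very much :). – Chris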


I have tested this command and it works fine on my system. Make sure your names.txt does not have DOS line endings; if it does, run 'dos2unix' on that file first. – anubhava
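If dos2unix is not installed, the same check and fix can be done with grep and tr. A minimal sketch; has_crlf and strip_cr are hypothetical helper names, and tr -d removes every carriage return in the file, not only those at line ends:

```shell
#!/usr/bin/env bash
# Succeed (exit status 0) if the file contains any carriage return.
has_crlf() {
    grep -q -e $'\r' -- "$1"
}

# Remove all carriage returns from a file in place. The result is copied
# back with cat so the original file keeps its permissions.
strip_cr() {
    local tmp status
    tmp=$(mktemp) || return 1
    tr -d '\r' < "$1" > "$tmp" && cat -- "$tmp" > "$1"
    status=$?
    rm -f -- "$tmp"
    return "$status"
}
```

For example: has_crlf names.txt && strip_cr names.txt.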


Test 'awk' on its own with:

cd /home/cmccabe/Desktop/NGS/; awk -v dt='2-15-2016' -v f='IonXpress_001_newheader.bam' 'NF==1 {p=$0==dt ? 1 : 0; next} p && $1==f{print $2}' names.txt

and see what output it gives. – anubhava


You can do something like this (note that I use the f variable in the sed command):

# $f must hold the date header (e.g. 2-12-2016), not a file path.
cmd=$(sed -n "/$f/,/[0-9]\{1,2\}-[0-9]\{1,2\}-20[0-9]\{2\}/{s/\(.*\.bam\) \(.*\)/mv \1 \2/p}" names.txt)
# For testing, use echo; this will also save what you just tried
# to do to your log file :) just in case.
echo "$cmd"
# When it works the way you want,
# uncomment the next line and it will execute your commands :)
#eval "$cmd"

What this does: -n tells sed not to print the lines it reads.

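Then, from the line that matches the date ($f) to the next line that matches the date pattern DD-DD-20DD (regex: [0-9]{1,2}-[0-9]{1,2}-20[0-9]{2}), it executes the commands inside {}.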
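One caveat about the pattern for the end of the range: sed uses basic regular expressions by default, where the interval {1,2} has to be written \{1,2\} (or the script written for sed -E). Unescaped braces match literal { and } characters, so the end address never matches and the range runs to the end of the file. A sketch of the range with the braces escaped; print_section is a hypothetical name, and it prints the whole section, including both date lines, so the range boundaries are visible:

```shell
#!/usr/bin/env bash
# Print the section of a names file that starts at the date header $1 and
# ends at the next line containing a D-D-20YY date (that line is included).
# $2 is the names file.
print_section() {
    sed -n "/$1/,/[0-9]\{1,2\}-[0-9]\{1,2\}-20[0-9]\{2\}/p" "$2"
}
```

The end address is only tested from the line after the start, so the header that opened the range does not close it. The closing date line is printed too, but the s command in the answer ignores it because it contains no .bam.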

The command inside {} is the substitution command "s", which matches a pattern and replaces it with another.

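I tell it to take the string all the way up to .bam and make it one group by putting that match between \( and \), then take the rest of the line and put it in another group.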

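The replacement pattern is the string mv, followed by group 1 captured in the match pattern, then the string in group 2, effectively creating a list of mv file.bam new_filename commands.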
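These commands keep the new name exactly as written in names.txt, so IonXpress_001.bam would become testname1_12345 without an extension, while the desired result in the question ends in .bam. A sketch of a substitution that appends .bam and ignores trailing blanks; to_mv_lines is a hypothetical name, and it assumes neither name contains spaces:

```shell
#!/usr/bin/env bash
# Turn "old.bam new" lines on stdin into "mv old.bam new.bam" commands.
# Group 1: everything up to .bam; group 2: the next run of non-space characters.
# Lines that do not start with a *.bam name (e.g. date headers) are not printed.
to_mv_lines() {
    sed -n 's/^\([^ ]*\.bam\) \{1,\}\([^ ]\{1,\}\).*/mv \1 \2.bam/p'
}
```

For example, printf '%s\n' 2-15-2016 'IonXpress_001.bam testname1_12345' | to_mv_lines prints only mv IonXpress_001.bam testname1_12345.bam.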

This is then stored in the cmd variable.

eval will execute the commands.
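Because eval hands whatever sed printed back to the shell, a name containing spaces or shell metacharacters would be split or interpreted. As an alternative to eval (not what this answer does), the pairs can be read with while read and passed to mv as separate arguments. A sketch; rename_pairs and the DRY_RUN variable are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "old new" pairs on stdin and rename
# $1/old to $1/new with mv, passing each name as its own argument.
# Set DRY_RUN=1 to print the mv commands instead of running them.
rename_pairs() {
    local dir=$1 old new
    while read -r old new; do
        # skip blank lines and one-field lines such as date headers
        if [ -z "$old" ] || [ -z "$new" ]; then
            continue
        fi
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf 'mv %s %s\n' "$dir/$old" "$dir/$new"
        else
            mv -n -- "$dir/$old" "$dir/$new"   # -n: do not overwrite
        fi
    done
}
```

For example, the section for one date can be piped in directly, since the date lines are skipped: sed -n "/2-15-2016/,/[0-9]\{1,2\}-[0-9]\{1,2\}-20[0-9]\{2\}/p" names.txt | DRY_RUN=1 rename_pairs /home/cmccabe/Desktop/NGS/API/2-15-2016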

I ran the sample contents of your names.txt file through it to illustrate the transformation:

~$ f=2-12-2016
~$ echo "2-12-2016
IonXpress_001.bam testname1_12345
IonXpress_002.bam testname2_45678
IonXpress_003.bam testname3_9012
IonXpress_007.bam testname1_12345-
2-19-2016
IonXpress_001.bam testname5_00000
IonXpress_002.bam testname6_11111
IonXpress_003.bam testname7_1213
IonXpress_007.bam testname8_78524" | sed -n "/$f/,/[0-9]\{1,2\}-[0-9]\{1,2\}-20[0-9]\{2\}/{s/\(.*\.bam\) \(.*\)/mv \1 \2/p}"
mv IonXpress_001.bam testname1_12345
mv IonXpress_002.bam testname2_45678
mv IonXpress_003.bam testname3_9012
mv IonXpress_007.bam testname1_12345-

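Update: from your comments and edit, I see I'm not very good at explaining :) Here is an edited version of your script. I assume you run it from inside the date folder you want to process (for example /home/cmccabe/Desktop/NGS/API/2-15-2016), because the script takes the date from the current directory name. If not, I'm sure you'll know how to adapt it or make it take an argument.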

logfile=/home/cmccabe/Desktop/NGS/API/2-15-2016/process.log
# No need to loop over each file ending in .bam, as the name file
# will be our driver. After all, if an entry is not present in
# the name file, then we really cannot do anything with that file.

# First, get the date from the folder name:
# pwd returns the current working directory (we are supposed to be
# in the date folder to process), and basename strips everything
# but the last folder name, which is the date.
date_to_process=$(basename "$(pwd)")

# Variable holding the name file path (hint: change this to where it really is,
# or pass it as an argument to the script). No spaces around "=" in an assignment.
name_file_path="/home/cmccabe/Desktop/NGS/panels/names.txt"

# From the name file, build the file move (mv) commands using sed
# as described before and store them in the cmd variable.
# Note that I added a couple of echo commands to produce the same output you
# were trying to create. I also split the command over multiple lines
# for clarity (well, I hope it makes it clearer at least).
# The interval braces are escaped (\{1,2\}) because sed uses basic regexes by default.
cmd=$(sed -n "/$date_to_process/,/[0-9]\{1,2\}-[0-9]\{1,2\}-20[0-9]\{2\}/{
    s/\(.*\.bam\) \(.*\)/echo \"Start patient identifier creation: \$(date) - File: \1\"\n mv \1 \2\n echo \"End patient identifier creation: \$(date) - File: \1\"/p
}" "$name_file_path")

# Print the generated commands so you can see what it did.
echo "about to execute this command:
$cmd"

# Execute the commands to perform the move operations and send the
# output to the log file. Redirect stderr (errors) to the log file
# too, so you will know what, if anything, failed: 2>&1 sends stderr
# to the same place as stdout.
eval "$cmd" >> "$logfile" 2>&1
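The update mentions that the script could take the folder as an argument instead of relying on the current directory. A sketch of that variant, keeping the sed-and-eval approach (with the interval braces escaped for basic regular expressions) but without the log messages; run_for_folder is a hypothetical name, $1 is the date folder and $2 optionally overrides the names file path used above. It returns an error when the date has no section in the names file, as requested in the comments on the question:

```shell
#!/usr/bin/env bash
# Sketch: the date folder is passed as $1 instead of being taken from pwd.
# $2 optionally overrides the names file (default: the path used above).
run_for_folder() {
    local dir=${1%/}                                  # drop a trailing slash
    local names=${2:-/home/cmccabe/Desktop/NGS/panels/names.txt}
    local dt=${dir##*/}                               # the folder name is the date header
    local cmd
    cmd=$(sed -n "/$dt/,/[0-9]\{1,2\}-[0-9]\{1,2\}-20[0-9]\{2\}/{s/\(.*\.bam\) \(.*\)/mv \1 \2/p}" "$names")
    if [ -z "$cmd" ]; then
        echo "no entries for $dt in $names" >&2
        return 1
    fi
    # Run the generated mv commands inside the folder, in a subshell
    # so the caller's working directory is unchanged.
    (cd "$dir" && eval "$cmd")
}
```

For example: run_for_folder /home/cmccabe/Desktop/NGS/API/2-15-2016 >> process.log 2>&1. As in the answer, the new names are used exactly as listed, without an added .bam.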

I added an edit to the post; I'm not doing something right.... Thank you very much :). – Chris


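@Chris I updated my post with a simple script that should do what you need. I suggest commenting out the eval line first and making sure the script does what you want (it prints the commands it generates, so you can check that they are what you want). – Rob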


@Rob Thank you for the detailed explanation :) – Chris
