How to pick the highest number from a series of <string>_# file names in a Bash script

I have a directory with files:

heat1.conf 
heat2.conf 
... 
heat<n>.conf 
minimize.conf 
... 
other files....

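I want my bash script to be able to grab the file name with the highest number (so that I can delete and replace it when an error condition is found).

What is the best way to accomplish this?

Please discuss the speed of your solution and why you think it is the best approach.

Source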

2010-08-26 Jason R. Mick

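"Please discuss the speed..." sounds like homework. – msw 2010-08-26 21:00:21

Ha! No. I'm just curious, because I'll be running this script 24-7 as part of a research project, so I want it to be decently fast. I'm not even in CS. I'm a chemical engineering PhD candidate, though my background is in CE. The snippet is part of a larger bash script that automates the submission of a set of multi-step chemical molecular dynamics simulations and recovers/diagnoses simulations that crashed from errors. – 2010-08-27 18:40:50

P.S. Please don't edit people's questions to tag them "homework" unless they agree with you. My post is not homework, and it rather put me off that you edited my question to say so. – 2010-08-27 18:51:45

If you only intend to list the files in the current directory, there is no need to use find with maxdepth 1, or ls. Just use a for loop over the shell's glob expansion. Also, expr is an external program. If your numbers don't contain decimals, you can use bash's own comparison.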

max=-1 
for file in heat*.conf 
do 
    num=${file:4}        # strip the leading "heat" 
    num=${num%.conf}     # strip the trailing ".conf" (from num, not from file) 
    [[ $num -gt $max ]] && max=$num 
done 
echo "max is: $max"
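
As a hedged variation on the loop above, the sketch below wraps it in a function and guards against two cases the answer does not discuss: a directory with no matching files, and names such as `heater.conf` or `heat08.conf`. The function name `highest_heat` and the `-1` "nothing found" value are choices made for this example, and it assumes the `heat<n>.conf` naming from the question.

```bash
#!/usr/bin/env bash

# highest_heat DIR: print the largest <n> among DIR/heat<n>.conf, or -1 if none.
# (Hypothetical helper; the name and the -1 sentinel are illustrative.)
highest_heat() {
    local dir=${1:-.} max=-1 file num
    for file in "$dir"/heat*.conf; do
        num=${file##*/}      # drop the directory part
        num=${num#heat}      # drop the "heat" prefix
        num=${num%.conf}     # drop the ".conf" suffix
        # Skip anything that is not purely digits (this also covers an unmatched glob).
        [[ $num =~ ^[0-9]+$ ]] || continue
        # 10# forces base 10, so "08" is not rejected as an invalid octal number.
        (( 10#$num > max )) && max=$((10#$num))
    done
    echo "$max"
}
```

It could then be called as `highest_heat "$my_dir"`, with no external process started at all.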

Source

2010-08-27 00:10:07 ghostdog74

What about:

max=$(find . -maxdepth 1 -name 'heat[1-9]*.conf' | 
     sed 's/.*heat\([0-9][0-9]*\)\.conf$/\1/' | 
     sort -n | 
     tail -n 1)

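List the candidate file names; keep only the numeric part; sort the numbers; select the largest (last) number.

Regarding speed: without dropping into a scripting language such as Perl (or Python, Ruby, ...), this is close to as good as you can get. Using find instead of ls means the list of file names is generated only once (the first version of this answer used ls, but that makes the shell generate the list of file names and then ls echo that list). The sed command is quite simple and produces a list of numbers that must be sorted. You could argue that a reverse numeric sort (sort -nr) piped into sed 1q would be faster; the second sed would read less data, and sort might not have to write all of its output before it receives SIGPIPE from the sed closing its input (because it has terminated).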
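
The reverse-sort variant mentioned above might look like the following sketch. The function name `highest_by_sort` is hypothetical, and the sed expression is anchored so that the `./` or directory prefix printed by find is discarded along with the rest of the name. Whether this is actually faster than `sort -n | tail -n 1` would have to be measured.

```bash
#!/usr/bin/env bash

# highest_by_sort DIR: print the largest <n> among DIR/heat<n>.conf.
# Prints nothing if no file matches. (Hypothetical helper for illustration.)
highest_by_sort() {
    local dir=${1:-.}
    find "$dir" -maxdepth 1 -name 'heat[0-9]*.conf' |
        sed -n 's/.*\/heat\([0-9][0-9]*\)\.conf$/\1/p' |  # keep only the numeric part
        sort -nr |                                        # largest number first
        sed 1q                                            # print one line, then quit
}
```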

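In a scripting language such as Perl, you would avoid the multiple processes and the overhead of pipe communication between them. That would be faster, but it would involve less shell scripting.

Source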

2010-08-26 21:07:29

I came up with a solution:

highest=-1 
current_dir=`pwd` 
cd $my_dir 
for file in $(ls heat*) ; do #assume I've already checked for dir existence 
    if [ "${file:4:$(($(expr length $file)-9))}" -gt "$highest" ]; then 
        highest=${file:4:$(($(expr length $file)-9))} 
    fi 
done 
cd $current_dir

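...Okay, I took your advice and edited my solution to scrap expr and to save the extracted number in a variable first. Over 1000 trials, my (modified) approach was on average faster than Jon's but slower than GhostDog's, although the standard deviations are relatively large.

My modified script as it appeared in my trials is below, along with Jon's and GhostDog's solutions...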

declare -a timing 

for trial in {1..1000}; do 
    res1=$(date +%s.%N) 
    highest=-1 
    current_dir=`pwd` 

    cd $my_dir 
    for file in $(ls heat*) ; do 
        #assume I've already checked for dir existence 
        file_no=${file:4:${#file}-9} 
        if [ $file_no -gt $highest ]; then 
            highest=$file_no 
        fi 
    done 
    res2=$(date +%s.%N) 
    timing[$trial]=$(echo "scale=9; $res2 - $res1"|bc) 
    cd $current_dir 
done 

average=0 
#compile net result 
for trial in {1..1000}; do 
    current_entry=${timing[$trial]} 
    average=$(echo "scale=9; (($average+$current_entry/1000.0))"|bc) 
done 

std_dev=0 
for trial in {1..1000}; do 
    current_entry=${timing[$trial]} 
    std_dev=$(echo "scale=9; (($std_dev + ($current_entry-$average)*($current_entry-$average)))"|bc) 
done 
std_dev=$(echo "scale=9; sqrt (($std_dev/1000))"|bc) 
printf "Approach 1 (Jason), AVG Elapsed Time: %.9F\n" $average 
printf "STD Deviation:     %.9F\n" $std_dev 


for trial in {1..1000}; do 
    res1=$(date +%s.%N) 
    highest=-1 
    current_dir=`pwd` 

    cd $my_dir 
    max=$(ls heat[1-9]*.conf | 
    sed 's/heat\([0-9][0-9]*\)\.conf/\1/' | 
    sort -n | 
    tail -n 1) 
    res2=$(date +%s.%N) 
    timing[$trial]=$(echo "scale=9; $res2 - $res1"|bc) 
    cd $current_dir 
done 

average=0 
#compile net result 
for trial in {1..1000}; do 
    current_entry=${timing[$trial]} 
    average=$(echo "scale=9; (($average+$current_entry/1000.0))"|bc) 
done 

std_dev=0 
for trial in {1..1000}; do 
    current_entry=${timing[$trial]} 
    #echo "(($std_dev + ($current_entry-$average)*($current_entry-$average))" 
    std_dev=$(echo "scale=9; (($std_dev + ($current_entry-$average)*($current_entry-$average)))"|bc) 
done 
std_dev=$(echo "scale=9; sqrt (($std_dev/1000))"|bc) 
printf "Approach 2 (Jon), AVG Elapsed Time: %.9F\n" $average 
printf "STD Deviation:     %.9F\n" $std_dev 


for trial in {1..1000}; do 
    res1=$(date +%s.%N) 
    max=-1 
    current_dir=`pwd` 

    cd $my_dir 
    for file in heat*.conf 
    do 
        num=${file:4}       # strip the leading "heat" 
        num=${num%.conf}    # strip the trailing ".conf" 
        [[ $num -gt $max ]] && max=$num 
    done 
    res2=$(date +%s.%N) 
    timing[$trial]=$(echo "scale=9; $res2 - $res1"|bc) 
    cd $current_dir 
done 

average=0 
#compile net result 
for trial in {1..1000}; do 
    current_entry=${timing[$trial]} 
    average=$(echo "scale=9; (($average+$current_entry/1000.0))"|bc) 
done 

std_dev=0 
for trial in {1..1000}; do 
    current_entry=${timing[$trial]} 
    #echo "(($std_dev + ($current_entry-$average)*($current_entry-$average))" 
    std_dev=$(echo "scale=9; (($std_dev + ($current_entry-$average)*($current_entry-$average)))"|bc) 
done 
std_dev=$(echo "scale=9; sqrt (($std_dev/1000))"|bc) 
printf "Approach 3 (GhostDog), AVG Elapsed Time: %.9F\n" $average 
printf "STD Deviation:     %.9F\n" $std_dev

...and the results were:

Approach 1 (Jason), AVG Elapsed Time: 0.041418086 
STD Deviation:     0.177111854 
Approach 2 (Jon), AVG Elapsed Time: 0.061025972 
STD Deviation:     0.212572411 
Approach 3 (GhostDog), AVG Elapsed Time: 0.026292145 
STD Deviation:     0.145542801

Nice work, GhostDog!! Thanks also to Jon and the commenters for the tips! :)
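
The averaging and standard-deviation loops in the benchmark call `bc` once per trial. A more compact way to get the same two statistics would be a single `awk` pass. The sketch below is only an alternative under assumptions, not the script that produced the numbers above: `mean_std` is a hypothetical name, and it computes the population standard deviation (dividing by the number of samples), as the original loops do.

```bash
#!/usr/bin/env bash

# mean_std: read one number per line on stdin, print "<mean> <std-dev>".
# (Hypothetical helper; exits non-zero if no input lines were read.)
mean_std() {
    awk '
        { sum += $1; sumsq += $1 * $1; n++ }
        END {
            if (n == 0) exit 1
            mean = sum / n
            var = sumsq / n - mean * mean   # population variance
            if (var < 0) var = 0            # guard against rounding error
            printf "%.9f %.9f\n", mean, sqrt(var)
        }'
}

# Possible use with the timing array from the benchmark:
# printf '%s\n' "${timing[@]}" | mean_std
```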

Source

2010-08-26 21:10:42

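One obvious improvement is to run 'expr' once per new number instead of twice; 'expr' is not a fast program. Also, in a shell script in its own file, the 'pwd' and the final 'cd' are irrelevant; this is not a DOS .bat file you are dealing with. Even as a fragment of a larger script, I would probably use a sub-shell, letting it change to the target directory while the calling shell stays where it was all along. You can avoid using 'ls' merely to echo the list of files that the shell generates when it expands the wildcard, and you can make the wildcard more precise. – 2010-08-26 21:21:17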
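
The sub-shell suggestion in this comment could be sketched roughly as follows. The function name `highest_in_dir` is made up for illustration. Because the `cd` happens inside `( ... )`, the caller's working directory is untouched and no `pwd`/`cd` bookkeeping is needed; the glob is also expanded directly, without `ls`.

```bash
#!/usr/bin/env bash

# highest_in_dir DIR: print the highest <n> among DIR/heat<n>.conf (-1 if none).
# (Hypothetical helper; the body runs entirely in a sub-shell.)
highest_in_dir() {
    (
        cd "$1" || exit 1                   # affects only this sub-shell
        highest=-1
        for file in heat[0-9]*.conf; do     # tighter wildcard than heat*
            [[ -e $file ]] || continue      # the glob matched nothing
            num=${file#heat}
            num=${num%.conf}
            [[ $num =~ ^[0-9]+$ ]] || continue
            (( 10#$num > highest )) && highest=$((10#$num))
        done
        echo "$highest"
    )
}
```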

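Since 'sh' has no '${var:start:count}' substring selection, your script is evidently in Bash, so there is no need to use 'expr'. Also, arithmetic is enabled by default inside the substring operator. Numeric comparisons should be done inside '(( ))': 'if (( ${file:4:${#file}-9} > highest ))' – 2010-08-27 05:14:07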
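
For reference, the arithmetic-in-substring form from this comment can be tried on its own. The sketch assumes names of the exact form `heat<n>.conf`, so that 4 leading and 5 trailing characters (9 in total) surround the number; the sample names are invented and no files need to exist.

```bash
#!/usr/bin/env bash

highest=-1
for file in heat1.conf heat12.conf heat3.conf; do   # sample names, not real files
    # ${#file}-9 is evaluated arithmetically: total length minus "heat" and ".conf".
    num=${file:4:${#file}-9}
    if (( num > highest )); then
        highest=$num
    fi
done
echo "$highest"
```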
