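Passing a for loop using non-integers to awk

I am trying to write code that does the following: where $7 is less than $i (0 to 1 in increments of 0.05), print the line and pass it to a line count. This is how I tried to do it: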

for i in $(seq 0 0.05 1); do awk '{if ($7 <= $i) print $0}' file.txt | wc -l ; done

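This just ends up returning the line count of the full file (~400,000 lines) for every value of $i. For example, with $7 <= 0.00 it should return ~67K.

I feel there may be a way to do this within awk, but I haven't seen any suggestions that allow for non-integers.

Thanks in advance.

Source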

2017-08-30 Lynsey Hall

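[Google "awk is not shell"](https://www.google.com/search?q=%22awk+is+not+shell%22)

You need to pass $i to awk as a variable, with -v

I have done this now, thanks. Before posting I didn't know that this was the cause of the problem, otherwise my Googling might have been more fruitful! But I know now, and will spread the word to others who run into this problem :)

Pass $i to awk as a variable with -v: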

for i in $(seq 0 0.05 1); do awk -v i=$i '{if ($7 <= i) print $0}' file.txt | wc -l ; done
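The original attempt fails because the shell does not expand $i inside single quotes. awk then sees its own `$i`, where `i` is an unset awk variable (treated as 0), so `$i` means `$0`, the whole line, and the comparison tends to be true for every line. The sketch below wraps the -v approach in a helper to make the hand-off explicit. The function name `count_le`, the awk variable `t`, and the temporary file are illustrative assumptions, not part of the answer; the sample rows are the mock data that appears further down this thread.

```shell
# Hypothetical helper: count the lines whose 7th field is <= a threshold.
count_le() {
    # $1 = threshold, $2 = input file
    # -v hands the shell value to awk as the awk variable t;
    # counting inside awk avoids a separate wc -l process.
    awk -v t="$1" '$7 <= t { n++ } END { print n + 0 }' "$2"
}

# Mock data from this thread, written to a temporary file.
sample=$(mktemp)
printf '%s\n' \
    '1 2 3 4 5 6 7 a b c d e f' \
    '1 2 3 4 5 6 0.6 a b c' \
    '1 2 3 4 5 6 0.57 a b c d e f g h i j' \
    '1 2 3 4 5 6 1 a b c d e f g' \
    '1 2 3 4 5 6 0.21 a b' \
    '1 2 3 4 5 6 0.02 x y z' \
    '1 2 3 4 5 6 0.00 x y z l j k' > "$sample"

# One awk run per threshold, as in the answer above.
for t in $(seq 0 0.05 1); do
    printf '%s %s\n' "$t" "$(count_le "$t" "$sample")"
done

rm -f "$sample"
```

This still reads the input once per threshold, so it is only a corrected form of the loop, not a faster one.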

Source

2017-08-30 14:57:49

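Thanks for your help, I didn't know about -v. My Google searches had led me down the awk 'BEGIN {while (getline < "'"$INPUTFILE"'") { path, which seemed to be getting more and more elaborate! I'm glad there is a concise way to achieve the same result!

Depending on how the data in your file is structured, you could also use a full awk solution and not need the bash for loop at all

Thanks for the suggestion. I'll look into this in future.

Some made-up data: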

$ cat file.txt 
1 2 3 4 5 6 7 a b c d e f 
1 2 3 4 5 6 0.6 a b c 
1 2 3 4 5 6 0.57 a b c d e f g h i j 
1 2 3 4 5 6 1 a b c d e f g 
1 2 3 4 5 6 0.21 a b 
1 2 3 4 5 6 0.02 x y z 
1 2 3 4 5 6 0.00 x y z l j k

One possible 100% awk solution:

awk '
BEGIN { line_count=0 }

{
    printf "================= %s\n",$0

    for (i=0; i<=20; i++) {
        if ($7 <= i/20) {
            printf "matching seq : %1.2f\n",i/20
            line_count++
            seq_count[i]++
            next
        }
    }
}

END {
    printf "=================\n\n"

    for (i=0; i<=20; i++) {
        if (seq_count[i] > 0)
            printf "seq = %1.2f : %8s (count)\n",i/20,seq_count[i]
    }

    printf "\nseq = all : %8s (count)\n",line_count
}
' file.txt


# the output: 
================= 1 2 3 4 5 6 7 a b c d e f 
================= 1 2 3 4 5 6 0.6 a b c 
matching seq : 0.60 
================= 1 2 3 4 5 6 0.57 a b c d e f g h i j 
matching seq : 0.60 
================= 1 2 3 4 5 6 1 a b c d e f g 
matching seq : 1.00 
================= 1 2 3 4 5 6 0.21 a b 
matching seq : 0.25 
================= 1 2 3 4 5 6 0.02 x y z 
matching seq : 0.05 
================= 1 2 3 4 5 6 0.00 x y z l j k 
matching seq : 0.00 
================= 

seq = 0.00 :        1 (count)
seq = 0.05 :        1 (count)
seq = 0.25 :        1 (count)
seq = 0.60 :        2 (count)
seq = 1.00 :        1 (count)

seq = all :        6 (count)

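BEGIN { line_count=0 }: initialize the total line counter
the first printf statement is for debugging purposes only; it prints each line as it is processed
for (i=0; i<=20; i++): depending on the implementation, some versions of awk may have rounding/accuracy problems with non-integer sequences (e.g., incrementing by 0.05), so we use whole integers in our sequence and then divide by 20 (for this particular case) to give us our 0.05 increments in the follow-on test
$7 <= i/20: if field #7 is less than or equal to (i/20) ...
printf "matching seq ...: print the sequence value (i/20) we just matched
line_count++: add 1 to our total line counter
seq_count[i]++: add 1 to our sequence counter array
next: break out of our sequence loop (since we have found our matching sequence value (i/20)) and process the next line in the file
END ...: print out our line counts
for (i=0; ...)/if/printf: loop through our array of sequences, printing the line count for each sequence (i/20)
printf "\nseq = all...: print out our total line count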
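
The rounding concern behind the integer loop can be checked directly. This is a minimal sketch, not part of the answer: it compares a threshold built by repeatedly adding 0.05 against the same threshold computed as i/20, printing both to 17 significant digits. The function name `compare_steps` and the variable `acc` are illustrative assumptions, and which steps (if any) are reported as different depends on the awk build and platform.

```shell
# Hypothetical helper: show where an accumulated 0.05 step drifts from i/20.
compare_steps() {
    awk 'BEGIN {
        acc = 0
        for (i = 0; i <= 20; i++) {
            # Columns: step, i/20, accumulated value, comparison result
            printf "%2d %.17g %.17g %s\n", i, i/20, acc, (acc == i/20 ? "same" : "differs")
            acc += 0.05
        }
    }'
}

compare_steps
```

Any row marked "differs" is a threshold where a test such as $7 <= acc could disagree with $7 <= i/20 for a value sitting exactly on the boundary.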

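Note: some of the awk code could be reduced further, but I'll leave it as is because it is easier to understand if you are new to awk.

One (obvious?) benefit of a 100% awk solution is that our sequence/loop construct is internal to awk, which lets us limit ourselves to a single pass through the input file (file.txt). When the sequence/loop construct is outside awk, we have to process the input file once for each pass through the sequence/loop (for this exercise, that means processing the input file 21 times!).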

Source

2017-08-30 16:38:18 markp

Using some guesswork about what you are really trying to accomplish, I came up with this:

awk '{ for (i=20; 20*$7<=i && i>0; i--) bucket[i]++ } 
    END { for (i=1; i<=20; i++) print bucket[i] " lines where $7 <= " i/20 }'

With the mock data from mark's second answer I get this output:

2 lines where $7 <= 0.05 
2 lines where $7 <= 0.1 
2 lines where $7 <= 0.15 
2 lines where $7 <= 0.2 
3 lines where $7 <= 0.25 
3 lines where $7 <= 0.3 
3 lines where $7 <= 0.35 
3 lines where $7 <= 0.4 
3 lines where $7 <= 0.45 
3 lines where $7 <= 0.5 
3 lines where $7 <= 0.55 
5 lines where $7 <= 0.6 
5 lines where $7 <= 0.65 
5 lines where $7 <= 0.7 
5 lines where $7 <= 0.75 
5 lines where $7 <= 0.8 
5 lines where $7 <= 0.85 
5 lines where $7 <= 0.9 
5 lines where $7 <= 0.95 
6 lines where $7 <= 1
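
The one-liner above starts at the highest threshold and walks downward, incrementing every bucket whose threshold is still at or above $7, so each bucket ends up holding a cumulative count. Below is a commented variation, offered as a sketch rather than as the answer's own code: it also reports the 0.00 threshold that the question's seq 0 0.05 1 loop includes, compares against i/20 instead of 20*$7, and takes the file name as an argument. The function name `cumulative_counts` and the temporary file are illustrative assumptions.

```shell
# Hypothetical helper: cumulative counts of lines with $7 <= each threshold,
# for thresholds 0.00, 0.05, ..., 1.00, in a single pass over the file.
cumulative_counts() {
    awk '
    {
        # Walk thresholds from 1.00 down to 0.00 and stop at the first one
        # that $7 exceeds; every smaller threshold would fail as well.
        for (i = 20; i >= 0 && $7 <= i/20; i--)
            bucket[i]++
    }
    END {
        # %d prints 0 for buckets that were never incremented.
        for (i = 0; i <= 20; i++)
            printf "%d lines where $7 <= %.2f\n", bucket[i], i/20
    }' "$1"
}

# Mock data from this thread, written to a temporary file.
sample=$(mktemp)
printf '%s\n' \
    '1 2 3 4 5 6 7 a b c d e f' \
    '1 2 3 4 5 6 0.6 a b c' \
    '1 2 3 4 5 6 0.57 a b c d e f g h i j' \
    '1 2 3 4 5 6 1 a b c d e f g' \
    '1 2 3 4 5 6 0.21 a b' \
    '1 2 3 4 5 6 0.02 x y z' \
    '1 2 3 4 5 6 0.00 x y z l j k' > "$sample"

cumulative_counts "$sample"
rm -f "$sample"
```

With the mock data this reports 1 line for the 0.00 threshold and 6 lines for 1.00, and the counts in between agree with the output listed above.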

Source

2017-08-31 04:54:45 tripleee
