awk: geometric mean of values in rows that share the same key

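I have the input below, and I would like to compute the geometric mean of "activity" whenever "Cpd_number" and "ID3" are the same. The files contain a lot of data, so we may need arrays to do the trick. However, as an awk beginner, I'm not sure how to start. Could anyone offer some hints?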

Input:

"ID1","Cpd_number","ID2","ID3","activity"
"95","123","4","5","10"
"95","123","4","5","100"
"95","123","4","5","1"
"95","123","4","6","10"
"95","123","4","6","100"
"95","456","4","6","10"
"95","456","4","6","100"

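The three rows with "123","4","5" should be combined into one geometric mean.

The two rows with "123","4","6" should be combined into one geometric mean.

The two rows with "456","4","6" should be combined into one geometric mean.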

Here is the desired output:

"ID1","Cpd_number","ID2","ID3","activity"
"95","123","4","5","10"
"95","123","4","6","31.62"
"95","456","4","6","31.62"

Some information about the geometric mean:

http://en.wikipedia.org/wiki/Geometric_mean

This script calculates a geometric mean:

#!/usr/bin/awk -f 
{ 
    b = $1; # value of 1st column 
    C += log(b); 
    D++; 
} 

END { 
    print "Geometric mean : ",exp(C/D); 
    }
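
The exp(C/D) form used by the script is equivalent to taking the n-th root of the product of the values. As a quick sanity check, here is a minimal sketch that computes both forms for the first sample group (10, 100, 1). The function name geomean_check is made up for illustration, and any POSIX awk is assumed.

```shell
# Geometric mean two ways: n-th root of the product, and exp of the mean log.
# The values are the three activities of the first sample group (10, 100, 1).
geomean_check() {
    awk 'BEGIN {
        n = split("10 100 1", v, " ")
        product = 1; logsum = 0
        for (i = 1; i <= n; i++) { product *= v[i]; logsum += log(v[i]) }
        printf "%.4f %.4f\n", product ^ (1 / n), exp(logsum / n)
    }'
}
geomean_check
```

Summing logs is usually preferred on large files because the running product can overflow long before the sum of logs does.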

Source

2014-06-26 Chubaka

Answer updated. – klashxx

With this file:

$ cat infile 
"ID1","Cpd_number","ID2","ID3","activity" 
"95","123","4","5","10" 
"95","123","4","5","100" 
"95","123","4","5","1" 
"95","123","4","6","10" 
"95","123","4","6","100" 
"95","456","4","6","10" 
"95","456","4","6","100"

This works:

awk -F\" 'BEGIN{print}   # Emits a blank line ($0 is empty in BEGIN)
     last != $4""$8 && last{  # ONLY when the last key "Cpd_number + ID3"
      print line,exp(C/D)  # differs from the current one, print line + average
      C=D=0}     # reset accumulators
     { # This block processes each line of infile
     C += log($(NF-1)+0)  # C: running sum of logs
     D++      # D: counter
     $(NF-1)=""     # Drop the activity column in order to print the line
     line=$0     # line is the current line without activity
     last=$4""$8}    # Store the key in order to track switching
     END{ # This block runs after the complete file has been read,
      # to print the last average, which cannot be triggered during
      # the previous block
      print line,exp(C/D)}' infile

It outputs:

ID1 , Cpd_number , ID2 , ID3 , 0 
95 , 123 , 4 , 5 , 10 
95 , 123 , 4 , 6 , 31.6228 
95 , 456 , 4 , 6 , 31.6228

The formatting still needs some work.
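
One possible way to tidy the formatting is sketched below. It is not the script above but a variant under a few assumptions: rows of the same group are adjacent (as in infile), every data row has the five quoted columns, and two decimal places are wanted. The names geomean_groups and flush are made up for illustration. The header is passed through unchanged, and blank or short lines are skipped so that the activity field is never read from a line that lacks it.

```shell
# Hypothetical helper: geometric mean of "activity" per consecutive
# (Cpd_number, ID3) group, keeping the quoted CSV layout.
geomean_groups() {
    awk -F'"' '
        NR == 1 { print; next }            # pass the header row through unchanged
        NF < 10 { next }                   # skip blank or short lines
        {
            key = $4 SUBSEP $8             # Cpd_number and ID3
            if (n && key != last) flush()  # key changed: emit the finished group
            if (!n) prefix = "\"" $2 "\",\"" $4 "\",\"" $6 "\",\"" $8 "\""
            logsum += log($10 + 0)         # $10 is the activity column
            n++
            last = key
        }
        END { if (n) flush() }
        function flush() {
            printf "%s,\"%.2f\"\n", prefix, exp(logsum / n)
            logsum = 0; n = 0
        }
    ' "$1"
}
```

Called as geomean_groups infile, this prints one quoted row per group. With the %.2f format the first group shows as "10.00" rather than "10".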

NOTE: char " is used instead of 「 and 」

Edit: NF is the number of fields in the current record, so NF-1 is the second-to-last field:

$ awk -F\" 'BEGIN{getline}{print $(NF-1)}' infile                     
10 
100 
1 
10 
100 
10 
100

So with log($(NF-1)+0) we apply the log function to that value (adding 0 forces it to be treated as a number).

D++ is just a counter.
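
To see why the key is built from $4 and $8 and why the activity is $(NF-1), it can help to print every field of one row when the separator is a double quote. This is a small illustrative sketch; the function name show_fields is made up, and the row is taken from infile.

```shell
# Print each field index and value for one sample row split on double quotes.
show_fields() {
    printf '%s\n' '"95","123","4","5","10"' |
        awk -F'"' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}
show_fields
```

The values land in the even-numbered fields ($2, $4, $6, $8, $10) and the commas in the odd ones, so $4 is Cpd_number, $8 is ID3, and with NF equal to 11 the activity is $(NF-1). On an empty line NF is 0, so $(NF-1) would refer to field -1, which awk cannot access.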

Source

2014-06-26 09:52:19 klashxx

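Also, may I ask what "$8" means here? – Chubaka

I have to admit, I like the awk solution. –

Hello klashxx, could you explain the script a bit? I'm having a hard time fully understanding it, probably because I'm a beginner. Also, every time the script runs there is an "error" message: "awk: cmd. line:4: (FILENAME=infile FNR=10) fatal: attempt to access field -1". May I ask what this means? – Chubaka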

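Why use awk? Just do it in bash, using bc or calc to handle the floating-point math. You can download calc at http://www.isthe.com/chongo/src/calc/ (2.12.4.13-11 is the latest version). RPMs, binaries, and source tarballs are available. In my opinion it is far superior to bc. The routine is fairly simple. You first need to strip the extraneous " quotes from your data file, leaving a plain CSV file; that helps. See the sed command in the comments below. Note that the geometric mean below is the 4th root of (id1 * cpd * id2 * id3). If you need a different mean, just adjust the code below: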

#!/bin/bash 

## 
## You must strip all quotes from data before processing, or write more code to do 
## it here. Just do "$ sed -e 's/"//g' datafile > newdatafile" Then use 
## newdatafile as command line argument to this program 
## 
## Additionally, this script uses 'calc' for floating point math. go download it 
## from: http://www.isthe.com/chongo/src/calc/ (2.12.4.13-11 is latest). You can also 
## use bc if you like, but why, calc is so much better. 
## 

## test to make sure file passed as argument is readable 
test -r "$1" || { echo "error: invalid input, usage: ${0//*\//} filename"; exit 1; } 

## function to strip extraneous whitespace from input 
trimWS() { 
    [[ -z $1 ]] && return 1 
    strln="${#1}" 
    [[ strln -lt 2 ]] && return 1 
    trimSTR=$1 
    trimSTR="${trimSTR#"${trimSTR%%[![:space:]]*}"}" # remove leading whitespace characters 
    trimSTR="${trimSTR%"${trimSTR##*[![:space:]]}"}" # remove trailing whitespace characters 
    echo $trimSTR 
    return 0 
} 

let cnt=0 
let oldsum=0 # holds value to compare against new Cpd_number & ID3 
product=1  # initialize product to 1 
pcnt=0   # initialize the number of values in product 
IFS=$',\n'  # Internal Field Separator, set to break on ',' or newline 

while read newid1 newcpd newid2 newid3 newact || test -n "$act"; do 

    cpd=`trimWS $cpd` # trimWS from cpd (only one that needed it) 

    # if first iteration, just output first row 
    test "$cnt" -eq 0 && echo " $newid1 $newcpd $newid2 $newid3 $newact" 

    # after first iteration, test oldsum -ne sum, if so do geometric mean 
    # and reset product and counters 
    if test "$cnt" -gt 0 ; then 

     sum=$((newcpd+newid3)) # calculate sum to test against oldsum 
     if test "$oldsum" -ne "$sum" && test "$cnt" -gt 1; then 
      # geometric mean (nth root of product) 
      # mean=`calc -p "root ($product, $pcnt)"` # using calc 
      mean=`echo "scale=6; e(l($product)/$pcnt)" | bc -l` # using bc 
      echo " $id1 $cpd $id2 $id3 average: $mean" 
      pcnt=0 
      product=1 
     fi 

     # update last values to new values 
     oldsum=$sum 
     id1="$newid1" 
     cpd="$newcpd" 
     id2="$newid2" 
     id3="$newid3" 
     act="$newact" 

     ((product*=act)) # accumulate product 
     ((pcnt+=1)) 
    fi 

    ((cnt+=1)) 

done < "$1"

Output:

# output using calc 
ID1 Cpd_number ID2 ID3 activity 
95 123 4 5 average: 10 
95 123 4 6 average: 31.62277660168379331999 
95 456 4 6 average: 31.62277660168379331999 

# output using bc 
ID1 Cpd_number ID2 ID3 activity 
95 123 4 5 average: 9.999999 
95 123 4 6 average: 31.622756 
95 456 4 6 average: 31.622756

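Updated the script to compute the proper mean. It is a bit more involved because the old/new values must be kept to test for a change in id3. This may be a case where awk is simpler. But if you need more flexibility later, bash may be the answer.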
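
Since the question mentions arrays, here is a sketch of an array-based awk variant for comparison. It is not part of the script above. It assumes the quotes have already been stripped (plain comma-separated rows with no spaces around the commas) and, unlike the loop above, it does not require rows of the same group to be adjacent. The names geomean_by_key, order, and label are made up for illustration.

```shell
# Hypothetical helper: per-(Cpd_number, ID3) geometric mean using awk arrays.
# Input: unquoted CSV with a header row; groups may appear in any order.
geomean_by_key() {
    awk -F',' '
        NR == 1 { next }                        # skip the header row
        NF >= 5 {
            key = $2 SUBSEP $4                  # Cpd_number and ID3
            if (!(key in count)) {              # first time this key is seen
                order[++groups] = key           # remember first-seen order
                label[key] = $1 " " $2 " " $3 " " $4
            }
            logsum[key] += log($5)              # accumulate log(activity)
            count[key]++
        }
        END {
            for (i = 1; i <= groups; i++) {
                key = order[i]
                printf "%s average: %.4f\n", label[key], exp(logsum[key] / count[key])
            }
        }
    ' "$1"
}
```

Called as geomean_by_key newdatafile, it prints one line per key in the order the keys first appear. The trade-off is memory: one array entry per distinct key is held until the end of the file.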

Source

2014-06-26 18:11:07

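Thanks David! May I ask what "bc" stands for? – Chubaka

bc is an arbitrary-precision calculator language that is generally installed alongside bash. Henry - I misunderstood what you wanted. The code above reads the data but computes the mean of the id and cpd values rather than of the activity values where cpd and id3 are equal. I see what you want now. –

@HenrySu I'm going to pull this answer until I have a chance to fix the mean calculation. If you'd like, I can leave it up, just let me know, or copy the code before I pull it down. Give me a few minutes. –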
