2017-10-06 41 views
-2

How to calculate row-wise proportions in Unix

I have a titanic.txt dataset. It is in the form:

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S

If the Survived column is 1, the passenger survived. Embarked is the port where the passenger boarded.

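I want to calculate, for each port of embarkation, the proportion of survivors among the total passengers. How can this be done with the awk command?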

Expected output - 1 C 0.553571 Q 0.38961 S 0.336957

+0

Could you add the expected output here? Then it will be easier for us to guide you. – RavinderSingh13

+0

@RavinderSingh13 I have added the expected output –

+0

@KarthikK, your output does not match your conditions. Update your output or explain your conditions in detail – RomanPerekhrest

Answers

0

Something like this, untested:

awk -F, 'NR>1 {sum[$NF]+=$2} 
     END {for(k in sum) print k,sum[k]/(NR-1)}' file 

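However, since the denominator is the total number of passengers, the counts themselves may be more meaningful. Perhaps you want the survival rate for each port? If so, add count[$NF]++ and divide by that value in the END block.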
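A minimal sketch of that per-port variant, assuming the file has a header line and that Embarked is the last comma-separated field. The function name port_rates and the file sample.csv are made up for illustration. The trailing sort is only there because the order of for (k in sum) is unspecified.

```shell
# Hypothetical sample in the question's format (header + the 4 rows shown).
cat > sample.csv <<'EOF'
PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
EOF

# port_rates FILE: survivors per port divided by passengers per port.
# sum[] adds up the 0/1 Survived flag ($2); count[] counts rows per port ($NF).
# NR > 1 skips the header line.
port_rates() {
  awk -F, 'NR > 1 { sum[$NF] += $2; count[$NF]++ }
           END    { for (k in sum) print k, sum[k] / count[k] }' "$1" | sort
}

port_rates sample.csv
```

Splitting on every comma also splits the quoted Name field, but $2 comes before it and $NF is counted from the end, so both still point at the intended columns. Rows whose last field is empty would be grouped under an empty key.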

+1

You have a typo: you filled the array 'sum' but accessed the array 'count' –

+0

Right, fixed... – karakfa

0

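Maybe this will help. In your expected output, where did you get Q 0.38961 from? You should explain clearly what you need so that you get a quick response; otherwise it causes confusion: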

$ cat f 
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S 
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C 
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S 
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S 

# denominator: total survivors across all ports, with percentage
# example: 3 passengers survived overall across all ports;
# this shows each port's share of them
$ awk -F, '{sum[$NF]+=$2; total+=$2}END{for(k in sum)print k,sum[k]/total, (sum[k]/total)*100 }' f 
C 0.333333 33.3333 
S 0.666667 66.6667 

# denominator: total records for each port, with percentage
# example: port S had 3 passengers, of whom 2 survived, so 66.67%
$ awk -F, '{sum[$NF]+=$2; oc[$NF]++}END{for(k in sum)print k,sum[k]/oc[k],(sum[k]/oc[k])*100 }' f
C 1 100 
S 0.666667 66.6667 

# denominator: total records in the file, as karakfa suggested
$ awk -F, '{sum[$NF]+=$2}END{for(k in sum)print k,sum[k]/NR }' f 
C 0.25 
S 0.5 

0

This program computes, for each port of embarkation, the survival rate of the passengers who boarded there.

awk -F, '{sum[$NF]+=$2; tot[$NF]++} END {for (emb in sum) print(emb, sum[emb]/tot[emb])}' file

0

$ awk -F, '$2==1{a[$NF]++} END{for(i in a){print i,a[i]/NR}}' file 

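$NF corresponds to the last field, i.e. C or S.
a[$NF] creates an associative array keyed by $NF; whenever $2==1, i.e. the second field (Survived) is 1, the value for that key is incremented by 1.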

Output:

C 0.25 
S 0.5
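One thing to be aware of with the $2==1{a[$NF]++} pattern: a port only gets an entry in a once a survivor from it has been seen, so a port with no survivors would not be printed at all. Below is a hedged sketch that records every port it sees and therefore prints 0 for such ports. The function name survivor_share, the file ports.csv, and the placeholder Q row are invented for illustration and are not from the real dataset.

```shell
# Hypothetical header-less sample; the third row is a made-up placeholder
# for a port (Q) with no survivors.
cat > ports.csv <<'EOF'
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,0,3,"Placeholder, Mr. X",male,30,0,0,000000,8.05,,Q
EOF

# survivor_share FILE: survivors per port divided by all records in the file.
# Referencing seen[$NF] creates the key for every port, survivors or not;
# an unset a[i] evaluates to 0 in the division.
survivor_share() {
  awk -F, '{ seen[$NF]; if ($2 == 1) a[$NF]++ }
           END { for (i in seen) print i, a[i] / NR }' "$1" | sort
}

survivor_share ports.csv
```

As in the answer above, NR counts every line, so this assumes the file has no header row.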