2017-02-19 65 views

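I want to sort this file in descending order of the absolute value of its Linear regression (p) column. My attempt didn't quite work, and I don't know why it fails. I found this code at http://www.unix.com/shell-programming-and-scripting/168144-sort-absolute-value.html (Sort file in unix by absolute value of field):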

awk -F',' '{print ($2>=0)?$2:-$2, $0}' OFS=',' mycsv1.csv | sort -n -k8,8 | cut -d ',' -f2- 



X var,Y var,MIC (strength),MIC-p^2 (nonlinearity),MAS (non-monotonicity),MEV (functionality),MCN (complexity),Linear regression (p) 
AT1G01030,AT1G32310,0.67958,0.4832027,0.32644996,0.63247,4.0,-0.44314474 
AT1G01030,AT3G06520,0.61732,0.17639545,0.23569,0.58557,4.0,0.6640215 
AT1G01030,AT5G42580,0.61579,0.5019064,0.30105,0.58143,4.0,0.33746648 
AT1G01030,AT1G55280,0.57287,0.20705527,0.19536,0.52857,4.0,0.6048262 
AT1G01030,AT5G30490,0.56509,0.37536618,0.16172999,0.51847,4.0,-0.43557298 
AT1G01030,AT1G80040,0.56268,0.22935495,0.18583998,0.52728,4.0,-0.5773431 
... 

Please help me understand the awk script so that I can sort this file.


What does "didn't quite work" mean? – pvg


It doesn't sort by column 8 or by any other column, so I don't know why it fails. – ChathuraG


Looking carefully at the code should help here. I mean, when you are interested in field 8, why are you looking at field '$2' in 'awk'? – hek2mgl
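Following hek2mgl's hint, here is a sketch of the unix.com approach from the question with the field references corrected. The key has to come from field 8 ($NF here, since it is the last field), and because the key is prepended as a new first field, sort must split on commas (-t,) and use key 1; without -t, sort splits on blanks, and since these data lines contain no blanks, -k8,8 is an empty key. -r gives descending order. Stripping the minus sign as text, rather than computing -$NF, keeps the key's full precision, because awk prints a computed number with only six significant digits by default. The function name is hypothetical, and it assumes the first input line is the header.

```shell
# Hypothetical helper: sort CSV data (read from stdin) by the absolute value
# of the last field, in descending order, keeping the header line on top.
sort_by_abs_last_field() {
  IFS= read -r header                # first line is assumed to be the header
  printf '%s\n' "$header"
  awk -F, '{
      v = $NF
      sub(/^-/, "", v)               # strip a leading minus sign: v is |last field|
      print v "," $0                 # decorate: prepend the key as a new field 1
    }' |
  LC_ALL=C sort -t, -k1,1nr |        # numeric, descending, on the key only
  cut -d, -f2-                       # undecorate: drop the key again
}

# Usage: sort_by_abs_last_field < mycsv1.csv
```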

Answers


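You can use sed and sort here, together with @hek2mgl's very clever logic of adding a field at the end and removing it again afterwards so that the original number is preserved: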

sed -E 's/,([-]?)([0-9.]+)$/,\1\2,\2/' file | sort -t, -k9,9 -nr | cut -f1-8 -d, 
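  • sed -E 's/,([-]?)([0-9.]+)$/,\1\2,\2/' => creates a new field 9 that holds field 8 without its minus sign, i.e. the absolute value of field 8
  • sort -t, -k9,9 -nr => sorts on field 9 (the absolute value), numerically and in descending order
  • cut -f1-8 -d, => removes the ninth field again, restoring the original format of the lines, now in the desired sort order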

Here is the output:

AT1G01030,AT3G06520,0.61732,0.17639545,0.23569,0.58557,4.0,0.6640215 
AT1G01030,AT1G55280,0.57287,0.20705527,0.19536,0.52857,4.0,0.6048262 
AT1G01030,AT1G80040,0.56268,0.22935495,0.18583998,0.52728,4.0,-0.5773431 
AT1G01030,AT1G32310,0.67958,0.4832027,0.32644996,0.63247,4.0,-0.44314474 
AT1G01030,AT5G30490,0.56509,0.37536618,0.16172999,0.51847,4.0,-0.43557298 
AT1G01030,AT5G42580,0.61579,0.5019064,0.30105,0.58143,4.0,0.33746648 
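One thing to watch: the header line ends in "(p)", so the sed step leaves it without a ninth field and it gets sorted together with the data (with GNU sort an empty numeric key compares as 0, so it typically ends up at the bottom). A minimal sketch that sets the header aside first and then runs the same sed | sort | cut pipeline on the remaining lines. The function name is hypothetical, LC_ALL=C is added as an assumption so that "." is the decimal point for -n, and, as the $ anchor in the sed pattern requires, field 8 must be a plain decimal number with no trailing whitespace.

```shell
# Hypothetical wrapper around the sed | sort | cut pipeline that keeps the
# CSV header on the first line instead of sorting it with the data.
sort_abs_keep_header() {
  IFS= read -r header                        # set the header line aside
  printf '%s\n' "$header"
  sed -E 's/,([-]?)([0-9.]+)$/,\1\2,\2/' |   # field 9 = field 8 without its sign
  LC_ALL=C sort -t, -k9,9 -nr |              # numeric, descending, on field 9
  cut -d, -f1-8                              # drop field 9 again
}

# Usage: sort_abs_keep_header < file
```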

Thanks, it works. – ChathuraG


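If the first command output the default delimiter of sort and cut (\t) instead of a comma, you wouldn't need to specify the delimiters in the later commands. –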
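A sketch of what that comment suggests: if the first command emits the key and the original line separated by a tab, cut needs no -d (tab is its default delimiter) and sort needs no -t (by default its fields are separated by blanks, which include tabs). This assumes the data itself contains no tab characters; the function name is hypothetical, and LC_ALL=C is an added assumption to pin the decimal point for -n.

```shell
# Hypothetical variant: decorate each data line with a tab-separated key so
# that sort and cut can use their default field separators.
sort_abs_tab_key() {
  IFS= read -r header                                         # keep the header on top
  printf '%s\n' "$header"
  awk -F, '{ v = $NF; sub(/^-/, "", v); print v "\t" $0 }' |  # key <TAB> original line
  LC_ALL=C sort -k1,1nr |        # no -t needed: blanks separate fields by default
  cut -f2-                       # no -d needed: tab is the default delimiter of cut
}

# Usage: sort_abs_tab_key < file
```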


Do it in three steps:

(1) Temporarily create a ninth field containing the absolute value of field 8:

awk -F, 'NR>1{v=$NF;sub(/-/,"",v);printf "%s%s%s%s",$0,FS,v,RS}' file

(2) Sort the output based on the ninth field:

command_1 | LC_COLLATE=C sort -t, -k9r
            ^------ make sure this is set, since sorting, especially around
                    the decimal point, depends on the locale.

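(3) Pipe the result to cut to remove the ninth field again. (In awk, the same step is NF--, which decrements the number of fields and so effectively removes the last field, followed by 1, an always-true pattern that makes awk print the line.)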

command_2 | cut -d, -f1-8 

Output:

AT1G01030,AT3G06520,0.61732,0.17639545,0.23569,0.58557,4.0,0.6640215 
AT1G01030,AT1G55280,0.57287,0.20705527,0.19536,0.52857,4.0,0.6048262 
AT1G01030,AT1G80040,0.56268,0.22935495,0.18583998,0.52728,4.0,-0.5773431 
AT1G01030,AT1G32310,0.67958,0.4832027,0.32644996,0.63247,4.0,-0.44314474 
AT1G01030,AT5G30490,0.56509,0.37536618,0.16172999,0.51847,4.0,-0.43557298 
AT1G01030,AT5G42580,0.61579,0.5019064,0.30105,0.58143,4.0,0.33746648 
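The three steps chained into one hypothetical function, with command_1 and command_2 filled in. As in step (1), NR>1 drops the header line. Step (2) compares strings rather than numbers: that orders these 0.xxx values correctly, but it would, for example, put 9.5 above 10.5; a key of -k9,9nr would sort numerically instead.

```shell
# Hypothetical function chaining the three steps of this answer.
sort_abs_three_steps() {
  # (1) append |field 8| as a temporary ninth field; the header is skipped
  awk -F, 'NR>1{v=$NF;sub(/-/,"",v);printf "%s%s%s%s",$0,FS,v,RS}' |
  # (2) reverse string sort on field 9 (to end of line) in the C collation
  LC_COLLATE=C sort -t, -k9r |
  # (3) remove the ninth field again
  cut -d, -f1-8
}

# Usage: sort_abs_three_steps < file
```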

You should also set 'LC_COLLATE' rather than 'LANG', to avoid other, possibly unwanted side effects of changing the locale. –


Since the number of columns is known, 'cut' is probably a better choice than 'awk'. I changed that, and the answer now uses 'LC_COLLATE'. – hek2mgl
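For comparison with cut, a sketch of the awk form of step (3): NF-- decrements the field count, dropping the last field, and the always-true pattern 1 prints the line. OFS has to be set to a comma because awk rebuilds $0 with OFS (a single space by default) once NF changes. The gawk manual cautions that some awk versions do not rebuild $0 when NF is decremented, which is one more point in favour of cut here. The function name is hypothetical.

```shell
# Hypothetical awk replacement for step (3): drop the last comma-separated field.
drop_last_field() {
  awk 'BEGIN { FS = OFS = "," }   # read and write comma-separated fields
       { NF-- }                   # drop the last field; $0 is rebuilt with OFS
       1                          # always-true pattern: print the line'
}

# Usage: command_2 | drop_last_field
```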


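You can get (GNU) awk to do it all. Note that asorti() is a gawk extension, that it orders the array indices as strings (which works for these 0.xxx values), and that rows with the same absolute value overwrite each other because that value is used as the array index:

gawk -F, 'NR>1{n[substr($NF,1,1)=="-"?substr($NF,2):$NF]=$0}NR==1;END{k=asorti(n,out);for(i=k;i>=1;i--)print n[out[i]]}' file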
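asorti exists only in GNU awk. For other awks, a hedged sketch of the same all-in-awk idea that does its own insertion sort in the END block. Unlike the one-liner above, which uses the absolute value as the array index, it stores rows by position, so two rows with the same absolute value both survive, and it compares the keys numerically. Insertion sort is quadratic, so this is only sensible for modest file sizes; the function name is hypothetical.

```shell
# Hypothetical portable awk version: header first, then the data rows in
# descending order of |last field|, using an insertion sort in END.
sort_abs_desc_awk() {
  awk -F, '
    NR == 1 { print; next }             # keep the header line first
    {
      v = $NF; sub(/^-/, "", v)         # |last field| as a string
      key[++n] = v + 0; line[n] = $0    # remember numeric key and original line
    }
    END {
      for (i = 2; i <= n; i++) {        # insertion sort, descending by key
        k = key[i]; l = line[i]
        for (j = i - 1; j >= 1 && key[j] < k; j--) {
          key[j + 1] = key[j]; line[j + 1] = line[j]
        }
        key[j + 1] = k; line[j + 1] = l
      }
      for (i = 1; i <= n; i++) print line[i]
    }'
}

# Usage: sort_abs_desc_awk < file
```

Because the inner loop only shifts rows whose key is strictly smaller, rows with equal absolute values keep their input order.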