2017-08-31 53 views
2

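How can I give a list of ids the same ranks by comparing it with another data file in Linux?

I have a list of ids (column 2) ranked from 1 to 600 according to their values (column 3). I have another list of the same ids, but in a different order because their values differ. How can I assign to each id in file2 the rank that the same id has in file1? For example: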

file1: 
    rank list-of-ids values 
    1 HOUSAM69708729 0.4468 
    2 HOCANM106363549 0.4434 
    3 HOCANM10845509 0.4268 
    4 HOCANM11098662 0.4203 
    5 HOUSAM68571374 0.3896 
    6 HOUSAM69990251 0.3895 
    7 HONLDM716072164 0.3893 
    8 HOUSAM69756113 0.3656 
    9 HOCANM11098658 0.3593 
    10 HOUSAM66626020 0.3538 

file2: 
list-of-ids values 
HOCANM106363549 0.4832 
HOUSAM69708729 0.4199 
HOCANM10845509 0.4143 
HOUSAM69990251 0.3887 
HOCANM11098662 0.3792 
HOUSAM69756113 0.365 
HOUSAM68571374 0.3649 
HONLDM716072164 0.3600 
HOUSAM66626020 0.3593 
HOCANM11098658 0.3545 

The output file should be file2 with the ranks taken from file1:

output: 
rank list-of-ids values 
2 HOCANM106363549 0.4832 
1 HOUSAM69708729 0.4199 
3 HOCANM10845509 0.4143 
6 HOUSAM69990251 0.3887 
4 HOCANM11098662 0.3792 
8 HOUSAM69756113 0.365 
5 HOUSAM68571374 0.3649 
7 HONLDM716072164 0.3600 
10 HOUSAM66626020 0.3593 
9 HOCANM11098658 0.3545 

Any suggestions, please? Note that the real data has no headers, so the output should not have a header either.

+0

What do you mean by "the real data has no headers"? Could you post what your actual data looks like for this example?

Answers

2

awk solution:

awk 'NR==FNR{ a[$2]=$1; next }{ print a[$1],$1,$2 }' file1 file2 
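  • NR==FNR - true only while the first input file (file1) is being processed

  • a[$2]=$1 - stores the rank (first field, $1) in the array a, indexed by the id from list-of-ids (second field, $2)

  • next - skips to the next record (of file1)

  • print a[$1],$1,$2 - for each record of the second input file (file2), prints its fields ($1, $2) preceded by the corresponding rank a[$1]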
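The steps above can be sketched as a small self-contained script. This is only an illustration under stated assumptions: the three-line sample files, the id `UNKNOWN123`, and the `NA` placeholder for ids missing from file1 are hypothetical and not part of the original answer, which prints an empty rank in that case.

```shell
# Build small sample files in a temporary directory (hypothetical data).
tmpdir=$(mktemp -d)
printf '%s\n' \
    '1 HOUSAM69708729 0.4468' \
    '2 HOCANM106363549 0.4434' \
    '3 HOCANM10845509 0.4268' > "$tmpdir/file1"
printf '%s\n' \
    'HOCANM106363549 0.4832' \
    'HOUSAM69708729 0.4199' \
    'UNKNOWN123 0.1000' > "$tmpdir/file2"

result=$(awk '
    # First file: remember the rank of every id.
    NR == FNR { rank[$2] = $1; next }
    # Second file: look the id up; NA marks an id absent from file1.
    {
        r = ($1 in rank) ? rank[$1] : "NA"
        print r, $1, $2
    }
' "$tmpdir/file1" "$tmpdir/file2")

echo "$result"
rm -rf "$tmpdir"
```

Because awk reads file2 sequentially, the output keeps the line order of file2.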


Output:

2 HOCANM106363549 0.4832 
1 HOUSAM69708729 0.4199 
3 HOCANM10845509 0.4143 
6 HOUSAM69990251 0.3887 
4 HOCANM11098662 0.3792 
8 HOUSAM69756113 0.365 
5 HOUSAM68571374 0.3649 
7 HONLDM716072164 0.3600 
10 HOUSAM66626020 0.3593 
9 HOCANM11098658 0.3545 
+0

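My real data does not have any column names. How can I remove "rank" as a column name? I mean the first line (rank list-of-ids values) should not appear in the output – zara

+0

@zara, see my update – RomanPerekhrest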

+0

Thanks. Could you explain your script? I would like to understand it – zara

3

Another option, using 'join':

$ join -1 2 -2 1 -o 1.1,2.1,2.2 <(sort -k 2 file1) <(sort -k 1 file2) 
2 HOCANM106363549 0.4832 
3 HOCANM10845509 0.4143 
9 HOCANM11098658 0.3545 
4 HOCANM11098662 0.3792 
7 HONLDM716072164 0.3600 
10 HOUSAM66626020 0.3593 
5 HOUSAM68571374 0.3649 
1 HOUSAM69708729 0.4199 
8 HOUSAM69756113 0.365                   
6 HOUSAM69990251 0.3887                   
ranks list-of-ids values 

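Admittedly, this does not handle the header very cleanly. You have already accepted a solution, but I like this tool and not many people know about it ;)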


Edit: if the source data has no headers, the command works fine:

$ cat file1 
    1 HOUSAM69708729 0.4468 
    2 HOCANM106363549 0.4434                 
    3 HOCANM10845509 0.4268                 
    4 HOCANM11098662 0.4203                 
    5 HOUSAM68571374 0.3896 
    6 HOUSAM69990251 0.3895 
    7 HONLDM716072164 0.3893 
    8 HOUSAM69756113 0.3656 
    9 HOCANM11098658 0.3593 
    10 HOUSAM66626020 0.3538 
$ cat file2 
HOCANM106363549 0.4832 
HOUSAM69708729 0.4199 
HOCANM10845509 0.4143 
HOUSAM69990251 0.3887 
HOCANM11098662 0.3792 
HOUSAM69756113 0.365 
HOUSAM68571374 0.3649 
HONLDM716072164 0.3600 
HOUSAM66626020 0.3593 
HOCANM11098658 0.3545 
$ join -1 2 -2 1 -o 1.1,2.1,2.2 <(sort -k 2 file1) <(sort -k 1 file2) 
2 HOCANM106363549 0.4832 
3 HOCANM10845509 0.4143 
9 HOCANM11098658 0.3545 
4 HOCANM11098662 0.3792 
7 HONLDM716072164 0.3600 
10 HOUSAM66626020 0.3593 
5 HOUSAM68571374 0.3649 
1 HOUSAM69708729 0.4199 
8 HOUSAM69756113 0.365 
6 HOUSAM69990251 0.3887 
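Note that 'join' needs both inputs sorted on the id, so the result comes out in id order rather than in the order of file2. If the file2 order matters, one possible approach is to tag each file2 line with its line number, join, then sort back on that number and drop it. The following is only a sketch on a three-line sample; the temporary files, the `LC_ALL=C` setting, and the narrower `-k2,2` sort key are my assumptions, not part of the original command.

```shell
# Use one collation for both sort and join so they agree on ordering.
export LC_ALL=C
tmpdir=$(mktemp -d)
printf '%s\n' \
    '1 HOUSAM69708729 0.4468' \
    '2 HOCANM106363549 0.4434' \
    '3 HOCANM10845509 0.4268' > "$tmpdir/file1"
printf '%s\n' \
    'HOCANM106363549 0.4832' \
    'HOUSAM69708729 0.4199' \
    'HOCANM10845509 0.4143' > "$tmpdir/file2"

# Sort file1 on the id column only.
sort -k2,2 "$tmpdir/file1" > "$tmpdir/file1.sorted"

# Prefix each file2 line with its line number, sort on the id, join on the id,
# restore the original order numerically, then cut the line number off again.
result=$(awk '{ print NR, $0 }' "$tmpdir/file2" \
    | sort -k2,2 \
    | join -1 2 -2 2 -o 2.1,1.1,2.2,2.3 "$tmpdir/file1.sorted" - \
    | sort -n -k1,1 \
    | cut -d' ' -f2-)

echo "$result"
rm -rf "$tmpdir"
```

Here `-` tells 'join' to read its second input from standard input, which avoids the need for process substitution.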

If either file does contain a header, you can simply grep it out before the 'sort':

$ cat file1 
ranks list-of-ids values 
    1 HOUSAM69708729 0.4468 
    2 HOCANM106363549 0.4434 
    3 HOCANM10845509 0.4268 
    4 HOCANM11098662 0.4203 
    5 HOUSAM68571374 0.3896 
    6 HOUSAM69990251 0.3895 
    7 HONLDM716072164 0.3893 
    8 HOUSAM69756113 0.3656 
    9 HOCANM11098658 0.3593 
    10 HOUSAM66626020 0.3538 
$ cat file2 
list-of-ids values 
HOCANM106363549 0.4832 
HOUSAM69708729 0.4199 
HOCANM10845509 0.4143 
HOUSAM69990251 0.3887 
HOCANM11098662 0.3792 
HOUSAM69756113 0.365 
HOUSAM68571374 0.3649 
HONLDM716072164 0.3600 
HOUSAM66626020 0.3593 
HOCANM11098658 0.3545 
$ join -1 2 -2 1 -o 1.1,2.1,2.2 <(grep -v "list-of-ids" file1 | sort -k 2) <(grep -v "list-of-ids" file2 | sort -k 1) 
2 HOCANM106363549 0.4832 
3 HOCANM10845509 0.4143 
9 HOCANM11098658 0.3545 
4 HOCANM11098662 0.3792 
7 HONLDM716072164 0.3600 
10 HOUSAM66626020 0.3593 
5 HOUSAM68571374 0.3649 
1 HOUSAM69708729 0.4199 
8 HOUSAM69756113 0.365 
6 HOUSAM69990251 0.3887 
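Filtering with grep -v also drops any data line that happens to contain the header text. If the header is always exactly the first line, skipping it by position with `tail -n +2` may be a safer variant. This is a sketch under that assumption, using a three-line sample, temporary files, and `LC_ALL=C`, none of which come from the original answer.

```shell
# Use one collation for both sort and join so they agree on ordering.
export LC_ALL=C
tmpdir=$(mktemp -d)
printf '%s\n' \
    'ranks list-of-ids values' \
    '1 HOUSAM69708729 0.4468' \
    '2 HOCANM106363549 0.4434' \
    '3 HOCANM10845509 0.4268' > "$tmpdir/file1"
printf '%s\n' \
    'list-of-ids values' \
    'HOCANM106363549 0.4832' \
    'HOUSAM69708729 0.4199' \
    'HOCANM10845509 0.4143' > "$tmpdir/file2"

# Drop the first line of each file by position, then sort on the id column.
tail -n +2 "$tmpdir/file1" | sort -k2,2 > "$tmpdir/file1.sorted"

result=$(tail -n +2 "$tmpdir/file2" \
    | sort -k1,1 \
    | join -1 2 -2 1 -o 1.1,2.1,2.2 "$tmpdir/file1.sorted" -)

echo "$result"
rm -rf "$tmpdir"
```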