2016-06-28 52 views
0

我想按dataframe2列的順序重新排列dataframe1根據另一個文件中的第一列對文件中的數據進行排序

Dataframe1

 N02_M N05_F N06_M N07_F N08_F N09_M N02_M N026_F N03_M 
2237895 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225 
586  0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 
2255280 0.995 0.995 0.995 0.995 0.995 0.995 0.995 0.995 0.995 
7294280 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 
7499 0.803 0.803 0.803 0.803 0.803 0.803 0.803 0.803 0.803 
35209 0.94 0.94 0.94 0.94 0.94 0.94 0.94 0.94 0.94 

這裏是數據幀2:

Dataframe2

586 
2237895 
7499 
35209 
2255280 
7294280 

結果:

 N02_M N05_F N06_M N07_F N08_F N09_M N02_M N026_F N03_M 
586  0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 
2237895 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225 
7499 0.803 0.803 0.803 0.803 0.803 0.803 0.803 0.803 0.803 
35209 0.94 0.94 0.94 0.94 0.94 0.94 0.94 0.94 0.94 
2255280 0.995 0.995 0.995 0.995 0.995 0.995 0.995 0.995 0.995 
7294280 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 

我試着命令awk的

'FNR==NR {x2[$1] = $0; next} $1 in x2 {print x2[$1]}' df2 df1 

,但它不工作,它不會改變的df2

+0

你的問題有點含糊。可以解釋一點點?在結果中,「586」如何放置在第一行? – 989

+0

結果中的順序對我來說看起來是隨機的 – Tung

+2

我不明白你是如何得到結果的......結果顯示了同樣的6行,但順序似乎不是來自第2個數據幀...後者只有2個值出現在第一個表格中(586和35209)。 –

回答

1
$ cat tst.awk 
NR==FNR { 
    if (NR==1) { 
     print 
    } 
    else { 
     map[$1] = $0 
    } 
    next 
} 
{ print map[$1] } 

$ awk -f tst.awk dataframe1 dataframe2 
     N02_M N05_F N06_M N07_F N08_F N09_M N02_M N026_F N03_M 
586  0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 
2237895 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225 
7499 0.803 0.803 0.803 0.803 0.803 0.803 0.803 0.803 0.803 
35209 0.94 0.94 0.94 0.94 0.94 0.94 0.94 0.94 0.94 
2255280 0.995 0.995 0.995 0.995 0.995 0.995 0.995 0.995 0.995 
7294280 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 
0

順序下面是用dplyr

library(dplyr) 
df1$id <- rownames(df1) # create common column "id" based on rownames 

names(df2) <- "id" #change column name of df2 to common name "id" 
df2$id <- as.character(df2$id) #ensure that the structure for df1$id and df2$id are in similar structure 


left_join(df2, df1) 

     id N02_M N05_F N06_M N07_F N08_F N09_M N02_M.1 N026_F N03_M 
1 34387  NA  NA  NA  NA  NA  NA  NA  NA  NA 
2 202783  NA  NA  NA  NA  NA  NA  NA  NA  NA 
3 887320  NA  NA  NA  NA  NA  NA  NA  NA  NA 
4 35444  NA  NA  NA  NA  NA  NA  NA  NA  NA 
5 88035  NA  NA  NA  NA  NA  NA  NA  NA  NA 
6 239862  NA  NA  NA  NA  NA  NA  NA  NA  NA 
7 9266359  NA  NA  NA  NA  NA  NA  NA  NA  NA 
8  586 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 
9 375535  NA  NA  NA  NA  NA  NA  NA  NA  NA 
10 3794260  NA  NA  NA  NA  NA  NA  NA  NA  NA 
11 6687758  NA  NA  NA  NA  NA  NA  NA  NA  NA 
12 6983267  NA  NA  NA  NA  NA  NA  NA  NA  NA 
13 704346  NA  NA  NA  NA  NA  NA  NA  NA  NA 
14 35209 0.9400 0.9400 0.9400 0.9400 0.9400 0.9400 0.9400 0.9400 0.9400 
15 795668  NA  NA  NA  NA  NA  NA  NA  NA  NA 

解決順便說一句,df1 == Dataframe1和df2 == Dataframe2

0

讓我們準備好數據。

Dataframe1 <- cbind(a=rnorm(6), b=rnorm(6)) 
rownames(Dataframe1) <- round(rnorm(6)*1000000) 
Dataframe2 <- cbind(sample(as.integer(rownames(Dataframe1)))) 

Dataframe1:

    a   b 
255728 -0.2063562 -0.804514251 
215067 0.2756317 -0.865951064 
2079729 1.1376277 0.001123908 
-898713 0.1846505 -0.908527352 
-717953 1.7424731 0.462088593 
-1028794 0.4162211 1.548455286 

Dataframe2:

  [,1] 
[1,] -1028794 
[2,] -898713 
[3,] -717953 
[4,] 215067 
[5,] 2079729 
[6,] 25572 

而現在......戲法!

Dataframe1[order(rownames(Dataframe1))[rank(Dataframe2)],] 
        a   b 
-1028794 0.4162211 1.548455286 
2079729 1.1376277 0.001123908 
215067 0.2756317 -0.865951064 
255728 -0.2063562 -0.804514251 
-898713 0.1846505 -0.908527352 
-717953 1.7424731 0.462088593 

這種排序的想法是使用order(x)[rank(y)]

相關問題