2014-10-03

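Sort R data frame by column with long strings

How do I sort an R data frame by a column made up of long strings? The following example illustrates my problem: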

> a = matrix(NA, nrow=4, ncol=3) 
> a[,1] = c(1,2,3,4) 
> a[,2] = c("gene001_10M","gene002_10M","gene001_50M","gene002_50M") 
> colnames(a) = c("value","sortkey","other") 
> a = as.data.frame(a) 
> a 
    value  sortkey other 
1  1 gene001_10M <NA> 
2  2 gene002_10M <NA> 
3  3 gene001_50M <NA> 
4  4 gene002_50M <NA> 

Now, when I sort "a", the sortkey seems to be read from right to left, and "a" is left unchanged:

> b = a[sort(a$sortkey),] 
> b 
    value  sortkey other 
1  1 gene001_10M <NA> 
2  2 gene002_10M <NA> 
3  3 gene001_50M <NA> 
4  4 gene002_50M <NA> 

What I want, however, is:

> b 
    value  sortkey other 
1  1 gene001_10M <NA> 
3  3 gene001_50M <NA> 
2  2 gene002_10M <NA> 
4  4 gene002_50M <NA> 

Answer


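When the keys mix numbers and letters, mixedorder from the gtools package is usually the better choice, but here order alone works, because the numeric parts are zero-padded to the same width: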

a[order(as.character(a$sortkey)),] 
# value  sortkey other 
#1  1 gene001_10M <NA> 
#3  3 gene001_50M <NA> 
#2  2 gene002_10M <NA> 
#4  4 gene002_50M <NA> 

Note that sort returns the sorted values, not their indices, so its result cannot be used directly to index rows:

sort(as.character(a$sortkey)) 
#[1] "gene001_10M" "gene001_50M" "gene002_10M" "gene002_50M" 
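A minimal sketch of why the question's a[sort(a$sortkey),] left the rows in place. It assumes sortkey is a factor, which is what as.data.frame produced by default before R 4.0.0 and what the question's output reflects; the data is rebuilt directly with data.frame() and factor() instead of through the matrix. Indexing with a factor uses its integer codes rather than its printed labels, and because the levels are sorted alphabetically, the codes of the sorted factor are simply 1, 2, 3, 4.

```r
# Rebuild the example; sortkey is made a factor explicitly
# (assumption: matches the pre-R-4.0.0 default behind the question's output)
a <- data.frame(
  value   = 1:4,
  sortkey = factor(c("gene001_10M", "gene002_10M", "gene001_50M", "gene002_50M"))
)

sorted_key <- sort(a$sortkey)          # sorted factor *values*, not row positions
codes      <- as.integer(sorted_key)   # levels are alphabetical, so codes are 1 2 3 4

# Row indexing with a factor uses its integer codes, so rows 1:4 come back unchanged
unchanged <- a[sorted_key, ]$value

# order() returns row positions, which is what row indexing needs
positions <- order(as.character(a$sortkey))

print(codes)      # 1 2 3 4
print(unchanged)  # 1 2 3 4
print(positions)  # 1 3 2 4
```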

Otherwise, you have to set index.return=TRUE (it is FALSE by default in sort) to get the indices as well:

sort(as.character(a$sortkey), index.return=TRUE) 
#$x 
#[1] "gene001_10M" "gene001_50M" "gene002_10M" "gene002_50M" 

#$ix 
#[1] 1 3 2 4 

Then use:

a[sort(as.character(a$sortkey), index.return=TRUE)$ix,] 
# value  sortkey other 
#1  1 gene001_10M <NA> 
#3  3 gene001_50M <NA> 
#2  2 gene002_10M <NA> 
#4  4 gene002_50M <NA> 
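The $ix component is the same permutation that order() computes directly, so the two calls are interchangeable for row indexing. A quick check on the example keys, typed in here as a plain character vector:

```r
keys <- c("gene001_10M", "gene002_10M", "gene001_50M", "gene002_50M")

# sort() with index.return = TRUE returns a list holding the sorted values ($x)
# and the permutation that produces them ($ix)
ix <- sort(keys, index.return = TRUE)$ix

# order() returns that permutation on its own
ox <- order(keys)

print(ix)              # 1 3 2 4
print(all(ix == ox))   # TRUE
```

order() is the shorter spelling when only the row positions are needed.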

Alternatively, with the gtools package:

library(gtools) 
mixedorder(as.character(a$sortkey)) 
#[1] 1 3 2 4 

Great! Thanks. – 2014-10-03 11:18:59


You can also use order together with a gsub regular expression that strips out the letters first:

a[order(gsub("[a-zA-Z]+", "", a$sortkey)),] 
# value  sortkey other 
# 1  1 gene001_10M <NA> 
# 3  3 gene001_50M <NA> 
# 2  2 gene002_10M <NA> 
# 4  4 gene002_50M <NA>
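It can help to look at the intermediate keys. The regex deletes every run of letters, and the remaining strings are still character and still compared lexically; they come out in the desired order only because the numbers are zero-padded:

```r
keys <- c("gene001_10M", "gene002_10M", "gene001_50M", "gene002_50M")

# Delete every run of ASCII letters; digits and "_" remain
stripped <- gsub("[a-zA-Z]+", "", keys)
print(stripped)         # "001_10" "002_10" "001_50" "002_50"

# Lexical order of the stripped strings
print(order(stripped))  # 1 3 2 4
```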