2015-12-17 43 views

R - Sorting a semi-numeric column

I created a dataset to illustrate the problem I have.

My data looks like this:

   id       time act
1   1      time1   a
2   1      time2   a
3   1      time3   a
4   1    time101   a
5   1    time103   a
6   1   time1001   b
7   1   time1003   b
8   1   time1004   b
9   1  time10000   b
10  1 time100010   c

I would like to spread the data by time, with the columns in the correct numeric order, like this:

id 1 2 3 101 103 1001 1003 1004 10000 100010
 1 a a a   a   a    b    b    b     b      c

Here is the part I don't fully understand. When I spread my data, I get something like this:

library(dplyr) 
library(tidyr) 

dt %>% spread(time, act) 

  id time1 time10000 time100010 time1001 time1003 time1004 time101 time103 time2 time3
1  1     a         b          c        b        b        b       a       a     a     a

So R seems to recognise some kind of numeric ordering, but it places time10000 before time2 and time3.

Why is that, and how can I fix it?

What I want is this:

  id time1 time2 time3 time101 time103 time1001 time1003 time1004 time10000 time100010
1  1     a     a     a       a       a        b        b        b         b          c

Data

dt = structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), 
    time = structure(c(1L, 9L, 10L, 7L, 8L, 4L, 5L, 6L, 2L, 3L 
     ), .Label = c("time1", "time10000", "time100010", "time1001", 
    "time1003", "time1004", "time101", "time103", "time2", "time3" 
    ), class = "factor"), act = structure(c(1L, 1L, 1L, 1L, 1L, 
    2L, 2L, 2L, 2L, 3L), .Label = c("a", "b", "c"), class = "factor")), .Names = c("id", 
"time", "act"), class = "data.frame", row.names = c(NA, -10L)) 

Answer


Reorder the factor levels so that they follow the order in which the values appear in the data:

> dt$time <- factor(dt$time, as.character(dt$time))
> dt %>% spread(time, act)
  id time1 time2 time3 time101 time103 time1001 time1003 time1004 time10000
1  1     a     a     a       a       a        b        b        b         b
  time100010
1          c
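
As for the "why": time is a factor, and by default its levels are sorted as character strings. Compared character by character, "time10000" comes before "time2" because "1" sorts before "2". spread then lays out the columns in level order. The one-liner above relies on the rows already being in the desired order and on each time value appearing only once (factor() rejects duplicated levels), so it may not carry over directly to data with several ids. A minimal sketch of a more general variant, assuming every label has the form "time" followed by digits, orders the levels by their numeric suffix. The small dt below is a stand-in for the question's data, and lv and lv_sorted are illustrative names.

```r
# Stand-in for the question's data: time is a factor, so its levels
# default to alphabetical (character) order.
dt <- data.frame(
  id   = 1L,
  time = factor(c("time1", "time2", "time3", "time101", "time103",
                  "time1001", "time1003", "time1004", "time10000",
                  "time100010")),
  act  = factor(c("a", "a", "a", "a", "a", "b", "b", "b", "b", "c"))
)

levels(dt$time)  # character order: "time1" "time10000" "time100010" ...

# Assumption: every level is "time" followed by digits only.
# Drop the prefix, convert the rest to numbers, and order the levels by them.
lv <- levels(dt$time)
lv_sorted <- lv[order(as.numeric(sub("^time", "", lv)))]

# Rebuild the factor with the numerically ordered levels; the values
# themselves are unchanged, only the level order differs.
dt$time <- factor(dt$time, levels = lv_sorted)

levels(dt$time)
```

After this step, dt %>% spread(time, act) should produce the columns in numeric order, since spread follows the level order of the key column.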