2015-10-31 21 views
I have a data frame like the following and want to index rows within grouped columns:

     time site val 

    2014-09-01 00:00:00 2001  1 
    2014-09-01 00:15:00 2001  0 
    2014-09-01 00:30:00 2001  2 
    2014-09-01 00:45:00 2001  0 
    2014-09-01 00:00:00 2002  1 
    2014-09-01 00:15:00 2002  0 
    2014-09-01 00:30:00 2002  2 
    2014-09-02 00:45:00 2001  0 
    2014-09-02 00:00:00 2001  1 
    2014-09-02 00:15:00 2001  0 
    2014-09-02 00:30:00 2001  2 
    2014-09-02 00:45:00 2001  0 
    2014-09-02 00:00:00 2002  1 
    2014-09-02 00:15:00 2002  0 
    2014-09-02 00:30:00 2002  2 
    2014-09-02 00:45:00 2001  0 

I would like to group by time and site and then add a new variable that holds the index of each occurrence within its group:

    time site val h 

    2014-09-01 00:00:00 2001  1 1 
    2014-09-01 00:15:00 2001  0 2 
    2014-09-01 00:30:00 2001  2 3 
    2014-09-01 00:45:00 2001  0 4 
    2014-09-01 00:00:00 2002  1 1 
    2014-09-01 00:15:00 2002  0 2 
    2014-09-01 00:30:00 2002  2 3 
    2014-09-02 00:45:00 2002  0 4 
    2014-09-02 00:00:00 2001  1 1 
    2014-09-02 00:15:00 2001  0 2 
    2014-09-02 00:30:00 2001  2 3 
    2014-09-02 00:45:00 2001  0 4 
    2014-09-02 00:00:00 2002  1 1 
    2014-09-02 00:15:00 2002  0 2 
    2014-09-02 00:30:00 2002  2 3 
    2014-09-02 00:45:00 2001  0 4 

df <- structure(list(time = structure(c(1409522400, 1409523300, 1409524200, 
1409525100, 1409522400, 1409523300, 1409524200, 1409611500, 1409608800, 
1409609700, 1409610600, 1409611500, 1409608800, 1409609700, 1409610600, 
1409611500), class = c("POSIXct", "POSIXt"), tzone = ""), site = structure(c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L), .Label = c("2001", 
"2002"), class = "factor"), val = c(1L, 0L, 2L, 0L, 1L, 0L, 2L, 
0L, 1L, 0L, 2L, 0L, 1L, 0L, 2L, 0L)), .Names = c("time", "site", 
"val"), row.names = c(NA, -16L), class = "data.frame") 

What are my options for achieving this in R?

Thanks.

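Thanks everyone. I forgot to explain my requirement: the h column needs to reset for each group defined by time and site. I have edited my original question. – Sasukethorpido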

Can you post the output of dput(data) for your data frame? –

Added the dput output. – Sasukethorpido

Answers

1

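Using dplyr. First we create a column id that extracts the day of the month from the time column. Then we group by site and id, and add a new variable counter that numbers the rows within each of those groups.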

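# Day-of-month key; '%Y-%m-%d' would be needed if the data spanned several months
df$id <- as.factor(format(df$time, '%d'))
library(dplyr)
# Number the rows within each (site, id) group, in row order
df %>% group_by(site, id) %>% mutate(counter = row_number())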

Output:

    time site val  id counter 
       (time) (fctr) (int) (fctr) (int) 
1 2014-09-01 00:00:00 2001  1  01  1 
2 2014-09-01 00:15:00 2001  0  01  2 
3 2014-09-01 00:30:00 2001  2  01  3 
4 2014-09-01 00:45:00 2001  0  01  4 
5 2014-09-01 00:00:00 2002  1  01  1 
6 2014-09-01 00:15:00 2002  0  01  2 
7 2014-09-01 00:30:00 2002  2  01  3 
8 2014-09-02 00:45:00 2001  0  02  1 
9 2014-09-02 00:00:00 2001  1  02  2 
10 2014-09-02 00:15:00 2001  0  02  3 
11 2014-09-02 00:30:00 2001  2  02  4 
12 2014-09-02 00:45:00 2001  0  02  5 
13 2014-09-02 00:00:00 2002  1  02  1 
14 2014-09-02 00:15:00 2002  0  02  2 
15 2014-09-02 00:30:00 2002  2  02  3 
16 2014-09-02 00:45:00 2001  0  02  6 
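
The '%d' key above keeps only the day of the month, so the same day number in two different months would fall into one group. The following is a minimal base R sketch of the same per-group counter keyed on the full calendar date. The toy data frame and the column name day are made up for illustration and are not the OP's data.

```r
# Hypothetical toy data (not the OP's): two sites, readings on 1 Sep and 1 Oct
toy <- data.frame(
  time = as.POSIXct(c("2014-09-01 00:00:00", "2014-09-01 00:15:00",
                      "2014-10-01 00:00:00", "2014-10-01 00:15:00",
                      "2014-09-01 00:00:00", "2014-10-01 00:00:00"),
                    tz = "UTC"),
  site = factor(c("2001", "2001", "2001", "2001", "2002", "2002"))
)

# Full calendar date as the grouping key, so that 1 Sep and 1 Oct
# do not end up in the same group
toy$day <- format(toy$time, "%Y-%m-%d")

# Running counter within each (site, day) group, in row order
toy$counter <- ave(seq_len(nrow(toy)), toy$site, toy$day, FUN = seq_along)
toy
```

With dplyr the same idea would amount to grouping by site and this day column instead of id.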
0

We can use ave:

# A new group starts wherever time goes backwards; number the rows within each group
df$h <- with(df, ave(val, cumsum(c(TRUE, diff(time) < 0)), FUN = seq_along))
df 
#     time site val h 
#1 2014-09-01 03:30:00 2001 1 1 
#2 2014-09-01 03:45:00 2001 0 2 
#3 2014-09-01 04:00:00 2001 2 3 
#4 2014-09-01 04:15:00 2001 0 4 
#5 2014-09-01 03:30:00 2002 1 1 
#6 2014-09-01 03:45:00 2002 0 2 
#7 2014-09-01 04:00:00 2002 2 3 
#8 2014-09-02 04:15:00 2001 0 4 
#9 2014-09-02 03:30:00 2001 1 1 
#10 2014-09-02 03:45:00 2001 0 2 
#11 2014-09-02 04:00:00 2001 2 3 
#12 2014-09-02 04:15:00 2001 0 4 
#13 2014-09-02 03:30:00 2002 1 1 
#14 2014-09-02 03:45:00 2002 0 2 
#15 2014-09-02 04:00:00 2002 2 3 
#16 2014-09-02 04:15:00 2001 0 4 

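Note: This is based on the expected output shown in the OP's post. I know that 'site' was also described as a grouping variable, but the expected output suggests otherwise.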
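
To make the grouping key in the ave call easier to follow, here is a small sketch that builds it step by step on made-up timestamps. The names starts, run_id and h are chosen only for this illustration. It shows that a new group begins only when the time goes backwards, so a forward jump to the next day stays in the current run.

```r
# Hypothetical timestamps: the second run continues into the next day
# without ever going backwards
time <- as.POSIXct(c("2014-09-01 00:00:00", "2014-09-01 00:15:00",
                     "2014-09-01 00:00:00", "2014-09-01 00:15:00",
                     "2014-09-02 00:15:00",
                     "2014-09-02 00:00:00"), tz = "UTC")

# TRUE wherever the timestamp goes backwards; the first row always starts a run
starts <- c(TRUE, diff(time) < 0)

# The cumulative sum of the start flags gives one id per run
run_id <- cumsum(starts)

# Position of each row within its run
h <- ave(seq_along(time), run_id, FUN = seq_along)

data.frame(time, starts, run_id, h)
```

This would explain why a row whose time moves forward into the following day continues the previous count rather than restarting at 1.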