2013-04-30

I have a large time series (N => 6000) stored as a data frame, with the values in separate columns. It looks like this:

   time, precip 

1 2005-09-30 11:45:00, 0.08 
2 2005-09-30 23:45:00, 0.72 
3 2005-10-01 11:45:00, 0.01 
4 2005-10-01 23:45:00, 0.08 
5 2005-10-02 11:45:00, 0.10 
6 2005-10-02 23:45:00, 0.33 
7 2005-10-03 11:45:00, 0.15 
8 2005-10-03 23:45:00, 0.30 
9 2005-10-04 11:45:00, 0.00 
10 2005-10-04 23:45:00, 0.00 
11 2005-10-05 11:45:00, 0.02 
12 2005-10-05 23:45:00, 0.00 
13 2005-10-06 11:45:00, 0.00 
14 2005-10-06 23:45:00, 0.01 
15 2005-10-07 11:45:00, 0.00 
16 2005-10-07 23:45:00, 0.00 
17 2005-10-08 11:45:00, 0.00 
18 2005-10-08 23:45:00, 0.16 
19 2005-10-09 11:45:00, 0.03 
20 2005-10-09 23:45:00, 0.00 

Each row has a time (YYYY-MM-DD HH:MM:SS; the series is 12-hourly) and a precipitation amount. I want to split the data into storm events.
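If you want to play along, here is a minimal sketch that rebuilds the 20 sample rows as a data frame. The name `rainfall` is an assumption (it is the name the first answer below uses), the times are assumed to be exactly 12 hours apart, and `tz = "UTC"` is chosen only to sidestep daylight-saving shifts.

```r
# Rebuild the sample rows as a data frame (name `rainfall` and UTC time zone are assumptions)
rainfall <- data.frame(
  # regular 12-hourly timestamps starting at the first sample row
  time = seq(as.POSIXct("2005-09-30 11:45:00", tz = "UTC"),
             by = "12 hours", length.out = 20),
  precip = c(0.08, 0.72, 0.01, 0.08, 0.10, 0.33, 0.15, 0.30, 0.00, 0.00,
             0.02, 0.00, 0.00, 0.01, 0.00, 0.00, 0.00, 0.16, 0.03, 0.00)
)
str(rainfall)
```

With real data you would instead read the file and convert the time column with `as.POSIXct(..., format = "%Y-%m-%d %H:%M:%S")`.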

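What I want to do is: 1) call each group of nonzero values that is separated from the next by zeros a "storm"; 2) add a new column, call it Storm, that numbers these storms.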

For example...

   Time,  Precip, Storm 

1 2005-09-30 11:45:00, 0.08, 1 
2 2005-09-30 23:45:00, 0.72, 1 
3 2005-10-01 11:45:00, 0.01, 1 
4 2005-10-01 23:45:00, 0.08, 1 
5 2005-10-02 11:45:00, 0.10, 1 
6 2005-10-02 23:45:00, 0.33, 1 
7 2005-10-03 11:45:00, 0.15, 1 
8 2005-10-03 23:45:00, 0.30, 1 
9 2005-10-04 11:45:00, 0.00 
10 2005-10-04 23:45:00, 0.00 
11 2005-10-05 11:45:00, 0.02, 2 
12 2005-10-05 23:45:00, 0.00 
13 2005-10-06 11:45:00, 0.00 
14 2005-10-06 23:45:00, 0.01, 3 
15 2005-10-07 11:45:00, 0.00 
16 2005-10-07 23:45:00, 0.00 
17 2005-10-08 11:45:00, 0.00 
18 2005-10-08 23:45:00, 0.16, 4 
19 2005-10-09 11:45:00, 0.03, 4 
20 2005-10-09 23:45:00, 0.00 

3) After that, my plan is to subset the data by storm event.
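For that subsetting step, base R's `split()` and `subset()` are one option. This is a sketch assuming a data frame with a `storm` column that holds the storm number and `NA` for dry rows; the storm numbers below are simply copied by hand from the example table above.

```r
# Hypothetical example data: storm numbers copied from the desired output, NA = no storm
precip <- c(0.08, 0.72, 0.01, 0.08, 0.10, 0.33, 0.15, 0.30, 0.00, 0.00,
            0.02, 0.00, 0.00, 0.01, 0.00, 0.00, 0.00, 0.16, 0.03, 0.00)
storm  <- c(rep(1, 8), NA, NA, 2, NA, NA, 3, NA, NA, NA, 4, 4, NA)
rainfall <- data.frame(precip = precip, storm = storm)

# split() drops rows whose grouping value is NA, so dry periods are left out
by_storm <- split(rainfall, rainfall$storm)

# e.g. total precipitation per storm
storm_totals <- sapply(by_storm, function(d) sum(d$precip))

# subset() treats NA comparisons as FALSE, so this returns only storm 4's rows
storm4 <- subset(rainfall, storm == 4)
```

`split()` gives you a list with one data frame per storm, which works well with `lapply()`/`sapply()` when you want a per-storm summary.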

I'm very new to R, so don't be afraid to point out the obvious. Thanks a lot for your help!

Answers


You can flag which time points belong to a storm, then use rle and modify its result:

# assuming your data frame is called rainfall 
# flag whether any precipitation was recorded at each time point 
rainfall$storm <- rainfall$precip > 0 
# run-length encode this storm indicator 
storms <- rle(rainfall$storm) 
# set the FALSE (dry) runs to NA 
is.na(storms$values) <- !storms$values 
# replace the TRUE (wet) runs with sequential storm numbers 
storms$values[which(storms$values)] <- seq_len(sum(storms$values, na.rm = TRUE)) 
# use inverse.rle to expand back to a full-length column 
rainfall$stormNumber <- inverse.rle(storms) 
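The same rle() steps can be wrapped in a small function so they are easy to test on a plain vector. The name `number_storms` is hypothetical, and the sketch assumes the precipitation vector contains no `NA` values.

```r
# Hypothetical helper wrapping the rle() approach; assumes no NA in precip
number_storms <- function(precip) {
  r <- rle(precip > 0)                 # runs of wet (TRUE) and dry (FALSE) steps
  is.na(r$values) <- !r$values         # dry runs become NA
  # number the wet runs 1, 2, 3, ...
  r$values[which(r$values)] <- seq_len(sum(r$values, na.rm = TRUE))
  inverse.rle(r)                       # expand back to one value per time step
}

precip <- c(0.08, 0.72, 0.01, 0.08, 0.10, 0.33, 0.15, 0.30, 0.00, 0.00,
            0.02, 0.00, 0.00, 0.01, 0.00, 0.00, 0.00, 0.16, 0.03, 0.00)
storm <- number_storms(precip)
```

On a series with no rain at all, every element comes back as `NA`, because there are no wet runs to number.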

This works like a charm! Thanks! – user2263130 2013-05-01 02:10:02


Assuming this input:

Lines <- "time, precip 
1 2005-09-30 11:45:00, 0.08 
2 2005-09-30 23:45:00, 0.72 
3 2005-10-01 11:45:00, 0.01 
4 2005-10-01 23:45:00, 0.08 
5 2005-10-02 11:45:00, 0.10 
6 2005-10-02 23:45:00, 0.33 
7 2005-10-03 11:45:00, 0.15 
8 2005-10-03 23:45:00, 0.30 
9 2005-10-04 11:45:00, 0.00 
10 2005-10-04 23:45:00, 0.00 
11 2005-10-05 11:45:00, 0.02 
12 2005-10-05 23:45:00, 0.00 
13 2005-10-06 11:45:00, 0.00 
14 2005-10-06 23:45:00, 0.01 
15 2005-10-07 11:45:00, 0.00 
16 2005-10-07 23:45:00, 0.00 
17 2005-10-08 11:45:00, 0.00 
18 2005-10-08 23:45:00, 0.16 
19 2005-10-09 11:45:00, 0.03 
20 2005-10-09 23:45:00, 0.00 
" 

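We read in the data and then create a logical vector that is TRUE for each nonzero precipitation value whose previous value is zero. We prepend a first element that is TRUE if z[1] is nonzero and FALSE if it is zero. Applying cumsum to this vector gives the correct storm numbers at the positions of the nonzero precip values. To handle the positions of the zero precip values, we use replace to store the "empty" value e into them: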

# read in data 
library(zoo) 
z <- read.zoo(text = Lines, skip = 1, tz = "", index = 2:3)[, 2] 

# number the storms; rows with zero precip get the empty value e 
e <- NA # empty value for rows that belong to no storm 
cbind(precip = z, storm = replace(cumsum(c(z[1]!=0, z!=0 & lag(z,-1)==0)), z==0, e)) 

The last line gives:

    precip storm 
2005-09-30 11:45:00 0.08  1 
2005-09-30 23:45:00 0.72  1 
2005-10-01 11:45:00 0.01  1 
2005-10-01 23:45:00 0.08  1 
2005-10-02 11:45:00 0.10  1 
2005-10-02 23:45:00 0.33  1 
2005-10-03 11:45:00 0.15  1 
2005-10-03 23:45:00 0.30  1 
2005-10-04 11:45:00 0.00 NA 
2005-10-04 23:45:00 0.00 NA 
2005-10-05 11:45:00 0.02  2 
2005-10-05 23:45:00 0.00 NA 
2005-10-06 11:45:00 0.00 NA 
2005-10-06 23:45:00 0.01  3 
2005-10-07 11:45:00 0.00 NA 
2005-10-07 23:45:00 0.00 NA 
2005-10-08 11:45:00 0.00 NA 
2005-10-08 23:45:00 0.16  4 
2005-10-09 11:45:00 0.03  4 
2005-10-09 23:45:00 0.00 NA 
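If you would rather not depend on zoo, the same "count the starts of wet runs with cumsum" idea can be written in base R on a plain numeric vector. This is a sketch, not the answerer's code: `number_storms_cumsum` is a hypothetical name, `c(TRUE, !head(wet, -1))` plays the role of the lagged comparison, and the input is assumed to contain no `NA` values.

```r
# Hypothetical base-R version of the cumsum() idea; assumes no NA in precip
number_storms_cumsum <- function(precip) {
  wet <- precip != 0
  # TRUE where a wet step follows a dry step; the first step is a start if it is wet
  starts <- wet & c(TRUE, !head(wet, -1))
  storm <- cumsum(starts)   # running count of storm starts
  storm[!wet] <- NA         # dry steps belong to no storm
  storm
}

precip <- c(0.08, 0.72, 0.01, 0.08, 0.10, 0.33, 0.15, 0.30, 0.00, 0.00,
            0.02, 0.00, 0.00, 0.01, 0.00, 0.00, 0.00, 0.16, 0.03, 0.00)
storm <- number_storms_cumsum(precip)
```

The result can be added as a column with `rainfall$storm <- number_storms_cumsum(rainfall$precip)`, assuming a data frame named `rainfall` with a `precip` column.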

This also works; unfortunately I can't mark both as accepted. Thanks! – user2263130 2013-05-01 02:10:52