Extracting the highest values from a column of a time-series data frame

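I have a data frame containing monthly NDVI values for 26 stations from 2000 to 2012. I sorted the data frame first by year, then by station, and finally by ndvi.

My data frame R looks like this (sorry about the formatting):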

t station year month ndvi altitude precipitation 
8 a 2000 aug 0.7793 2143 592.9 
9 a 2000 sept 0.7524 2143 135.3 
10 a 2000 oct 0.6597 2143 77.5 
4 a 2000 apr 0.6029 2143 72.6 
7 a 2000 jul 0.6018 2143 606.1 
11 a 2000 nov 0.5801 2143 4.4 
12 a 2000 dec 0.5228 2143 0 
6 a 2000 jun 0.4969 2143 505.9 
5 a 2000 may 0.4756 2143 241.7 
2 a 2000 feb 0.4396 2143 4 
3 a 2000 mar 0.4393 2143 25.5 
1 a 2000 jan 0.4138 2143 16 
8 b 2000 aug 0.7523 122 832.3 
9 b 2000 sept 0.7003 122 229.7 
7 b 2000 jul 0.667 122 662 
5 b 2000 may 0.6639 122 323.3 
4 b 2000 apr 0.593 122 88.6 
6 b 2000 jun 0.5508 122 752.1

I need to extract the top three NDVI rows for each station in each year, and tried this code:

top3 <- split(R, R$station) 
subsetted.data <- lapply(top3, FUN = function(x) head(x, 3)) 
subsetted.data 
flatten.data <- do.call("rbind", subsetted.data) 
View(flatten.data)

However, I only get the top three ndvi rows for the year 2000, not for the later years.

Does anyone know how I can fix this?

Thanks.

Source

2014-03-25 user3460660

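Please post a small sample of your data. But maybe you could use `head(sort(R$station), 3)`?

I agree, a reproducible example would help everyone: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

You need to split by the interaction of station and year: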

R <- R[order(R$ndvi, decreasing=TRUE), ] # sort by ndvi, highest first
top3 <- split(R, interaction(R$station, R$year)) # <<<<<<<<<< this is the change 
subsetted.data <- lapply(top3, FUN = function(x) head(x, 3)) 
subsetted.data 
flatten.data <- do.call("rbind", subsetted.data)

This works (see my data at the end). That said, this kind of task is very easy to handle with a package such as data.table:

library(data.table) 
data.table(R)[order(ndvi, decreasing=TRUE), head(.SD, 3), by=list(station, year)]

Note that you can order a data.table faster by using keys, but I have left that out here for clarity.

Data:
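As a rough illustration of the key-based ordering mentioned above, the sketch below keys the table on station, year, and ndvi. It assumes the data.table package is installed and reuses the generated test data from this answer; the names `DT` and `top3` are made up for the example. Because `setkey()` sorts ascending, the top three rows of each group are the last three, so `tail()` replaces `head()`.

```r
library(data.table)

# Same generated data as in the answer, built directly as a data.table.
set.seed(1)
DT <- as.data.table(
  expand.grid(year = 2000:2010, station = letters[1:5], month = month.abb)
)
DT[, ndvi := runif(.N)]

# setkey() sorts the table in place by the key columns, in ascending order.
setkey(DT, station, year, ndvi)

# Within each (station, year) group the rows are now ascending by ndvi,
# so the last three rows are the three highest values.
top3 <- DT[, tail(.SD, 3), by = .(station, year)]
print(head(top3))
```

Whether this is actually faster than `order()` for a given data set is something to measure rather than assume.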

set.seed(1) 
R <- expand.grid(year=2000:2010, station=letters[1:5], month=month.abb) 
R$ndvi <- runif(nrow(R))
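To see why the original attempt returned too few rows, the following base R sketch runs both groupings on the generated data above. Splitting by station alone gives one group per station, so `head(x, 3)` keeps three rows per station across all years. Splitting by the interaction gives one group per station-year pair. The names `by_station` and `by_both` are made up for this illustration.

```r
set.seed(1)
R <- expand.grid(year = 2000:2010, station = letters[1:5], month = month.abb)
R$ndvi <- runif(nrow(R))

# Sort once by ndvi, highest first; split() keeps this order within each group.
R <- R[order(R$ndvi, decreasing = TRUE), ]

# Grouping by station only: 5 groups, 3 rows kept from each.
by_station <- do.call(rbind, lapply(split(R, R$station), head, 3))

# Grouping by station and year: 5 stations x 11 years = 55 groups.
by_both <- do.call(rbind, lapply(split(R, interaction(R$station, R$year)), head, 3))

nrow(by_station)  # 15
nrow(by_both)     # 165
```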

Source

2014-03-25 17:23:45 BrodieG

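Thank you very much! – user3460660

I plugged in some arbitrary "2001" years to show the separation. I prefer to `order` the data by the column of interest first, and then `split` that. If you like, you can call `do.call(rbind, ...)` on the result. The result is the top three "ndvi" rows for each station in each year.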

> dat$year[c(8:12, 16:18)] <- 2001 ## add some 2001 years 
> ord <- dat[order(-dat$ndvi), ] 
> lapply(split(ord, list(ord$station, ord$year)), head, 3) 
$a.2000
   t station year month   ndvi altitude precipitation
1  8       a 2000   aug 0.7793     2143         592.9
2  9       a 2000  sept 0.7524     2143         135.3
3 10       a 2000   oct 0.6597     2143          77.5

$b.2000
   t station year month   ndvi altitude precipitation
13 8       b 2000   aug 0.7523      122         832.3
14 9       b 2000  sept 0.7003      122         229.7
15 7       b 2000   jul 0.6670      122         662.0

$a.2001
   t station year month   ndvi altitude precipitation
8  6       a 2001   jun 0.4969     2143         505.9
9  5       a 2001   may 0.4756     2143         241.7
10 2       a 2001   feb 0.4396     2143           4.0

$b.2001
   t station year month   ndvi altitude precipitation
16 5       b 2001   may 0.6639      122         323.3
17 4       b 2001   apr 0.5930      122          88.6
18 6       b 2001   jun 0.5508      122         752.1
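The optional `do.call(rbind, ...)` step mentioned above could look roughly like this. The sketch rebuilds `dat` from the question's sample rows, trimmed to the four columns needed here. The names `top3_list` and `flat` are made up for the example.

```r
# Trimmed copy of the question's sample data (station, year, month, ndvi only).
dat <- read.table(header = TRUE, text = "
station year month ndvi
a 2000 aug 0.7793
a 2000 sept 0.7524
a 2000 oct 0.6597
a 2000 apr 0.6029
a 2000 jul 0.6018
a 2000 nov 0.5801
a 2000 dec 0.5228
a 2000 jun 0.4969
a 2000 may 0.4756
a 2000 feb 0.4396
a 2000 mar 0.4393
a 2000 jan 0.4138
b 2000 aug 0.7523
b 2000 sept 0.7003
b 2000 jul 0.667
b 2000 may 0.6639
b 2000 apr 0.593
b 2000 jun 0.5508
")
dat$year[c(8:12, 16:18)] <- 2001  # same arbitrary 2001 rows as in the answer

# Order by ndvi (descending), split by station and year, keep 3 rows per group.
ord <- dat[order(-dat$ndvi), ]
top3_list <- lapply(split(ord, list(ord$station, ord$year)), head, 3)

# Stack the per-group data frames back into a single data frame.
flat <- do.call(rbind, top3_list)
rownames(flat) <- NULL  # drop the "a.2000.1"-style row names added by rbind
print(flat)
```

Resetting the row names is a matter of taste. The generated names record which group each row came from, which can be handy when checking the result.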

Source

2014-03-25 17:39:22

Thanks for your help! – user3460660
