Update the first duplicate record with entries from the last duplicate

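I want to update the first duplicate record in a data frame (duplicates relative to an identifier variable) with information from the last duplicate. In the data below, I want "begin_date" to be the minimum and "end_date" to be the maximum for each id, while keeping only unique id values.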

Change this:

data <- data.frame(id=c(1,1,1,2,2,3,3,3,4,4,4,4),begin_date=c(1970,1976,2000,1969,2010,1950,1986,1990,1960,1968,1972,1983),end_date=c(1976,2000,2012,2010,2013,1986,1990,1999,1968,1972,1983,2001))

To this:

data <- data.frame(id=c(1,2,3,4),begin_date=c(1970,1969,1950,1960),end_date=c(2012,2013,1999,2001))

Asked 2013-05-30 by seapen

If your data is in a data frame, you can use ddply from plyr for this:

library(plyr) 
data <- ddply(data, .(id), summarize, begin_date=min(begin_date), 
       end_date=max(end_date)) 

##   id begin_date end_date
## 1  1       1970     2012
## 2  2       1969     2013
## 3  3       1950     1999
## 4  4       1960     2001

Answered 2013-05-30 22:29:43

Thanks, this works perfectly! (And I edited my question to create a data.frame, oops.) – seapen

You said this is a data.frame, so this is what I constructed:

dat <- data.frame(id=c(1,1,1,2,2,3,3,3,4,4,4,4), 
        begin_date=c(1970,1976,2000,1969,2010,1950,1986,1990,1960, 1968,1972,1983), 
        end_date=c(1976,2000,2012,2010,2013,1986,1990,1999,1968, 1972,1983,2001)) 

with(dat, data.frame(id=unique(id), 
       begin_date =tapply(begin_date, id, head, 1), 
       end_date= tapply(end_date, id, tail,1)) 
) 

  id begin_date end_date
1  1       1970     2012
2  2       1969     2013
3  3       1950     1999
4  4       1960     2001

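You could also use max and min instead of head and tail. That variant does not depend on the rows being sorted by date within each id.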
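A minimal sketch of that min/max variant in base R follows. It is not part of the original answer. The names begin, end, and result are chosen here for illustration only, and the id column is rebuilt from the tapply() names so that it stays aligned with the per-group values even if the ids are not already in sorted order.

```r
# Same sample data as in the answer above.
dat <- data.frame(
  id         = c(1,1,1,2,2,3,3,3,4,4,4,4),
  begin_date = c(1970,1976,2000,1969,2010,1950,1986,1990,1960,1968,1972,1983),
  end_date   = c(1976,2000,2012,2010,2013,1986,1990,1999,1968,1972,1983,2001)
)

# tapply() returns one value per id, ordered by the sorted id levels.
begin <- tapply(dat$begin_date, dat$id, min)  # earliest begin_date per id
end   <- tapply(dat$end_date,   dat$id, max)  # latest end_date per id

# Take the ids from the names of the tapply() result rather than from
# unique(dat$id), so the id column always matches the grouped values.
result <- data.frame(
  id         = as.numeric(names(begin)),
  begin_date = as.vector(begin),
  end_date   = as.vector(end)
)
result
```

On the sample data this should give the same four rows as the head/tail version, since each id's rows are already in chronological order there.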

Answered 2013-05-30 22:31:12

Thanks for your answer... I'm going to use ddply with min/max because I find it easier to understand. – seapen
