2017-04-10

Group by ID and get the difference in dates in R

I have a dataset that looks like this:

id Type Sale  SaleDate Time Cat  LoadType  LoadDate 
A11 ABC 123 15/11/2016 00:00 AAA  Unload 23/11/2016 
A11 ABC 123 15/11/2016 00:00 AAA   Load 17/11/2016 
A556 ABC 444 09/01/2017 00:00 VVV  Unload 17/01/2017 
A556 ABC 444 09/01/2017 00:00 VVV   Load 17/01/2017 

I want the difference between the LoadDate values for each id. For example, it should return:

id .... LoadDate DifferenceInDays 
A11 .... 23/11/2016  6 
A11 .... 17/11/2016  6 

For the two rows that share an id, DifferenceInDays should be the same.

Answers

2

You can group by id, then compute max(LoadDate) - min(LoadDate). Assuming your data frame is named myData:

library(dplyr)

myData %>%
  mutate(SaleDate = as.Date(SaleDate, "%d/%m/%Y"),
         LoadDate = as.Date(LoadDate, "%d/%m/%Y")) %>%
  group_by(id) %>%
  summarise(DifferenceInDays = max(LoadDate) - min(LoadDate))

Result:

 id DifferenceInDays 
    <chr>    <time> 
1 A11    6 days 
2 A556    0 days 

Use mutate() instead of summarise() if you want to add the column to the original data frame.
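For reference, the mutate() variant might look like the sketch below. It rebuilds a reduced copy of the question's data (only the columns needed here) and stores the result in `result`, a name chosen just for this illustration. The as.numeric() call is an optional extra that turns the difftime into a plain number of days, to match the output requested in the question.

```r
library(dplyr)

# Reduced copy of the example data from the question (illustrative only)
myData <- data.frame(
  id = c("A11", "A11", "A556", "A556"),
  LoadType = c("Unload", "Load", "Unload", "Load"),
  LoadDate = c("23/11/2016", "17/11/2016", "17/01/2017", "17/01/2017"),
  stringsAsFactors = FALSE
)

result <- myData %>%
  mutate(LoadDate = as.Date(LoadDate, format = "%d/%m/%Y")) %>%
  group_by(id) %>%
  # mutate() keeps every row and repeats the per-id value on each of them
  mutate(DifferenceInDays = as.numeric(max(LoadDate) - min(LoadDate),
                                       units = "days")) %>%
  ungroup()

print(result)
```

Every row is kept, and both rows of each id carry the same DifferenceInDays value.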

+0

I get NA for all values of the difference. – bytebiscuit

+0

The code works for me with your example data. Make sure you use 'stringsAsFactors = FALSE' when creating the data frame. – neilfws

+0

Yes. They are all 'chr's, no 'fctr's. Even Sample.Date gets filled with NAs when mutated. – bytebiscuit
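The thread does not say what caused the NAs. One possible cause, offered only as a guess, is that the real date strings do not match the "%d/%m/%Y" format passed to as.Date(). A quick check along these lines would show which strings fail to parse; the vector `x` below is made-up data, not taken from the question.

```r
# Hypothetical input: the first string matches "%d/%m/%Y", the second does not
x <- c("23/11/2016", "2016-11-17")

parsed <- as.Date(x, format = "%d/%m/%Y")

# Strings that were present in the input but could not be parsed with this format
bad <- x[is.na(parsed) & !is.na(x)]
print(bad)
```

If `bad` is not empty, the format string needs to be changed to match the actual data.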

1

I would do this with data.table:

library(data.table)

# Your example data, in a data.frame 
df = read.table(text='id Type Sale SaleDate Time Cat  LoadType LoadDate 
A11 ABC 123 15/11/2016 00:00 AAA  Unload 23/11/2016 
A11 ABC 123 15/11/2016 00:00 AAA  Load 17/11/2016 
A556 ABC 444 09/01/2017 00:00 VVV  Unload 17/01/2017 
A556 ABC 444 09/01/2017 00:00 VVV  Load 17/01/2017', header = TRUE, stringsAsFactors = FALSE)

# convert to a data.table... 
dt = data.table(df, key='id') 

# ... with the right format for the date 
dt[, LoadDate := as.IDate(LoadDate, format='%d/%m/%Y')] 

# computes the difference in days, by ID: 
dt[, DifferenceInDays := diff(range(LoadDate)), by=id] 

This gives the desired output in the resulting data frame:

> dt 
    id Type Sale SaleDate Time Cat LoadType LoadDate DifferenceInDays 
1: A11 ABC 123 15/11/2016 00:00 AAA Unload 2016-11-23    6 
2: A11 ABC 123 15/11/2016 00:00 AAA  Load 2016-11-17    6 
3: A556 ABC 444 09/01/2017 00:00 VVV Unload 2017-01-17    0 
4: A556 ABC 444 09/01/2017 00:00 VVV  Load 2017-01-17    0