2013-12-15 34 views
1
difference

I have data like this, and I want to use R to find the time between two visits:

vehicleId visitDate taskName 
123  1/1/2013 Change Battery 
456  1/1/2013 Wiper Blades Changed 
123  1/2/2013 Tire Pressure Check 
123  1/3/2013 Tire Rotation 
456  3/1/2013 Tire Pressure Check 

I want to produce this data frame:

vehicleId visitDate timeBetweenVisits(hrs) 
123  1/1/2013      24 
123  1/2/2013     672 
456  1/1/2013      48 

Any ideas on how I can do this in R?

+0

There are invalid values in my data set – user3056186

Answers

1

With res as in @Dirk's answer, here is a by() expression that does the job:

by(res, res$vehicleId, FUN=function(d) 
         { 
         data.frame(vehicleId=head(d$vehicleId, -1), 
            visitDate=head(d$visitDate, -1), 
            tbv=diff(d$visitDate)) 
         } 
) 
## res$vehicleId: 123 
## vehicleId visitDate tbv 
## 1  123 2013-01-01 1 days 
## 2  123 2013-01-02 1 days 
## ---------------------------------------------------------------------------------------------- 
## res$vehicleId: 456 
## vehicleId visitDate  tbv 
## 1  456 2013-01-01 59 days 
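
The by() call above returns one piece per vehicle rather than a single data frame, and the differences are in days. A minimal sketch of one way to stack the pieces and report hours instead, assuming res has the same columns as in @Dirk's answer (the names pieces, out and tbvHours are made up for this illustration):

```r
## Illustrative data, same shape as res in @Dirk's answer
res <- data.frame(
  vehicleId = c(123, 456, 123, 123, 456),
  visitDate = as.Date(c("2013-01-01", "2013-01-01", "2013-01-02",
                        "2013-01-03", "2013-03-01"))
)

pieces <- by(res, res$vehicleId, FUN = function(d) {
  d <- d[order(d$visitDate), ]   # make sure visits are in date order
  data.frame(vehicleId = head(d$vehicleId, -1),
             visitDate = head(d$visitDate, -1),
             ## gap to the next visit, as a plain number of hours
             tbvHours  = as.numeric(diff(d$visitDate), units = "hours"))
})

## Stack the per-vehicle pieces into one data frame
out <- do.call(rbind, pieces)
rownames(out) <- NULL
out
```

Each vehicle's last visit is dropped because it has no following visit to compare against.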
+0

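Hi Matthew, this works great. I have only been using R for a few days, and I am amazed that this can be done in a few lines of code. I have a lot of reading to do about what you did here. Thanks – user3056186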

1

Load and convert the data:

## data now comma-separated as you have fields containing whitespace 
R> res <- read.csv(text=" 
vehicleId, visitDate, taskName 
123,  1/1/2013, Change Battery 
456,  1/1/2013, Wiper Blades Changed 
123,  1/2/2013, Tire Pressure Check 
123,  1/3/2013, Tire Rotation 
456,  3/1/2013, Tire Pressure Check", stringsAsFactors=FALSE) 
R> res$visitDate <- as.Date(res$visitDate, "%m/%d/%Y")  ## now in Date format 

Take a look:

R> res 
    vehicleId visitDate    taskName 
1  123 2013-01-01   Change Battery 
2  456 2013-01-01 Wiper Blades Changed 
3  123 2013-01-02 Tire Pressure Check 
4  123 2013-01-03   Tire Rotation 
5  456 2013-03-01 Tire Pressure Check 
R> 

Date calculations:

R> res[3,"visitDate"] - res[1,"visitDate"] 
Time difference of 1 days 
R> as.numeric(res[3,"visitDate"] - res[1,"visitDate"]) 
[1] 1 
R> difftime(res[3,"visitDate"],res[1,"visitDate"], units="hours") 
Time difference of 24 hours 
R> as.numeric(difftime(res[3,"visitDate"],res[1,"visitDate"], units="hours")) 
[1] 24 
R> 

Vectorized:

R> as.numeric(difftime(res[2:nrow(res),"visitDate"], 
+      res[1:(nrow(res)-1),"visitDate"], units="hours")) 
[1] 0 24 24 1368 
R> 

You can of course also assign the result to a new column. You can also subset by vehicle id.
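
Note that the vectorized difference above compares consecutive rows regardless of vehicle. A minimal sketch of doing the same per vehicle and storing it in a new column, using base R's order() and ave(); the column name hoursToNext and the object out are made up for this illustration, and the data is retyped here so the snippet runs on its own:

```r
## Illustrative data, same shape as res above
res <- data.frame(
  vehicleId = c(123, 456, 123, 123, 456),
  visitDate = as.Date(c("1/1/2013", "1/1/2013", "1/2/2013",
                        "1/3/2013", "3/1/2013"), "%m/%d/%Y")
)

## Sort so that each vehicle's visits are adjacent and in date order
res <- res[order(res$vehicleId, res$visitDate), ]

## Within each vehicle: hours until the next visit (NA for the last visit).
## as.numeric() on a Date gives days, so diff() is in days; times 24 gives hours.
res$hoursToNext <- ave(as.numeric(res$visitDate), res$vehicleId,
                       FUN = function(d) c(diff(d), NA) * 24)

## Drop each vehicle's last visit, which has no following visit
out <- res[!is.na(res$hoursToNext), ]
out
```

Because ave() works on the whole column at once, nothing has to be looped over row by row, which may matter on a large data set.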

+1

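This works, but how do I do this for all rows? I was able to sort by vehicleId & visitDate, but I don't know how to apply your calculation to the whole data set. It has nearly 1 million records – user3056186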

+0

"An Introduction to R" and the other manuals may help you. –