2012-12-21
4

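Calculate total mileage from vectors of latitude and longitude

I have a data frame with data about drivers and the routes they follow. I'm trying to work out the total mileage traveled. I'm using the geosphere package, but I can't find the right way to apply it and get an answer in miles.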

> head(df1) 
    id  routeDateTime driverId  lat  lon 
1 1 2012-11-12 02:08:41  123 76.57169 -110.8070 
2 2 2012-11-12 02:09:41  123 76.44325 -110.7525 
3 3 2012-11-12 02:10:41  123 76.90897 -110.8613 
4 4 2012-11-12 03:18:41  123 76.11152 -110.2037 
5 5 2012-11-12 03:19:41  123 76.29013 -110.3838 
6 6 2012-11-12 03:20:41  123 76.15544 -110.4506 

So far I have tried

spDists(cbind(df1$lon,df1$lat)) 

and a few other functions, but I can't seem to get a reasonable answer.

Any suggestions?

> dput(df1) 
structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40), routeDateTime = c("2012-11-12 02:08:41", 
"2012-11-12 02:09:41", "2012-11-12 02:10:41", "2012-11-12 03:18:41", 
"2012-11-12 03:19:41", "2012-11-12 03:20:41", "2012-11-12 03:21:41", 
"2012-11-12 12:08:41", "2012-11-12 12:09:41", "2012-11-12 12:10:41", 
"2012-11-12 02:08:41", "2012-11-12 02:09:41", "2012-11-12 02:10:41", 
"2012-11-12 03:18:41", "2012-11-12 03:19:41", "2012-11-12 03:20:41", 
"2012-11-12 03:21:41", "2012-11-12 12:08:41", "2012-11-12 12:09:41", 
"2012-11-12 12:10:41", "2012-11-12 02:08:41", "2012-11-12 02:09:41", 
"2012-11-12 02:10:41", "2012-11-12 03:18:41", "2012-11-12 03:19:41", 
"2012-11-12 03:20:41", "2012-11-12 03:21:41", "2012-11-12 12:08:41", 
"2012-11-12 12:09:41", "2012-11-12 12:10:41", "2012-11-12 02:08:41", 
"2012-11-12 02:09:41", "2012-11-12 02:10:41", "2012-11-12 03:18:41", 
"2012-11-12 03:19:41", "2012-11-12 03:20:41", "2012-11-12 03:21:41", 
"2012-11-12 12:08:41", "2012-11-12 12:09:41", "2012-11-12 12:10:41" 
), driverId = c(123, 123, 123, 123, 123, 123, 123, 123, 123, 
123, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 789, 789, 
789, 789, 789, 789, 789, 789, 789, 789, 246, 246, 246, 246, 246, 
246, 246, 246, 246, 246), lat = c(76.5716897079255, 76.4432530414779, 
76.9089707506355, 76.1115217276383, 76.2901271982118, 76.155437662499, 
76.4115052509587, 76.8397977722343, 76.3357809444424, 76.032417796785, 
76.5716897079255, 76.4432530414779, 76.9089707506355, 76.1115217276383, 
76.2901271982118, 76.155437662499, 76.4115052509587, 76.8397977722343, 
76.3357809444424, 76.032417796785, 76.5716897079255, 76.4432530414779, 
76.9089707506355, 76.1115217276383, 76.2901271982118, 76.155437662499, 
76.4115052509587, 76.8397977722343, 76.3357809444424, 76.032417796785, 
76.5716897079255, 76.4432530414779, 76.9089707506355, 76.1115217276383, 
76.2901271982118, 76.155437662499, 76.4115052509587, 76.8397977722343, 
76.3357809444424, 76.032417796785), lon = c(-110.80701574916, 
-110.75247172825, -110.861284852726, -110.203674311982, -110.383751512505, 
-110.450569844106, -110.22185564111, -110.556956546381, -110.24483308522, 
-110.217355202651, -110.80701574916, -110.75247172825, -110.861284852726, 
-110.203674311982, -110.383751512505, -110.450569844106, -110.22185564111, 
-110.556956546381, -110.24483308522, -110.217355202651, -110.80701574916, 
-110.75247172825, -110.861284852726, -110.203674311982, -110.383751512505, 
-110.450569844106, -110.22185564111, -110.556956546381, -110.24483308522, 
-110.217355202651, -110.80701574916, -110.75247172825, -110.861284852726, 
-110.203674311982, -110.383751512505, -110.450569844106, -110.22185564111, 
-110.556956546381, -110.24483308522, -110.217355202651)), .Names = c("id", 
"routeDateTime", "driverId", "lat", "lon"), row.names = c(NA, 
-40L), class = "data.frame") 

Answers

6

How about this?

## Setup 
library(geosphere) 
metersPerMile <- 1609.34 
pts <- df1[c("lon", "lat")] 

## Pass in two derived data.frames that are lagged by one point 
segDists <- distVincentyEllipsoid(p1 = pts[-nrow(pts),], 
            p2 = pts[-1,]) 
sum(segDists)/metersPerMile 
# [1] 1013.919 

(To use one of the faster distance-calculating algorithms, just substitute distCosine, distVincentySphere, or distHaversine for distVincentyEllipsoid in the call above.)
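Note that the single sum above runs through every row in order, so with several drivers in one data frame it also counts the jump from one driver's last point to the next driver's first point. A minimal sketch of a per-driver total follows. It assumes rows are already in travel order within each driver, and it uses a hand-written haversine in base R so that it runs without any package. `haversineMeters`, `trackMiles`, and the `demo` data are hypothetical names made up for illustration, and the radius of 6378137 m is an assumption chosen to match the default of geosphere's `distHaversine`.

```r
## Hypothetical helper: great-circle (haversine) distance in meters between
## paired lon/lat values given in degrees. r is the sphere radius in meters.
haversineMeters <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  toRad <- pi / 180
  dLat <- (lat2 - lat1) * toRad
  dLon <- (lon2 - lon1) * toRad
  a <- sin(dLat / 2)^2 +
    cos(lat1 * toRad) * cos(lat2 * toRad) * sin(dLon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

## Hypothetical helper: total miles along one ordered track
trackMiles <- function(lon, lat, metersPerMile = 1609.34) {
  n <- length(lon)
  if (n < 2) return(0)
  ## Segment i runs from point i to point i + 1
  seg <- haversineMeters(lon[-n], lat[-n], lon[-1], lat[-1])
  sum(seg) / metersPerMile
}

## Small made-up sample in the same shape as df1
demo <- data.frame(
  driverId = c(123, 123, 123, 456, 456),
  lat = c(76.57169, 76.44325, 76.90897, 76.57169, 76.44325),
  lon = c(-110.8070, -110.7525, -110.8613, -110.8070, -110.7525)
)

## One total per driver; no segment crosses from one driver to the next
milesByDriver <- sapply(split(demo, demo$driverId),
                        function(d) trackMiles(d$lon, d$lat))
milesByDriver
```

The same split-then-sum pattern should work with any of the geosphere distance functions in place of the hand-written helper; sort each driver's rows by routeDateTime first if the data is not already in order.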

+0

Awesome. Thank you very much. – screechOwl

1

Be very careful with missing data, as distVincentyEllipsoid() returns 0 for the distance between any two points with missing coordinates c(NA,NA), c(NA,NA).
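One way to guard against that, sketched here in base R on made-up points, is to drop rows with a missing coordinate before building the lagged pairs. The names `ok`, `ptsClean`, `p1`, and `p2` are illustrative only. Dropping a point means the segment bridges the gap in a straight line, which may understate the real distance, so it is worth reporting how many points were removed.

```r
## Made-up track with one missing fix (row 3)
pts <- data.frame(
  lon = c(-110.8070, -110.7525, NA, -110.2037),
  lat = c(76.57169, 76.44325, NA, 76.11152)
)

## Keep only rows where both coordinates are present
ok <- complete.cases(pts[c("lon", "lat")])
ptsClean <- pts[ok, ]

## Lagged pairs built from the cleaned points: the second segment now runs
## from original row 2 directly to original row 4
p1 <- ptsClean[-nrow(ptsClean), ]
p2 <- ptsClean[-1, ]

nrow(p1)   # number of segments that will be measured
sum(!ok)   # number of points dropped for missing coordinates
```

`p1` and `p2` can then be passed to the distance function in place of `pts[-nrow(pts),]` and `pts[-1,]`.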