2017-01-06

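Find the distance between the first row and the other rows in a group

I want to find the distance between the first record and each of the remaining 3 records in the same Id1 group. For example: the distance between (33.6008949,-83.83803) and (33.604248,-83.86729); the distance between (33.6008949,-83.83803) and (33.60586,-83.8711); and the distance between (33.6008949,-83.83803) and (33.6043777,-83.86624).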

Row  Id1  Id2  StartTime                StopTime                 Latitude    Longitude  DateTime
1    71   34   2016-11-21 00:41:05 UTC  2016-11-21 00:47:06 UTC  33.6008949  -83.83803  2016-11-21 00:43:42 UTC
2    71   44   2016-11-21 00:54:55 UTC  2016-11-21 00:56:28 UTC  33.604248   -83.86729  2016-11-21 00:55:18 UTC
3    71   45   2016-11-21 02:08:17 UTC  2016-11-21 02:09:52 UTC  33.60586    -83.8711   2016-11-21 02:09:03 UTC
4    71   67   2016-11-21 02:16:02 UTC  2016-11-21 02:17:21 UTC  33.6043777  -83.86624  2016-11-21 02:16:28 UTC

To calculate the distance between two points, I used the following function:

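from math import radians, cos, sin, asin, sqrt


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees.

    Note the argument order: longitude first, then latitude.
    """
    # Convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # Haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    km = 6367 * c  # 6367 km: approximate radius of the Earth
    return km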

Which column should be used for ordering? –


@vkp: The DataFrame is already sorted by StartTime, so I need to find the distance between the first record and every other record in the group. – user3447653
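A minimal stdlib-only Python sketch of the task as clarified in the comment above: order each Id1 group by StartTime, take the first row, and compute its haversine distance to every row of the group (0 for the first row itself). It reuses the question's haversine formula and 6367 km radius; the helper names `parse_ts` and `distances_from_first` and the tuple layout of `rows` are hypothetical, chosen only for illustration.

```python
from datetime import datetime
from itertools import groupby
from math import radians, cos, sin, asin, sqrt


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km (same formula and radius as the question).

    Argument order: longitude first, then latitude.
    """
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 6367 * 2 * asin(sqrt(a))


# Hypothetical layout: (Id1, Id2, StartTime, Latitude, Longitude), copied from the table.
rows = [
    (71, 34, "2016-11-21 00:41:05 UTC", 33.6008949, -83.83803),
    (71, 44, "2016-11-21 00:54:55 UTC", 33.604248, -83.86729),
    (71, 45, "2016-11-21 02:08:17 UTC", 33.60586, -83.8711),
    (71, 67, "2016-11-21 02:16:02 UTC", 33.6043777, -83.86624),
]


def parse_ts(s):
    # Timestamps look like "2016-11-21 00:41:05 UTC".
    return datetime.strptime(s, "%Y-%m-%d %H:%M:%S UTC")


def distances_from_first(rows):
    """Return {(Id1, Id2): km from the earliest row of the same Id1 group}."""
    result = {}
    # groupby only merges adjacent items, so sort by Id1 first; within each
    # Id1 group the rows then come out in StartTime order.
    ordered = sorted(rows, key=lambda r: (r[0], parse_ts(r[2])))
    for id1, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        _, _, _, lat0, lon0 = group[0]  # first (earliest) row of the group
        for _, id2, _, lat, lon in group:
            result[(id1, id2)] = haversine(lon0, lat0, lon, lat)
    return result


for key, km in distances_from_first(rows).items():
    print(key, round(km, 3))
```

If the data lives in a pandas DataFrame that is already sorted by StartTime, `df.groupby('Id1')['Latitude'].transform('first')` (and likewise for Longitude) gives every row its group's first coordinates, which can then be passed to the same formula.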

Answer


Try the following:

#standardSQL 
CREATE TEMPORARY FUNCTION distance(lat1 FLOAT64, lon1 FLOAT64, lat2 FLOAT64, lon2 FLOAT64) 
RETURNS FLOAT64 AS ((
WITH constants AS (
    -- p = PI / 180, converts degrees to radians
    SELECT 0.017453292519943295 AS p 
) 
-- Haversine formula; 12742 km = 2 * Earth's mean radius (6371 km)
SELECT 12742 * ASIN(SQRT(
    0.5 - COS((lat2 - lat1) * p)/2 + 
    COS(lat1 * p) * COS(lat2 * p) * 
    (1 - COS((lon2 - lon1) * p))/2)) 
FROM constants 
)); 
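The UDF writes the haversine term in cosine form. Because (1 - cos x)/2 = sin²(x/2), the expression `0.5 - COS((lat2 - lat1) * p)/2` equals sin²(dlat/2), so this is the same formula as the question's Python function, with p = π/180 and 12742 = 2 × 6371 km. The small Python check below illustrates that equivalence numerically; the function names `distance_sql_form` and `distance_sin_form` are hypothetical. Note that the UDF's radius (6371 km) differs from the 6367 km in the question's function, so the two give results about 0.06% apart.

```python
from math import asin, cos, isclose, pi, radians, sin, sqrt

# The constant p from the SQL UDF; it equals pi / 180 (degrees -> radians).
P = 0.017453292519943295


def distance_sql_form(lat1, lon1, lat2, lon2):
    """Python transcription of the BigQuery UDF: cosine form, 12742 km = 2 * 6371 km."""
    a = (0.5 - cos((lat2 - lat1) * P) / 2
         + cos(lat1 * P) * cos(lat2 * P) * (1 - cos((lon2 - lon1) * P)) / 2)
    return 12742 * asin(sqrt(a))


def distance_sin_form(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Textbook haversine with sin^2 terms, same (lat, lon) argument order as the UDF."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))


pts = (33.6008949, -83.83803, 33.604248, -83.86729)  # Row 1 vs Row 2
print(isclose(P, pi / 180))  # True
print(isclose(distance_sql_form(*pts), distance_sin_form(*pts), rel_tol=1e-6))  # True
```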

WITH YourTable AS (
    SELECT 1 AS Row, 71 AS Id1, 34 AS Id2, '2016-11-21 00:41:05 UTC' AS StartTime, '2016-11-21 00:47:06 UTC' AS StopTime, 33.6008949 AS Latitude, -83.83803 AS Longitude, '2016-11-21 00:43:42 UTC' AS DateTime UNION ALL  
    SELECT 2 AS Row, 71 AS Id1, 44 AS Id2, '2016-11-21 00:54:55 UTC' AS StartTime, '2016-11-21 00:56:28 UTC' AS StopTime, 33.604248 AS Latitude, -83.86729 AS Longitude, '2016-11-21 00:55:18 UTC' AS DateTime UNION ALL  
    SELECT 3 AS Row, 71 AS Id1, 45 AS Id2, '2016-11-21 02:08:17 UTC' AS StartTime, '2016-11-21 02:09:52 UTC' AS StopTime, 33.60586 AS Latitude, -83.8711 AS Longitude, '2016-11-21 02:09:03 UTC' AS DateTime UNION ALL  
    SELECT 4 AS Row, 71 AS Id1, 67 AS Id2, '2016-11-21 02:16:02 UTC' AS StartTime, '2016-11-21 02:17:21 UTC' AS StopTime, 33.6043777 AS Latitude, -83.86624 AS Longitude, '2016-11-21 02:16:28 UTC' AS DateTime 
) 
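-- Order each Id1 group by StartTime so FIRST_VALUE picks the earliest record
-- (the StartTime strings share one format, so they sort chronologically).
SELECT *, 
    distance(Latitude, Longitude,
             FIRST_VALUE(Latitude) OVER(PARTITION BY Id1 ORDER BY StartTime),
             FIRST_VALUE(Longitude) OVER(PARTITION BY Id1 ORDER BY StartTime)) AS dist 
FROM YourTable 
ORDER BY Id1, StartTime 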

This should be run in the BigQuery console. I tried it from a notebook, but it throws an error when creating the temporary function. – user3447653


I tested the above from the Web UI. –