2015-12-29 73 views

(IPython notebook) (Bus statistics) Using the haversine formula in a Python function with pandas to calculate distance

summary.head()


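I need to calculate distance_travelled between each pair of consecutive rows, where 1) row['sequence'] != 0, because no distance has been covered while the bus is at its initial stop, and 2) row['track_id'] == previous_row['track_id'].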

My haversine formula is defined as:

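from math import radians, sin, cos, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two points given in decimal degrees."""
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    r = 6371  # radius of the Earth in kilometers; use 3956 for miles
    return c * r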

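I'm not entirely sure how to approach this. One idea is to use iterrows() and apply the haversine() function when the row's 'sequence' is not 0 and the row's 'track_id' equals the previous row's 'track_id'.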

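[Edit] I think there is no need to check whether the row and the previous row have the same 'track_id': haversine() is only ever applied to two adjacent rows, and a row with sequence = 0 has a distance of 0, which also marks the point where the track_id changes. So basically, haversine() should be applied to every row with 'sequence' != 0, i.e. haversine(previous_row.lng, previous_row.lat, current_row.lng, current_row.lat). I still need help with this, though.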
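The rule described in the edit above can be sketched as a plain loop. This is only an illustration, not a tested solution for the real data: it assumes the rows are already sorted by track and by sequence within each track, and the second track ("1-2") and its coordinates are made up for the example.

```python
from math import radians, sin, cos, asin, sqrt

import pandas as pd


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))


# Hypothetical sample: track "1-2" and its coordinates are invented.
summary = pd.DataFrame({
    "track_id": ["1-1", "1-1", "1-1", "1-2", "1-2"],
    "sequence": [0, 1, 2, 0, 1],
    "lat": [41.041870, 41.040859, 41.039242, 41.050000, 41.051000],
    "lng": [29.060010, 29.059980, 29.059731, 29.070000, 29.070000],
})

distances = []
previous_row = None
for _, row in summary.iterrows():
    if previous_row is None or row["sequence"] == 0:
        # First stop of a track: nothing travelled yet.
        distances.append(0.0)
    else:
        distances.append(
            haversine(previous_row["lng"], previous_row["lat"], row["lng"], row["lat"])
        )
    previous_row = row

summary["distance_travelled"] = distances
print(summary)
```

iterrows() is easy to read but slow on large frames, so this is mainly useful for checking the logic on a small sample.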

[Edit 2] I managed to get to something like this:

summary['distance_travelled'] = summary.apply(lambda row: haversine(row['lng'], row['lat'], previous_row['lng'], previous_row['lat']), axis=1) 

Here previous_row should actually refer to the previous row; right now it is just a placeholder string that does nothing.


Isn't this a dupe of: http://stackoverflow.com/questions/25767596/using-haversine-formula-with-data-stored-in-a-pandas-dataframe/25767765#25767765? – EdChum

Answers


IIUC, you can try:

print(summary)

    track_id sequence  lat  lng distance_travelled 
0  1-1   0 41.041870 29.060010     0 
4  1-1   1 41.040859 29.059980     0 
6  1-1   2 41.039242 29.059731     0 
# create new columns holding the previous row's coordinates
summary['latp'] = summary['lat'].shift(1)
summary['lngp'] = summary['lng'].shift(1)
print(summary)

    track_id sequence  lat  lng distance_travelled  latp \ 
0  1-1   0 41.041870 29.060010     0  NaN 
4  1-1   1 41.040859 29.059980     0 41.041870 
6  1-1   2 41.039242 29.059731     0 41.040859 

     lngp 
0  NaN 
4 29.06001 
6 29.05998 
summary['distance_travelled'] = summary.apply(lambda row: haversine(row['lng'], row['lat'], row['lngp'], row['latp']), axis=1) 
# remove the helper columns lngp and latp
summary = summary.drop(['lngp','latp'], axis=1)
print(summary)

    track_id sequence  lat  lng distance_travelled 
0  1-1   0 41.041870 29.060010     NaN 
4  1-1   1 41.040859 29.059980   0.112446 
6  1-1   2 41.039242 29.059731   0.181011 
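The shift(1) used above runs over the whole frame, so with more than one track_id the first stop of a track would be paired with the last stop of the previous track. One possible way around this, sketched below, is to shift within each track_id group and fill the resulting NaN at each first stop with 0. The second track and its coordinates are invented for the example, and the rows are assumed to be sorted by sequence within each track.

```python
from math import radians, sin, cos, asin, sqrt

import pandas as pd


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))


# Hypothetical sample: track "1-2" and its coordinates are invented.
summary = pd.DataFrame({
    "track_id": ["1-1", "1-1", "1-1", "1-2", "1-2"],
    "sequence": [0, 1, 2, 0, 1],
    "lat": [41.041870, 41.040859, 41.039242, 41.050000, 41.051000],
    "lng": [29.060010, 29.059980, 29.059731, 29.070000, 29.070000],
})

# Shift inside each track so a track's first stop never sees another track's last stop.
summary["latp"] = summary.groupby("track_id")["lat"].shift(1)
summary["lngp"] = summary.groupby("track_id")["lng"].shift(1)

# Rows without a previous stop yield NaN, which is replaced by 0.
summary["distance_travelled"] = summary.apply(
    lambda row: haversine(row["lng"], row["lat"], row["lngp"], row["latp"]), axis=1
).fillna(0)

summary = summary.drop(["lngp", "latp"], axis=1)
print(summary)
```

Grouping by track_id enforces the second condition from the question directly, instead of relying on sequence == 0 to mark every track boundary.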

If performance matters, calling `.apply(haversine, axis=1)` will be slower than writing `haversine` to take numpy arrays and doing `summary['distance_travelled'] = haversine(summary['lng'], summary['lat'], summary['lngp'], summary['latp'])` – TomAugspurger
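A minimal sketch of that vectorized variant follows. The name haversine_np is made up here, and the function simply swaps the math calls for their NumPy equivalents so that whole columns can be passed in at once. The sample rows are the three shown in the answer.

```python
import numpy as np
import pandas as pd


def haversine_np(lon1, lat1, lon2, lat2):
    """Vectorized great-circle distance in km; inputs in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * 6371 * np.arcsin(np.sqrt(a))


summary = pd.DataFrame(
    {
        "track_id": ["1-1", "1-1", "1-1"],
        "sequence": [0, 1, 2],
        "lat": [41.041870, 41.040859, 41.039242],
        "lng": [29.060010, 29.059980, 29.059731],
    },
    index=[0, 4, 6],
)

# Whole columns go in at once; the first row has no previous stop, so it stays NaN.
distance = haversine_np(
    summary["lng"], summary["lat"], summary["lng"].shift(1), summary["lat"].shift(1)
)
summary["distance_travelled"] = distance
print(summary)
```

The arithmetic runs once over entire arrays rather than once per row, which is typically where the speed-up over apply comes from.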
