Vectorizing a function in pandas

I have a DataFrame containing a list of latitude/longitude coordinates:
import pandas as pd

d = {'Provider ID': {0: '10001',
                     1: '10005',
                     2: '10006',
                     3: '10007',
                     4: '10008',
                     5: '10011',
                     6: '10012',
                     7: '10016',
                     8: '10018',
                     9: '10019'},
     'latitude': {0: '31.215379379000467',
                  1: '34.22133455500045',
                  2: '34.795039606000444',
                  3: '31.292159523000464',
                  4: '31.69311635000048',
                  5: '33.595265517000485',
                  6: '34.44060759100046',
                  7: '33.254429322000476',
                  8: '33.50314015000049',
                  9: '34.74643089500046'},
     'longitude': {0: ' -85.36146587999968',
                   1: ' -86.15937514799964',
                   2: ' -87.68507485299966',
                   3: ' -86.25539902199966',
                   4: ' -86.26549483099967',
                   5: ' -86.66531866799966',
                   6: ' -85.75726760699968',
                   7: ' -86.81407933399964',
                   8: ' -86.80242858299965',
                   9: ' -87.69893502799965'}}
df = pd.DataFrame(d)
My goal is to use the haversine function to calculate the distance in km between every pair of items:
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    # 6367 km is the radius of the Earth
    km = 6367 * c
    return km
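Because the function above relies on the math module, it only works on one pair of points at a time. As a minimal sketch (not part of the original post), the same formula can be written with NumPy ufuncs so that it accepts whole arrays. The names haversine_np and EARTH_RADIUS_KM are made up for this illustration; the radius is the same 6367 km used above.

```python
import numpy as np

EARTH_RADIUS_KM = 6367  # same value as in haversine() above

def haversine_np(lon1, lat1, lon2, lat2):
    """Great-circle distance in km; accepts floats or NumPy arrays in decimal degrees."""
    # convert decimal degrees to radians (element-wise for arrays)
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return EARTH_RADIUS_KM * 2 * np.arcsin(np.sqrt(a))

# Element-wise use: distance from the first two providers to the third one.
lons = np.array([-85.36146587999968, -86.15937514799964])
lats = np.array([31.215379379000467, 34.22133455500045])
dist = haversine_np(lons, lats, -87.68507485299966, 34.795039606000444)
```

Scalars and arrays broadcast against each other here, so dist holds one distance per element of lons and lats.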
My goal is to end up with a DataFrame like result_df below, where each value is the distance between a pair of Provider IDs:
result_df = pd.DataFrame(columns = df['Provider ID'], index=df['Provider ID'])
I can do this in a loop, but it is very slow. I'm looking for some help converting this to a vectorized approach:
for first_hospital_coordinates in result_df.columns:
    for second_hospital_coordinates in result_df.index:
        # look up each hospital's coordinates and convert the strings to floats
        L1 = df[df['Provider ID'] == first_hospital_coordinates]['latitude'].astype('float64').values[0]
        O1 = df[df['Provider ID'] == first_hospital_coordinates]['longitude'].astype('float64').values[0]
        L2 = df[df['Provider ID'] == second_hospital_coordinates]['latitude'].astype('float64').values[0]
        O2 = df[df['Provider ID'] == second_hospital_coordinates]['longitude'].astype('float64').values[0]
        distance = haversine(O1, L1, O2, L2)
        # rows are the second hospital, columns are the first
        result_df.loc[second_hospital_coordinates, first_hospital_coordinates] = distance
I answered a similar question: http://stackoverflow.com/questions/25767596/using-haversine-formula-with-data-stored-in-a-pandas-dataframe/25767765#25767765 – EdChum
Close, the haversine part is the same. The main problem is creating the 10x10 matrix efficiently, though. – DataSwede
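One possible way to build the full matrix without Python loops is NumPy broadcasting: reshaping the coordinates into a column (N, 1) and a row (1, N) makes every subtraction produce an N x N array. The following is only a sketch under stated assumptions: pairwise_haversine is a hypothetical helper name, the df here is a three-row stand-in for the one in the question, and the string columns are stripped and converted with pd.to_numeric before any arithmetic.

```python
import numpy as np
import pandas as pd

# Three-row stand-in for the question's df (values kept as strings, as in the original).
df = pd.DataFrame({
    'Provider ID': ['10001', '10005', '10006'],
    'latitude': ['31.215379379000467', '34.22133455500045', '34.795039606000444'],
    'longitude': [' -85.36146587999968', ' -86.15937514799964', ' -87.68507485299966'],
})

def pairwise_haversine(df, radius_km=6367):
    """Return an N x N DataFrame of haversine distances (km) between all rows of df."""
    # strings -> float radians; strip the leading spaces in the longitude values first
    lat = np.radians(pd.to_numeric(df['latitude'].str.strip()).to_numpy(dtype='float64'))
    lon = np.radians(pd.to_numeric(df['longitude'].str.strip()).to_numpy(dtype='float64'))

    # (N, 1) minus (1, N) broadcasts to (N, N): every row paired with every other row
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]

    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    km = radius_km * 2 * np.arcsin(np.sqrt(a))

    ids = df['Provider ID'].tolist()
    return pd.DataFrame(km, index=ids, columns=ids)

result_df = pairwise_haversine(df)
```

The result is symmetric with zeros on the diagonal, and a single distance can be read with result_df.loc['10001', '10005']. Memory grows with N squared, so for very large numbers of providers the matrix may need to be computed in chunks.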