Select rows from a pandas DataFrame based on cKDTree indices
I'm trying to do some quick and dirty reverse geocoding.
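I have a DataFrame poi (about 50,000 rows) in which each point of interest has a latitude/longitude coordinate.
I also have a DataFrame postcode_existing (about 180,000 rows) that maps latitude/longitude coordinates to postcodes.
I pull out the relevant coordinate columns and use a cKDTree to find, for each point of interest in poi, the nearest latitude/longitude coordinate in postcode_existing.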
import pandas as pd
import numpy as np
from scipy.spatial import cKDTree
# read poi and postcode csv files
# Extract subset
postcode_existing_coordinates = postcode_existing[['Latitude', 'Longitude']]
# Extract subset
poi_coordinates = poi[['Latitude', 'Longitude']]
# Construct tree
tree = cKDTree(postcode_existing_coordinates)
# Query
distances, indices = tree.query(poi_coordinates)
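The sketch below shows what tree.query hands back. It uses a few made-up coordinates rather than the real postcode data. distances and indices are arrays with one entry per query point. Each index is a 0-based position into the array the tree was built from. It is not a DataFrame index label.

```python
import numpy as np
from scipy.spatial import cKDTree

# Made-up reference points (lat, lon); stand-ins for the postcode coordinates.
reference = np.array([
    [51.50, -0.10],
    [51.60, -0.30],
    [51.52, -0.14],
])

# Made-up query points; stand-ins for the POI coordinates.
queries = np.array([
    [51.59, -0.29],
    [51.521, -0.141],
])

tree = cKDTree(reference)
distances, indices = tree.query(queries)  # k=1 by default: nearest neighbour only

print(indices)             # positions into `reference`, one per query point
print(reference[indices])  # the nearest reference point for each query
```

Because the tree only ever sees a plain array, it has no knowledge of the DataFrame's index. Any labels that postcode_existing carries are ignored.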
This gives me the relevant indices. I now want to use these indices to select rows from the DataFrame postcode_existing.
I tried postcode_existing.ix[indices], but it doesn't seem to return the right rows.
For example:
>>> postcode_existing.ix[indices].head()
Postcode Latitude Longitude Easting Northing GridRef \
78579 HA3 0NS 51.57553 -0.304296 517605.0 187658.0 TQ176876
178499 NaN NaN NaN NaN NaN NaN
62392 NaN NaN NaN NaN NaN NaN
78662 HA3 0TA 51.58409 -0.288764 518659.0 188635.0 TQ186886
79470 NaN NaN NaN NaN NaN NaN
County District Ward DistrictCode ... Terminated \
78579 Greater London Brent Kenton E09000005 ... NaN
178499 NaN NaN NaN NaN ... NaN
62392 NaN NaN NaN NaN ... NaN
78662 Greater London Brent Kenton E09000005 ... NaN
79470 NaN NaN NaN NaN ... NaN
Parish NationalPark Population Households Built up area \
78579 NaN NaN 72.0 25.0 Greater London
178499 NaN NaN NaN NaN NaN
62392 NaN NaN NaN NaN NaN
78662 NaN NaN 152.0 39.0 Greater London
79470 NaN NaN NaN NaN NaN
Built up sub-division Lower layer super output area \
78579 Brent Brent 004D
178499 NaN NaN
62392 NaN NaN
78662 Brent Brent 003E
79470 NaN NaN
Rural/urban Region
78579 Urban major conurbation London
178499 NaN NaN
62392 NaN NaN
78662 Urban major conurbation London
79470 NaN NaN
[5 rows x 25 columns]
But:
>>> postcode_existing.iloc[78579]
Postcode NW1 3AU
Latitude 51.5237
Longitude -0.143188
Easting 528915
Northing 182163
GridRef TQ289821
County Greater London
District Westminster
Ward Marylebone High Street
DistrictCode E09000033
WardCode E05000641
Country England
CountyCode E11000009
Constituency Cities of London and Westminster
Introduced 1980-01-01
Terminated NaN
Parish NaN
NationalPark NaN
Population 7
Households 1
Built up area Greater London
Built up sub-division City of Westminster
Lower layer super output area Westminster 013A
Rural/urban Urban major conurbation
Region London
Name: 133733, dtype: object
And:
>>> postcode_existing.iloc[178499]
Postcode WC1E 6JL
Latitude 51.5236
Longitude -0.135522
Easting 529447
Northing 182168
GridRef TQ294821
County Greater London
District Camden
Ward Bloomsbury
DistrictCode E09000007
WardCode E05000129
Country England
CountyCode E11000009
Constituency Holborn and St Pancras
Introduced 1980-01-01
Terminated NaN
Parish NaN
NationalPark NaN
Population 1
Households 1
Built up area Greater London
Built up sub-division Camden
Lower layer super output area Camden 026D
Rural/urban Urban major conurbation
Region London
Name: 307029, dtype: object
These seem to be correct.
Why doesn't postcode_existing.ix[indices] select the right rows? What should I use instead?
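The printouts above show that position 78579 holds the row labelled 133733. Because the cKDTree indices are positions, positional selection with iloc lines them up with the right rows. The sketch below uses three rows copied from the question's output plus two made-up POIs. The poi_name column is hypothetical. The sketch selects with iloc and then attaches the nearest postcode and its distance to each POI.

```python
import pandas as pd
from scipy.spatial import cKDTree

# Three rows from the question, keeping their original (non-positional) labels.
postcode_existing = pd.DataFrame(
    {
        "Postcode": ["NW1 3AU", "HA3 0NS", "WC1E 6JL"],
        "Latitude": [51.5237, 51.57553, 51.5236],
        "Longitude": [-0.143188, -0.304296, -0.135522],
    },
    index=[133733, 78579, 307029],
)
# Made-up points of interest; poi_name is a hypothetical column.
poi = pd.DataFrame(
    {
        "poi_name": ["a", "b"],
        "Latitude": [51.5238, 51.5750],
        "Longitude": [-0.1360, -0.3040],
    }
)

tree = cKDTree(postcode_existing[["Latitude", "Longitude"]].to_numpy())
distances, indices = tree.query(poi[["Latitude", "Longitude"]].to_numpy())

# indices are positions, so select with iloc. Dropping the old labels makes
# the result align row-for-row with poi when the columns are assigned below.
nearest = postcode_existing.iloc[indices].reset_index(drop=True)
result = poi.reset_index(drop=True).assign(
    Postcode=nearest["Postcode"],
    distance=distances,
)
print(result)
```

The reset_index(drop=True) calls matter. assign aligns on the index, so leaving the postcode labels in place would misalign the new column again.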
I get the same problem with 'loc': '>>> postcode_existing.loc[indices].head() Postcode Latitude Longitude Easting Northing GridRef \ 78579 HA3 0NS 51.57553 -0.304296 517605.0 187658.0 TQ176876' –
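.ix and .loc give the same output here because, on an integer index, both treat the integers as labels. A label that happens to exist returns the wrong row. A label that doesn't exist comes back as all-NaN. That NaN behaviour belongs to the older pandas versions that still had .ix. Pandas 1.0 removed .ix, and current .loc raises KeyError for missing labels. The sketch below reproduces both symptoms on a made-up frame. It uses reindex, which still fills missing labels with NaN.

```python
import pandas as pd

# Made-up frame whose index labels (2, 4, 6, 8) differ from its positions (0..3),
# as happens after rows are filtered out of a larger DataFrame.
df = pd.DataFrame(
    {"Postcode": ["AA1 1AA", "BB2 2BB", "CC3 3CC", "DD4 4DD"]},
    index=[2, 4, 6, 8],
)

positions = [2, 3]  # positional indices, like those from cKDTree.query

# Label lookup: label 2 exists (it is the row at position 0, so the wrong row),
# and label 3 does not exist, so it comes back as NaN.
as_labels = df.reindex(positions)

# Positional lookup: the rows actually sitting at positions 2 and 3.
as_positions = df.iloc[positions]

print(as_labels)
print(as_positions)
```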
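'loc' is label-based. It pulls the row whose index label is '78579'. 'iloc' is position-based and will pull the row at position '78579'. Without a sample of your data, I can't verify or validate anything. I'm assuming the 'indices' object returned by 'tree.query(poi_coordinates)' refers to labels. Therefore, you should use 'loc'. If you say that's wrong, I don't know, because I don't have your data. – piRSquared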
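The loc/iloc distinction from the comment can be checked on a small frame whose labels differ from its positions. The frame below uses the three postcodes and labels shown in the question. With positional indices, iloc gives the intended rows directly. loc gives the same rows only after the positions are first translated to labels through df.index.

```python
import numpy as np
import pandas as pd

# Labels taken from the question's output; they deliberately differ from positions.
df = pd.DataFrame(
    {"Postcode": ["NW1 3AU", "HA3 0NS", "WC1E 6JL"]},
    index=[133733, 78579, 307029],
)
positions = np.array([2, 0])  # positional indices, as returned by cKDTree.query

by_position = df.iloc[positions]  # select by position directly
labels = df.index[positions]      # translate positions into index labels
by_label = df.loc[labels]         # label-based selection with the translated labels

print(labels.tolist())
print(by_position.equals(by_label))

try:
    df.loc[positions]  # 2 and 0 are treated as labels, which don't exist here
except KeyError:
    print("loc looked up 2 and 0 as labels")  # raised on pandas 1.0+
```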