2017-04-16 168 views
1

Iterating through multiple dataframes in pandas

I have two dataframes:

1) A list of suppliers with their latitude/longitude coordinates:

sup_essential = pd.DataFrame({'supplier': ['A','B','C'], 
           'coords': [(51.1235,-0.3453),(52.1245,-0.3423),(53.1235,-1.4553)]}) 

2) A list of stores with their latitude/longitude coordinates:

stores_essential = pd.DataFrame({'storekey': [1,2,3], 
           'coords': [(54.1235,-0.6553),(49.1245,-1.3423),(50.1235,-1.8553)]}) 

I want to create an output table containing store, store_coordinates, supplier, supplier_coordinates, and the distance for every combination of store and supplier.

I currently have:

test=[] 
for row in sup_essential.iterrows(): 
    for row in stores_essential.iterrows(): 
     r = sup_essential['supplier'],stores_essential['storekey'] 
     test.append(r) 

But this just gives me all of the values repeated.

+0

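Please provide a small (3-7 rows) reproducible data set in text/CSV format, along with the desired data set. Please read [how to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – MaxU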

+0

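@MaxU The data itself is confidential, and since it consists of coordinates it would be easy to identify. However, the headers are: For stores: storeKey (INT) \t locationLongitude \t locationLatitude \t coords (latitude, longitude). For suppliers: supplier (VARCHAR) \t latitude \t longitude \t coords (latitude, longitude) – PaddyD15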

+0

You don't need to post real data. Just [post](http://stackoverflow.com/posts/43435657/edit) a sample (fake) data set in your question – MaxU

Answers

0

Source DFs:

In [105]: sup 
Out[105]: 
       coords supplier 
0 (51.1235, -0.3453)  A 
1 (52.1245, -0.3423)  B 
2 (53.1235, -1.4553)  C 

In [106]: stores 
Out[106]: 
       coords storekey 
0 (54.1235, -0.6553)   1 
1 (49.1245, -1.3423)   2 
2 (50.1235, -1.8553)   3 

Solution:

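import numpy as np
import pandas as pd
from sklearn.neighbors import DistanceMetric

dist = DistanceMetric.get_metric('haversine')

# Cross join via a dummy key: every supplier row is paired with every store row
m = pd.merge(sup.assign(x=0), stores.assign(x=0), on='x', suffixes=('1','2')).drop('x', axis=1)

# Split the (lat, lon) tuples into separate numeric columns
d1 = sup[['coords']].assign(lat=sup.coords.str[0], lon=sup.coords.str[1]).drop('coords', axis=1)
d2 = stores[['coords']].assign(lat=stores.coords.str[0], lon=stores.coords.str[1]).drop('coords', axis=1)

# The haversine metric works in radians; scale by the Earth's radius (~6367 km) to get km.
# pairwise() returns a (suppliers x stores) matrix, so ravel() matches the row order of m.
m['dist_km'] = np.ravel(dist.pairwise(np.radians(d1), np.radians(d2)) * 6367)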
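The dummy column `x=0` is only there to force a Cartesian product of the two frames. As a minimal sketch of the same idea without scikit-learn, assuming pandas 1.2 or later (which accepts `how='cross'`), the pairing and the distance matrix could look like this. The helper `pairwise_haversine_km` is a hypothetical name introduced for illustration, not part of any library, and the radius of 6367 km is carried over from the solution above.

```python
import numpy as np
import pandas as pd

sup = pd.DataFrame({'supplier': ['A', 'B', 'C'],
                    'coords': [(51.1235, -0.3453), (52.1245, -0.3423), (53.1235, -1.4553)]})
stores = pd.DataFrame({'storekey': [1, 2, 3],
                       'coords': [(54.1235, -0.6553), (49.1245, -1.3423), (50.1235, -1.8553)]})

# Cross join: each supplier row is paired with each store row (supplier-major order).
m = pd.merge(sup, stores, how='cross', suffixes=('1', '2'))


def pairwise_haversine_km(a_deg, b_deg, radius_km=6367.0):
    """Hypothetical helper: (len(a), len(b)) matrix of great-circle distances in km.

    Both inputs are sequences of (latitude, longitude) pairs in degrees.
    """
    a = np.radians(np.asarray(a_deg, dtype=float))  # shape (n, 2)
    b = np.radians(np.asarray(b_deg, dtype=float))  # shape (k, 2)
    # Broadcasting builds every (a_i, b_j) combination at once.
    dlat = b[None, :, 0] - a[:, None, 0]
    dlon = b[None, :, 1] - a[:, None, 1]
    h = (np.sin(dlat / 2) ** 2
         + np.cos(a[:, None, 0]) * np.cos(b[None, :, 0]) * np.sin(dlon / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(h))


dist_matrix = pairwise_haversine_km(sup['coords'].tolist(), stores['coords'].tolist())

# Row-major flattening lines up with the supplier-major order of the cross join.
m['dist_km'] = dist_matrix.ravel()
```

Because the matrix has one row per supplier and one column per store, flattening it row by row yields the distances in the same order as the merged rows.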

Result:

In [135]: m 
Out[135]: 
       coords1 supplier    coords2 storekey  dist_km 
0 (51.1235, -0.3453)  A (54.1235, -0.6553)   1 334.029670 
1 (51.1235, -0.3453)  A (49.1245, -1.3423)   2 233.213416 
2 (51.1235, -0.3453)  A (50.1235, -1.8553)   3 153.880680 
3 (52.1245, -0.3423)  B (54.1235, -0.6553)   1 223.116901 
4 (52.1245, -0.3423)  B (49.1245, -1.3423)   2 340.738587 
5 (52.1245, -0.3423)  B (50.1235, -1.8553)   3 246.116984 
6 (53.1235, -1.4553)  C (54.1235, -0.6553)   1 122.997130 
7 (53.1235, -1.4553)  C (49.1245, -1.3423)   2 444.459052 
8 (53.1235, -1.4553)  C (50.1235, -1.8553)   3 334.514028
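If you would rather keep the explicit loop from the question, note that the original attempt appended the whole `supplier` and store-key columns on every iteration instead of the current row's values, which is why everything came out repeated. A rough plain-Python sketch of a corrected loop follows, assuming the data is available as simple lists of tuples (with DataFrames, `itertuples(index=False)` could feed the same loop). The function `haversine_km` is an illustrative helper, and 6367 km is the same radius as above.

```python
from itertools import product
from math import asin, cos, radians, sin, sqrt

suppliers = [('A', (51.1235, -0.3453)),
             ('B', (52.1245, -0.3423)),
             ('C', (53.1235, -1.4553))]
stores = [(1, (54.1235, -0.6553)),
          (2, (49.1245, -1.3423)),
          (3, (50.1235, -1.8553))]


def haversine_km(p1, p2, radius_km=6367.0):
    """Illustrative helper: great-circle distance between two (lat, lon) points in degrees."""
    lat1, lon1 = map(radians, p1)
    lat2, lon2 = map(radians, p2)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * radius_km * asin(sqrt(h))


rows = []
# product() yields every (supplier, store) combination exactly once.
for (supplier, sup_coords), (storekey, store_coords) in product(suppliers, stores):
    rows.append((storekey, store_coords, supplier, sup_coords,
                 haversine_km(sup_coords, store_coords)))
```

Each entry of `rows` holds the store, its coordinates, the supplier, its coordinates, and their distance, so the list can be passed straight to `pd.DataFrame` with matching column names. For large inputs the vectorised approach will be much faster than looping.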