2016-11-26 67 views
1

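Apply a function to every row in a Pandas DataFrame

I am new to Python and want to recreate this example. I have longitude and latitude data for New York City taxi pickups and drop-offs, but I need to convert the data to Web Mercator format (which is not covered in the example above). I found a function, taken from here, that takes a pair of longitude and latitude values and converts them to Web Mercator. It looks like this: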

import math 
def toWGS84(xLon, yLat): 
    # Check if coordinate out of range for Latitude/Longitude 
    if (abs(xLon) < 180) and (abs(yLat) > 90): 
        return

    # Check if coordinate out of range for Web Mercator 
    # 20037508.3427892 is full extent of Web Mercator 
    if (abs(xLon) > 20037508.3427892) or (abs(yLat) > 20037508.3427892): 
        return

    semimajorAxis = 6378137.0 # WGS84 spheroid semimajor axis

    latitude = (1.5707963267948966 - (2.0 * math.atan(math.exp((-1.0 * yLat)/semimajorAxis)))) * (180/math.pi) 
    longitude = ((xLon/semimajorAxis) * 57.295779513082323) - ((math.floor((((xLon/semimajorAxis) * 57.295779513082323) + 180.0)/360.0)) * 360.0) 

    return [longitude, latitude] 



def toWebMercator(xLon, yLat): 
    # Check if coordinate out of range for Latitude/Longitude 
    # Reject the pair if either value is out of range (hence `or`, not `and`)
    if (abs(xLon) > 180) or (abs(yLat) > 90):
        return

    semimajorAxis = 6378137.0 # WGS84 spheroid semimajor axis
    east = xLon * 0.017453292519943295 
    north = yLat * 0.017453292519943295 

    northing = 3189068.5 * math.log((1.0 + math.sin(north))/(1.0 - math.sin(north))) 
    easting = semimajorAxis * east 

    return [easting, northing] 

def main(): 
    print(toWebMercator(-105.816001, 40.067633)) 
    print(toWGS84(-11779383.349100526, 4875775.395628653)) 

if __name__ == '__main__': 
    main() 

How can I apply this to every longitude/latitude pair in my pandas DataFrame and save the result in the same DataFrame?

df.tail() 
      | longitude  | latitude 
____________|__________________|______________ 
11135465 | -73.986893 | 40.761093 
1113546  | -73.979645 | 40.747814 
11135467 | -74.001244 | 40.743172 
11135468 | -73.997818 | 40.726055 
... 

Answers

1

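Given the size of your dataset, what will help you most is learning how to do things the pandas way. Iterating over rows performs terribly compared with the built-in vectorized methods.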

import pandas as pd 
import numpy as np 

df = pd.read_csv('/yellow_tripdata_2016-06.csv') 
df.head(5) 

VendorID tpep_pickup_datetime tpep_dropoff_datetime passenger_count trip_distance pickup_longitude pickup_latitude RatecodeID store_and_fwd_flag dropoff_longitude dropoff_latitude payment_type fare_amount extra mta_tax tip_amount tolls_amount improvement_surcharge total_amount 
0 2 2016-06-09 21:06:36 2016-06-09 21:13:08 2 0.79 -73.983360 40.760937 1 N -73.977463 40.753979 2 6.0 0.5 0.5 0.00 0.0 0.3 7.30 
1 2 2016-06-09 21:06:36 2016-06-09 21:35:11 1 5.22 -73.981720 40.736668 1 N -73.981636 40.670242 1 22.0 0.5 0.5 4.00 0.0 0.3 27.30 
2 2 2016-06-09 21:06:36 2016-06-09 21:13:10 1 1.26 -73.994316 40.751072 1 N -74.004234 40.742168 1 6.5 0.5 0.5 1.56 0.0 0.3 9.36 
3 2 2016-06-09 21:06:36 2016-06-09 21:36:10 1 7.39 -73.982361 40.773891 1 N -73.929466 40.851540 1 26.0 0.5 0.5 1.00 0.0 0.3 28.30 
4 2 2016-06-09 21:06:36 2016-06-09 21:23:23 1 3.10 -73.987106 40.733173 1 N -73.985909 40.766445 1 13.5 0.5 0.5 2.96 0.0 0.3 17.76 

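This dataset has 11,135,470 rows, which is not "big data" but is not small either. Instead of writing a function and applying it to every row, you get far better performance by carrying out each step of the function on an entire column at once. I would turn this function: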

def toWebMercator(xLon, yLat): 
    # Check if coordinate out of range for Latitude/Longitude 
    # Reject the pair if either value is out of range (hence `or`, not `and`)
    if (abs(xLon) > 180) or (abs(yLat) > 90):
        return

    semimajorAxis = 6378137.0 # WGS84 spheroid semimajor axis
    east = xLon * 0.017453292519943295 
    north = yLat * 0.017453292519943295 

    northing = 3189068.5 * math.log((1.0 + math.sin(north))/(1.0 - math.sin(north))) 
    easting = semimajorAxis * east 

    return [easting, northing] 

into this:

SEMIMAJORAXIS = 6378137.0 # typed in all caps since this is a static value 
df['pickup_east'] = df['pickup_longitude'] * 0.017453292519943295 # takes all pickup longitude values, multiplies them, then saves the result as a new column named pickup_east.
df['pickup_north'] = df['pickup_latitude'] * 0.017453292519943295 
# numpy functions allow you to calculate an entire column's worth of values by simply passing in the column. 
df['pickup_northing'] = 3189068.5 * np.log((1.0 + np.sin(df['pickup_north']))/(1.0 - np.sin(df['pickup_north']))) 
df['pickup_easting'] = SEMIMAJORAXIS * df['pickup_east'] 

You will then have pickup_easting and pickup_northing columns containing the computed values.
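For reference, the two magic numbers in the code above correspond to the spherical Web Mercator formulas: 0.017453292519943295 is π/180 (degrees to radians) and 3189068.5 is half the semimajor axis. The symbols λ, φ and R below are my own notation, not something taken from the question:

```latex
\begin{aligned}
x &= R\,\lambda, \\
y &= \frac{R}{2}\,\ln\!\left(\frac{1+\sin\varphi}{1-\sin\varphi}\right),
\end{aligned}
\qquad
R = 6378137\ \text{m},\quad
\lambda = \text{longitude}\cdot\frac{\pi}{180},\quad
\varphi = \text{latitude}\cdot\frac{\pi}{180}
```

Here x is the easting and y is the northing, both in metres.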

On my laptop, this takes:

CPU times: user 1.01 s, sys: 286 ms, total: 1.3 s 
Wall time: 763 ms 

for all 11 million rows. 15 minutes -> seconds.
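Since the data has both pickup and drop-off coordinates, the column-wise steps could be wrapped in a small helper and reused for each pair. This is only a sketch: the function name `add_web_mercator`, its parameters, and the two-row sample frame are made up for illustration, and `np.radians` stands in for the multiplication by 0.017453292519943295.

```python
import math

import numpy as np
import pandas as pd

SEMIMAJOR_AXIS = 6378137.0  # WGS84 semimajor axis in metres


def add_web_mercator(df, lon_col, lat_col, prefix):
    """Return a copy of df with <prefix>_easting and <prefix>_northing columns.

    Hypothetical helper: it applies the same formula as the column-wise
    code above, one whole column at a time.
    """
    out = df.copy()
    lon_rad = np.radians(out[lon_col])  # degrees -> radians for the whole column
    lat_rad = np.radians(out[lat_col])
    out[prefix + "_easting"] = SEMIMAJOR_AXIS * lon_rad
    out[prefix + "_northing"] = (SEMIMAJOR_AXIS / 2.0) * np.log(
        (1.0 + np.sin(lat_rad)) / (1.0 - np.sin(lat_rad))
    )
    return out


# Tiny illustrative sample; the values are copied from the df.head() output above.
sample = pd.DataFrame({
    "pickup_longitude": [-73.983360, -73.981720],
    "pickup_latitude": [40.760937, 40.736668],
})
result = add_web_mercator(sample, "pickup_longitude", "pickup_latitude", "pickup")
print(result[["pickup_easting", "pickup_northing"]])
```

Calling it a second time with "dropoff_longitude", "dropoff_latitude" and "dropoff" would add the drop-off columns in the same way.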

I dropped the range check on the values. Instead, you can do something like this:

df = df[(df['pickup_longitude'].abs() <= 180) & (df['pickup_latitude'].abs() <= 90)] 

This uses boolean indexing, which again is orders of magnitude faster than a loop.
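To see what that filter does, here is a minimal sketch on a made-up four-row frame (the out-of-range values are invented for illustration). A row is kept only when both coordinates are in range, so it is dropped when either one is out of range.

```python
import pandas as pd

# Hypothetical sample: rows 1 and 3 each contain one out-of-range value.
rides = pd.DataFrame({
    "pickup_longitude": [-73.983360, 0.0, -73.981720, 250.0],
    "pickup_latitude": [40.760937, 95.0, 40.736668, 40.0],
})

# Build a boolean mask for the whole frame at once, then index with it.
in_range = (rides["pickup_longitude"].abs() <= 180) & (rides["pickup_latitude"].abs() <= 90)
valid = rides[in_range]

print(valid)
print(f"dropped {len(rides) - len(valid)} of {len(rides)} rows")
```

The mask itself is an ordinary boolean Series, so `(~in_range).sum()` would count the rejected rows without building the filtered frame.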


Thank you very much, this really is very fast and very helpful. – CFM

0

Try:

df[['longitude', 'latitude']].apply(
    lambda x: pd.Series(toWebMercator(*x), ['xLon', 'yLay']), 
    axis=1 
) 

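The DataFrame has 11 million rows. I ran this for 15 minutes and it produced no output. Still, the underlying approach is much clearer to me now, thank you. – CFM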
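For a small frame where row-wise speed does not matter, the apply approach can also write its result back next to the original columns, which is what the question asked for. This is a sketch under assumptions: `to_web_mercator` is a trimmed copy of the question's function without the range check, the column names "easting" and "northing" are my own choice, and the two sample rows come from the question's df.tail() output.

```python
import math

import pandas as pd


def to_web_mercator(lon, lat):
    # Same formula as toWebMercator in the question, minus the range check.
    semimajor_axis = 6378137.0
    north = math.radians(lat)
    northing = 3189068.5 * math.log((1.0 + math.sin(north)) / (1.0 - math.sin(north)))
    easting = semimajor_axis * math.radians(lon)
    return [easting, northing]


df = pd.DataFrame({
    "longitude": [-73.986893, -73.979645],
    "latitude": [40.761093, 40.747814],
})

# result_type="expand" turns each returned list into one row of a new DataFrame.
converted = df.apply(
    lambda row: to_web_mercator(row["longitude"], row["latitude"]),
    axis=1,
    result_type="expand",
)
converted.columns = ["easting", "northing"]

# Attach the new columns to the original frame by index.
df = df.join(converted)
print(df)
```

This still calls a Python function once per row, so on 11 million rows it would be expected to stay far slower than the column-wise version.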

0

If you want to keep a readable math expression, and a simple conversion of your current function, use eval:

df.eval(""" 
northing = 3189068.5 * log((1.0 + sin(latitude * 0.017453292519943295))/(1.0 - sin(latitude * 0.017453292519943295))) 
easting = 6378137.0 * longitude * 0.017453292519943295""", inplace=False) 
Out[51]: 
     id longitude latitude  northing  easting 
0 11135465 -73.986893 40.761093 4.977167e+06 -8.236183e+06 
1 1113546 -73.979645 40.747814 4.975215e+06 -8.235376e+06 
2 11135467 -74.001244 40.743172 4.974533e+06 -8.237781e+06 
3 11135468 -73.997818 40.726055 4.972018e+06 -8.237399e+06 

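You will have to adjust the syntax a little, and you cannot use if statements, but you can easily filter out the out-of-bounds data before calling eval. If you want to assign the new columns directly, you can also use inplace=True.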

If you are not interested in keeping the math syntax and are after full speed, the numpy answer will probably be faster.
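Putting the two pieces together, filtering first and then calling eval, might look like the sketch below. The sample rows are taken from the question's df.tail() output, plus one invented out-of-range row to show the filter at work. The variable names are mine.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "longitude": [-73.986893, -73.979645, 250.0],  # last row is deliberately invalid
    "latitude": [40.761093, 40.747814, 40.0],
})

# eval cannot express an if statement, so drop out-of-bounds rows beforehand.
in_bounds = df[(df["longitude"].abs() <= 180) & (df["latitude"].abs() <= 90)].copy()

# Multi-line eval: each line assigns one new column; a new DataFrame is returned.
converted = in_bounds.eval("""
northing = 3189068.5 * log((1.0 + sin(latitude * 0.017453292519943295)) / (1.0 - sin(latitude * 0.017453292519943295)))
easting = 6378137.0 * longitude * 0.017453292519943295
""")
print(converted)
```

Because eval returns a new frame here, `in_bounds` itself is left unchanged. Passing inplace=True would add the columns to it directly instead.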
