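Looping through a pandas DataFrame and creating new column values
I'm trying to loop through a CSV file that I converted into a pandas DataFrame.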
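I need to iterate over each row, check the longitude and latitude data I have (two separate columns), and add a code (0, 1 or 2) to the same row depending on whether the lat/long data falls within a certain range.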
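The per-row check described above boils down to two inclusive bounding-box tests. Here is a minimal sketch as plain Python functions. The names `in_box`, `stadium_code`, `YANKEES` and `METS` are hypothetical. The bounds are the question's constants, except that the Mets latitude and longitude values are assumed to have been swapped in the original (40.x is a latitude, -73.x a longitude), so they are swapped back here:

```python
# Bounds from the question's constants. ASSUMPTION: the Mets lat/long values
# were swapped in the original, so they are swapped back here.
YANKEES = {"min_lat": 40.825582, "max_lat": 40.832823,
           "min_lon": -73.9311697, "max_lon": -73.9216303}
METS = {"min_lat": 40.753277, "max_lat": 40.760523,
        "min_lon": -73.8505646, "max_lon": -73.8410354}


def in_box(lat, lon, box):
    """Return True if (lat, lon) lies inside the box, bounds included."""
    return (box["min_lat"] <= lat <= box["max_lat"]
            and box["min_lon"] <= lon <= box["max_lon"])


def stadium_code(lat, lon):
    """Hypothetical helper: 1 for Yankee Stadium, 2 for the Mets, else 0."""
    if in_box(lat, lon, YANKEES):
        return 1
    if in_box(lat, lon, METS):
        return 2
    return 0


print(stadium_code(40.829, -73.926))  # → 1
print(stadium_code(40.757, -73.846))  # → 2
print(stadium_code(40.700, -74.000))  # → 0
```

Such a function could be applied with `df.apply(lambda r: stadium_code(r['dropoff_latitude'], r['dropoff_longitude']), axis=1)`. On a month of taxi data, though, a vectorized mask is usually much faster than a row-wise `apply`.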
I'm fairly new to Python and would appreciate any help you can give.
It throws a lot of errors at me.
import pandas as pd

book = 'yellow_tripdata_2014-04.csv'
write_book = 'yellow_04.csv'

yank_max_long = -73.921630300
yank_min_long = -73.931169700
yank_max_lat = 40.832823000
yank_min_lat = 40.825582000

# Mets values swapped back: 40.x is a latitude, -73.x is a longitude
mets_max_long = -73.841035400
mets_min_long = -73.850564600
mets_max_lat = 40.760523000
mets_min_lat = 40.753277000

df = pd.read_csv(book)

## Check for Yankee Stadium lats and longs: if within the GPS bounds then Stadium_Code = 1, if Mets then Stadium_Code = 2
df['Stadium_Code'] = 0
for i, row in df.iterrows():
    lat = float(row['dropoff_latitude'])
    lon = float(row['dropoff_longitude'])  # square brackets, not row(...)
    if yank_min_lat <= lat <= yank_max_lat and yank_min_long <= lon <= yank_max_long:
        df.at[i, 'Stadium_Code'] = 1  # assign (=) through df; changes to `row` are lost
    elif mets_min_lat <= lat <= mets_max_lat and mets_min_long <= lon <= mets_max_long:
        df.at[i, 'Stadium_Code'] = 2

df.to_csv(write_book, index=False)  # presumably what write_book was meant for
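Two separate bugs stopped the original loop from recording anything. First, `==` is a comparison, not an assignment. Second, even `row['Stadium_Code'] = 1` only changes the Series that `iterrows()` hands back, and the pandas docs warn that this may be a copy, in which case writing to it has no effect. This small sketch uses made-up toy data to show the difference between writing to `row` and writing through `df.at`:

```python
import pandas as pd

# Toy data (made up): two float columns plus an int column.
df = pd.DataFrame({"dropoff_latitude": [40.829, 40.700],
                   "dropoff_longitude": [-73.926, -74.000]})
df["Stadium_Code"] = 0

# With mixed int/float columns, iterrows() yields copies of each row,
# so assigning to `row` never reaches df.
for i, row in df.iterrows():
    row["Stadium_Code"] = 1
after_row_assignment = df["Stadium_Code"].tolist()
print(after_row_assignment)  # → [0, 0]

# Writing through the DataFrame with .at (label-based scalar setter) sticks.
for i, row in df.iterrows():
    if row["dropoff_latitude"] > 40.8:
        df.at[i, "Stadium_Code"] = 1
print(df["Stadium_Code"].tolist())  # → [1, 0]
```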
I tried using the .loc command, but ran into this error message:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-33-9a9166772646> in <module>()
----> 1 yank_mask = (df['dropoff_latitude'] > yank_min_lat) & (df['dropoff_latitude'] <= yank_max_lat) & (df['dropoff_longitude'] > yank_min_long) & (df['dropoff_longitude'] <= yank_max_long)
2
3 mets_mask = (df['dropoff_latitude'] > mets_min_lat) & (df['dropoff_latitude'] <= mets_max_lat) & (df['dropoff_longitude'] > mets_min_long) & (df['dropoff_longitude'] <= mets_max_long)
4
5 df.loc[yank_mask, 'Stadium_Code'] = 1
/Users/benjaminprice/anaconda/lib/python3.4/site-packages/pandas/core/frame.py in __getitem__(self, key)
1795 return self._getitem_multilevel(key)
1796 else:
-> 1797 return self._getitem_column(key)
1798
1799 def _getitem_column(self, key):
/Users/benjaminprice/anaconda/lib/python3.4/site-packages/pandas/core/frame.py in _getitem_column(self, key)
1802 # get column
1803 if self.columns.is_unique:
-> 1804 return self._get_item_cache(key)
1805
1806 # duplicate columns & possible reduce dimensionaility
/Users/benjaminprice/anaconda/lib/python3.4/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
1082 res = cache.get(item)
1083 if res is None:
-> 1084 values = self._data.get(item)
1085 res = self._box_item_values(item, values)
1086 cache[item] = res
/Users/benjaminprice/anaconda/lib/python3.4/site-packages/pandas/core/internals.py in get(self, item, fastpath)
2849
2850 if not isnull(item):
-> 2851 loc = self.items.get_loc(item)
2852 else:
2853 indexer = np.arange(len(self.items))[isnull(self.items)]
/Users/benjaminprice/anaconda/lib/python3.4/site-packages/pandas/core/index.py in get_loc(self, key, method)
1570 """
1571 if method is None:
-> 1572 return self._engine.get_loc(_values_from_object(key))
1573
1574 indexer = self.get_indexer([key], method=method)
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3824)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3704)()
pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12280)()
pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12231)()
KeyError: 'dropoff_latitude'
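The `KeyError` means pandas found no column label that is exactly `'dropoff_latitude'`, even though that name seems to exist. One plausible cause is invisible whitespace in the CSV header, though this is an assumption that the traceback does not confirm. This small sketch builds a hypothetical one-column frame that reproduces the error, identifies the offending label and fixes it with `df.columns.str.strip()`:

```python
import pandas as pd

# Hypothetical frame whose label picked up a leading space, as happens when a
# CSV header is written like "vendor_id, dropoff_latitude".
df = pd.DataFrame({" dropoff_latitude": [40.829]})

try:
    df["dropoff_latitude"]
except KeyError as err:
    print("KeyError:", err)  # label lookup is an exact string match

# Labels that differ from their stripped form contain hidden whitespace.
bad = [c for c in df.columns if c != c.strip()]
print([repr(c) for c in bad])

df.columns = df.columns.str.strip()  # drop leading/trailing whitespace
print(df["dropoff_latitude"].tolist())  # → [40.829]
```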
I'm usually not too bad at figuring out what these error messages mean, but this one has thrown me.
In general, when you report that you're getting errors, it's useful to post the error traceback and the line it occurs on. – EdChum
Your error means you've misnamed the column. Can you post the output from `df.columns.tolist()`? – EdChum
['VENDOR_ID', 'pickup_datetime', 'dropoff_datetime', 'passenger_count', 'trip_distance ', 'pickup_longitude', 'pickup_latitude', 'rate_code', 'store_and_fwd_flag', 'dropoff_longitude', 'dropoff_latitude', 'payment_type', 'fare_amount', 'surcharge', ' mta_tax ', 'tip_amount', 'tolls_amount', 'total_amount', 'Stadium_Code'] –
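Once the labels are clean, the vectorized `.loc` approach from the traceback should work without any loop. Below is a minimal end-to-end sketch. The inline CSV text is invented toy data standing in for `yellow_tripdata_2014-04.csv`, and its padded header imitates the stray spaces visible in the list above. The Mets bounds again assume that latitude and longitude were swapped in the question:

```python
import io

import pandas as pd

# Invented stand-in for the real file; the spaces after the commas in the
# header are an assumption mimicking the stray whitespace in the column list.
csv_text = """vendor_id, dropoff_longitude, dropoff_latitude
CMT,-73.926,40.829
VTS,-73.846,40.757
CMT,-74.000,40.700
"""

df = pd.read_csv(io.StringIO(csv_text))
df.columns = df.columns.str.strip()  # ' dropoff_latitude' -> 'dropoff_latitude'

lat = df["dropoff_latitude"]
lon = df["dropoff_longitude"]

# Boolean masks; Series.between includes both endpoints by default.
yank_mask = lat.between(40.825582, 40.832823) & lon.between(-73.9311697, -73.9216303)
# Mets bounds with latitude/longitude swapped back (assumed typo in the question).
mets_mask = lat.between(40.753277, 40.760523) & lon.between(-73.8505646, -73.8410354)

df["Stadium_Code"] = 0
df.loc[yank_mask, "Stadium_Code"] = 1
df.loc[mets_mask, "Stadium_Code"] = 2

print(df["Stadium_Code"].tolist())  # → [1, 2, 0]
```

`Series.between` includes both endpoints, whereas the masks in the traceback used `>` on the lower bound. The two differ only for points exactly on a boundary, which is unlikely to matter for GPS drop-offs.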