pandas: error when looping over the rows of a DataFrame
I have the following code:
df_boundry = df_in.dropna().quantile([0.0, .8])
for row in df_in.iterrows():
    for column in row:
        if row[column] > df_boundry[column][0.8]:
            row[column] = df_boundry[column][0.8]
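Basically, for each row (record), we check the value in every column. If the value exceeds the 80th percentile of its column, we replace it with the 80th-percentile value. But the code above raises this error: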
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-67-81b2be77cc8a> in <module>()
4 for row in df_in.iterrows():
5 for column in row:
----> 6 if row[column] > df_boundry[column][0.8]:
7 row[column] = df_boundry[column][0.8]
8
/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/frame.pyc in __getitem__(self, key)
1995 return self._getitem_multilevel(key)
1996 else:
-> 1997 return self._getitem_column(key)
1998
1999 def _getitem_column(self, key):
/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/frame.pyc in _getitem_column(self, key)
2002 # get column
2003 if self.columns.is_unique:
-> 2004 return self._get_item_cache(key)
2005
2006 # duplicate columns & possible reduce dimensionality
/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/generic.pyc in _get_item_cache(self, item)
1348 res = cache.get(item)
1349 if res is None:
-> 1350 values = self._data.get(item)
1351 res = self._box_item_values(item, values)
1352 cache[item] = res
/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/internals.pyc in get(self, item, fastpath)
3288
3289 if not isnull(item):
-> 3290 loc = self.items.get_loc(item)
3291 else:
3292 indexer = np.arange(len(self.items))[isnull(self.items)]
/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/indexes/base.pyc in get_loc(self, key, method, tolerance)
1945 return self._engine.get_loc(key)
1946 except KeyError:
-> 1947 return self._engine.get_loc(self._maybe_cast_indexer(key))
1948
1949 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)()
pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)()
pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)()
KeyError: 0
Here is some sample data for df_in:
column_A | column_B | column_C
--------------------------------
0.5 | 0.5 | NaN
1.2 | NaN | NaN
NaN | 8.1 | 21.1
9.1 | 9.3 | 2.1
4.5 | 90.1 | 1.4
112.3 | 79.2 | 1.1
:
:
And df_boundry:
| column_A | column_B | column_C
----------------------------------------
0.0 | 0.1 | 0.4 | 0.0
0.8 | 110.4 | 80.1 | 20.5
The expected result for the sample data would be:
column_A | column_B | column_C
--------------------------------
0.5 | 0.5 | NaN
1.2 | NaN | NaN
NaN | 8.1 | 20.5
9.1 | 9.3 | 2.1
4.5 | 80.1 | 1.4
110.4 | 79.2 | 1.1
:
:
That is, a cell value is replaced with df_boundry[column][0.8] only when it is greater than df_boundry[column][0.8].
Does anyone know what I am missing here? Thanks!
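Can you post a sample data set (5-7 rows)? – MaxU
Just so you understand the error: df_in.iterrows() yields (index, row) tuples. You can fix that part by writing 'for idx, row in df_in.iterrows()', but even after you do that, row is a Series, so 'for column in row' actually iterates over the values in the row, not the column names. Try printing some of the variables inside the loop to explore this further. – shawnheide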
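To make the comment above concrete, here is a minimal sketch, not a definitive answer. It rebuilds the six sample rows from the question and hard-codes the 0.8 bounds from the posted df_boundry table, because the full data set is not shown. The names first, upper, looped, and capped are introduced here for illustration only. It shows what iterrows() yields, a loop that iterates over column labels and writes back through .at, and a vectorized alternative using DataFrame.clip, which should give the same result.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question (only the six rows that were posted).
df_in = pd.DataFrame({
    "column_A": [0.5, 1.2, np.nan, 9.1, 4.5, 112.3],
    "column_B": [0.5, np.nan, 8.1, 9.3, 90.1, 79.2],
    "column_C": [np.nan, np.nan, 21.1, 2.1, 1.4, 1.1],
})

# Assumption: bounds hard-coded from the question's df_boundry table.
# On the full data this would be:
#   upper = df_in.dropna().quantile([0.0, 0.8]).loc[0.8]
upper = pd.Series({"column_A": 110.4, "column_B": 80.1, "column_C": 20.5})

# iterrows() yields (index, row) tuples, and each row is a Series.
first = next(df_in.iterrows())
idx, row = first
print(type(first), type(row))

# Loop version: iterate over column labels (row.index), not over row values,
# and write into a copy with .at, since assigning to `row` does not change
# the original DataFrame.
looped = df_in.copy()
for idx, row in df_in.iterrows():
    for column in row.index:
        # NaN > bound evaluates to False, so missing values are left alone.
        if row[column] > upper[column]:
            looped.at[idx, column] = upper[column]

# Vectorized version: cap each column at its own upper bound.
capped = df_in.clip(upper=upper, axis=1)
print(capped)
```

The clip version avoids the Python-level loop entirely, which is usually preferable on large frames; the loop is kept only to show where the original code went wrong.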