2016-12-26 57 views
0

Python pandas: querying a DataFrame on one column fails with a KeyError

In [6]: books.dtypes 
Out[6]: 
count    float64 
product    int64 
channel    int64 
book_start_year  int64 
book_start_week  int64 
book_end_year  int64 
book_end_week  int64 
period   float64 
dtype: object 

In [8]: print(books.columns.tolist()) 
['count', 'product', 'channel', 'book_start_year', 'book_start_week', 'book_end_year', 'book_end_week', 'period'] 

This:

books[books.channel == 1] 

works fine, but this one:

books[books.product == 1] 

fails with a KeyError (see below). The DataFrame was read just a minute ago, on macOS, from a CSV file that pandas itself had written earlier, using:

books = pd.read_csv('boxes2.csv', header=0)  

Resetting the index, or setting it to another column, does not help either. Any ideas?

Update

How should I then write a query like this:

data = books[(books.start_year >= start_year) 
       & (books.start_week >= start_week) 
       & (books.end_year <= end_year) 
       & (books.end_week <= end_week) 
       & (books.product == product) 
       ] 

Or can't I?

Error:

In [5]: books[books.product == 1] 
    --------------------------------------------------------------------------- 
    KeyError         Traceback (most recent call last) 
    <ipython-input-5-c6883f7202ed> in <module>() 
    ----> 1 books[books.product == 1] 

    /Users/user/usr/anaconda_2.7/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in _getitem_column(self, key) 
     2002   # get column 
     2003   if self.columns.is_unique: 
    -> 2004    return self._get_item_cache(key) 
     2005 
     2006   # duplicate columns & possible reduce dimensionality 

    /Users/user/usr/anaconda_2.7/anaconda/lib/python2.7/site-packages/pandas/core/generic.pyc in _get_item_cache(self, item\ 
) 
     1348   res = cache.get(item) 
     1349   if res is None: 
    -> 1350    values = self._data.get(item) 
     1351    res = self._box_item_values(item, values) 
     1352    cache[item] = res 

    /Users/user/usr/anaconda_2.7/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in get(self, item, fastpath\ 
) 
     3288 
     3289    if not isnull(item): 
    -> 3290     loc = self.items.get_loc(item) 
     3291    else: 
     3292     indexer = np.arange(len(self.items))[isnull(self.items)] 

    /Users/user/usr/anaconda_2.7/anaconda/lib/python2.7/site-packages/pandas/indexes/base.pyc in get_loc(self, key, method,\ 
tolerance) 
     1945     return self._engine.get_loc(key) 
     1946    except KeyError: 
    -> 1947     return self._engine.get_loc(self._maybe_cast_indexer(key)) 
     1948 
     1949   indexer = self.get_indexer([key], method=method, tolerance=tolerance) 

    pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)() 

    pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)() 

    pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)() 

    pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)() 

    KeyError: False 

Answers

1

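product is also the name of a DataFrame method, and attribute lookup finds the method before the column name. So books.product == 1 compares a method with 1, which evaluates to False, and books[False] raises KeyError: False. Accessing columns as attributes is a convenience, but it is error-prone, so you should use square brackets: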

books[books['product'] == 1] 
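To see where KeyError: False comes from, here is a minimal sketch. The small frame below is a hypothetical stand-in for the data in boxes2.csv, not the asker's actual file.

```python
import pandas as pd

# Hypothetical stand-in for the frame read from boxes2.csv
books = pd.DataFrame({"product": [1, 2, 1], "channel": [1, 1, 2]})

attr = books.product        # the bound method DataFrame.product, not the column
is_method = callable(attr)  # True: attribute lookup found the method
comparison = (attr == 1)    # a method never equals 1, so this is the plain bool False

try:
    books[comparison]       # same as books[False]: looks for a column labelled False
    error = None
except KeyError as exc:
    error = exc             # this is the KeyError from the question

# Bracket access always means "the column with this label"
by_label = books[books["product"] == 1]
print(is_method, comparison, repr(error))
print(by_label)
```

books.channel works only because DataFrame has no attribute called channel, so the lookup falls through to the column.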

Everyone should think of a DataFrame as a dict of Series: just as with a normal dict, you pass a key and get back a value, which in this case is a column, i.e. a Series.
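The dict analogy can be illustrated with a short sketch. The frame and its values are made up for illustration; the calls used (in, keys, get) are ordinary DataFrame operations.

```python
import pandas as pd

# Made-up example frame
books = pd.DataFrame({"product": [1, 2], "channel": [3, 4]})

column = books["product"]                 # key in, value out
is_series = isinstance(column, pd.Series) # the value is a Series
has_key = "product" in books              # membership tests the column labels
keys = list(books.keys())                 # the "keys" are the column labels
missing = books.get("missing")            # like dict.get: None instead of KeyError

print(is_series, has_key, keys, missing)
```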

Note that IPython shows the following for product:

Signature: df.product(axis=None, skipna=None, level=None, numeric_only=None, **kwargs) 
Docstring: 
Return the product of the values for the requested axis 

Parameters 
---------- 
axis : {index (0), columns (1)} 
skipna : boolean, default True 
    Exclude NA/null values. If an entire row/column is NA, the result 
    will be NA 
level : int or level name, default None 
    If the axis is a MultiIndex (hierarchical), count along a 
    particular level, collapsing into a Series 
numeric_only : boolean, default None 
    Include only float, int, boolean columns. If None, will attempt to use 
    everything, then use only numeric data. Not implemented for Series. 

Returns 
------- 
prod : Series or DataFrame (if level specified) 
File:  c:\winpython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\pandas\core\generic.py 
Type:  method 

So product is not documented separately, but it is the same as prod.
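A quick way to check that the two names give the same result, using a made-up frame:

```python
import pandas as pd

# Made-up numeric frame
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

via_product = df.product()  # column-wise product of the values
via_prod = df.prod()        # same reduction under its usual name
same = via_product.equals(via_prod)

print(via_prod)
print(same)
```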

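It is also strongly advised that you stop accessing columns as attributes, because it leads to strange errors like this one. Get into the habit of using [] to access columns, so you avoid this in the future.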
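If you want to know which column names are affected, a small check along these lines may help. shadowed_columns is a hypothetical helper written for this answer, not part of pandas; it simply asks whether the DataFrame class already has an attribute with the same name.

```python
import pandas as pd

def shadowed_columns(df):
    """Return the column labels that collide with an existing DataFrame attribute.

    Hypothetical helper, not a pandas API.
    """
    return [c for c in df.columns
            if isinstance(c, str) and hasattr(pd.DataFrame, c)]

# Empty frame with the column names from the question
books = pd.DataFrame(columns=[
    "count", "product", "channel",
    "book_start_year", "book_start_week",
    "book_end_year", "book_end_week", "period",
])

shadowed = shadowed_columns(books)
print(shadowed)
```

Note that count is affected as well as product: books.count is the DataFrame.count method, so books['count'] is the only reliable way to reach that column.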

Edit

To answer your updated question, use [] to access all of the columns:

data = books[(books['start_year'] >= start_year) 
       & (books['start_week'] >= start_week) 
       & (books['end_year'] <= end_year) 
       & (books['end_week'] <= end_week) 
       & (books['product'] == product) 
       ] 

Technically you only need to do this for the product column, but you should get into the habit of doing it for all columns.
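For completeness, here is a self-contained sketch of that filter. It assumes the column names listed in books.dtypes (book_start_year and so on) rather than the shorter names in the snippet above, and the rows and bounds are invented sample values.

```python
import pandas as pd

# Invented sample rows using the column names from books.dtypes
books = pd.DataFrame({
    "count": [10.0, 5.0, 7.0],
    "product": [1, 1, 2],
    "channel": [1, 2, 1],
    "book_start_year": [2016, 2015, 2016],
    "book_start_week": [10, 40, 12],
    "book_end_year": [2016, 2016, 2016],
    "book_end_week": [20, 5, 30],
})

# Invented filter bounds
start_year, start_week = 2016, 1
end_year, end_week = 2016, 26
product = 1

# Each comparison yields a boolean Series; & combines them row by row
mask = (
    (books["book_start_year"] >= start_year)
    & (books["book_start_week"] >= start_week)
    & (books["book_end_year"] <= end_year)
    & (books["book_end_week"] <= end_week)
    & (books["product"] == product)
)
data = books[mask]
print(data)
```

Building the mask in its own variable keeps the selection readable and makes it easy to inspect which rows each condition lets through.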
+0

Please see my updated question – zork

+1

See the updated answer – EdChum

+1

An answer like this deserves a lot of upvotes – piRSquared