2017-03-22 172 views
0

Pandas: filter out rows whose column value is an empty list

I have the following DataFrame my_df:

col_A col_B 
--------------- 
John  [] 
Mary  ['A','B','C'] 
Ann  ['B','C'] 

I want to drop the rows where col_B holds an empty list. That is, I want the new DataFrame to be:

col_A col_B 
--------------- 
Mary  ['A','B','C'] 
Ann  ['B','C'] 

Here is what I tried:

my_df[ len(my_df['col_B']) >0 ] 

but I got the following error:


KeyError         Traceback (most recent call last) 
/usr/local/lib/python3.4/dist-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance) 
    2133    try: 
-> 2134     return self._engine.get_loc(key) 
    2135    except KeyError: 

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4164)() 

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)() 

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13166)() 

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13120)() 

KeyError: True 

During handling of the above exception, another exception occurred: 

KeyError         Traceback (most recent call last) 
<ipython-input-27-75da0b0af6a1> in <module>() 
----> 1 records_df_pair_count[ len(records_df_pair_count['stable_seq']) >0 ] 

/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py in __getitem__(self, key) 
    2057    return self._getitem_multilevel(key) 
    2058   else: 
-> 2059    return self._getitem_column(key) 
    2060 
    2061  def _getitem_column(self, key): 

/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py in _getitem_column(self, key) 
    2064   # get column 
    2065   if self.columns.is_unique: 
-> 2066    return self._get_item_cache(key) 
    2067 
    2068   # duplicate columns & possible reduce dimensionality 

/usr/local/lib/python3.4/dist-packages/pandas/core/generic.py in _get_item_cache(self, item) 
    1384   res = cache.get(item) 
    1385   if res is None: 
-> 1386    values = self._data.get(item) 
    1387    res = self._box_item_values(item, values) 
    1388    cache[item] = res 

/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in get(self, item, fastpath) 
    3539 
    3540    if not isnull(item): 
-> 3541     loc = self.items.get_loc(item) 
    3542    else: 
    3543     indexer = np.arange(len(self.items))[isnull(self.items)] 

/usr/local/lib/python3.4/dist-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance) 
    2134     return self._engine.get_loc(key) 
    2135    except KeyError: 
-> 2136     return self._engine.get_loc(self._maybe_cast_indexer(key)) 
    2137 
    2138   indexer = self.get_indexer([key], method=method, tolerance=tolerance) 

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4164)() 

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)() 

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13166)() 

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13120)() 

KeyError: True 

Any idea what I am doing wrong here? Thanks!

Answers

1

Another way to do this:

my_df[my_df['col_B'].apply(lambda x: len(x)) > 0] 
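
As a self-contained sketch of that filter, assuming the frame from the question is rebuilt by hand (the names `mask`, `filtered`, and `same` are illustrative and not from the original post; the `.str.len()` variant is an alternative I am adding, not part of this answer):

```python
import pandas as pd

# Rebuild the example frame from the question.
my_df = pd.DataFrame({
    'col_A': ['John', 'Mary', 'Ann'],
    'col_B': [[], ['A', 'B', 'C'], ['B', 'C']],
})

# Element-wise length of each list, then a boolean mask with one entry per row.
mask = my_df['col_B'].apply(len) > 0
filtered = my_df[mask]

# Alternative (an assumption, not from the answer): the .str accessor's
# len() also measures list values held in an object column.
same = my_df[my_df['col_B'].str.len() > 0]

print(filtered)
```

Both forms keep the original index labels, so call `reset_index(drop=True)` on the result if a fresh 0-based index is wanted.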
1

You have already got a couple of answers that fix the problem, but I thought I would explain why your attempt did not work.

This gives a pandas Series:

my_df['col_B'] 

So this gives the length of that Series (its number of rows, not the length of each list inside it):

len(my_df['col_B']) 

Since you have a non-empty Series, this evaluates to True:

len(my_df['col_B']) >0 

And this:

my_df[ len(my_df['col_B']) >0 ] 

evaluates to:

my_df[True] 

Clearly my_df does not have a column labelled True, hence the KeyError.
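
The chain of evaluations above can be reproduced in a short sketch. It assumes the frame from the question rebuilt by hand; the names `n_rows`, `scalar_mask`, `raised`, and `row_mask` are illustrative only, and `map(len)` is just one of several ways to get a per-row length:

```python
import pandas as pd

# Rebuild the example frame from the question.
my_df = pd.DataFrame({
    'col_A': ['John', 'Mary', 'Ann'],
    'col_B': [[], ['A', 'B', 'C'], ['B', 'C']],
})

# len() of a Series counts its rows; it does not look inside each list.
n_rows = len(my_df['col_B'])      # 3
scalar_mask = n_rows > 0          # a single Python bool, not a Series

# Indexing with a single bool is treated as a column lookup, so it fails.
try:
    my_df[scalar_mask]
    raised = False
except KeyError:
    raised = True

# Taking len() per element instead yields one bool per row,
# which is what boolean indexing expects.
row_mask = my_df['col_B'].map(len) > 0
result = my_df[row_mask]
```

The difference is only in where `len` is applied: once to the whole column (one bool) versus once per cell (a Series of bools aligned with the rows).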