pandas: category dtype and filtering

Using pandas 0.18.1, I noticed different behavior when the column being filtered has dtype category. Here is a simple example.

import pandas as pd 
import numpy as np 

l = np.random.randint(1, 4, 50) 
df = pd.DataFrame(dict(c_type=l, i_type=l)) 
df['c_type'] = df.c_type.astype('category') 

df.info() 

<class 'pandas.core.frame.DataFrame'> 
RangeIndex: 50 entries, 0 to 49 
Data columns (total 2 columns): 
c_type 50 non-null category 
i_type 50 non-null int64 
dtypes: category(1), int64(1) 
memory usage: 554.0 bytes

Filtering out one of the values of the integer-typed column gives:

df[df.i_type.isin([1, 2])].i_type.value_counts() 

2 20 
1 17 
Name: i_type, dtype: int64

However, applying the same filter to the categorical column keeps the filtered-out value as an entry, with a count of 0:

df[df.c_type.isin([1, 2])].c_type.value_counts() 

2 20 
1 17 
3  0 
Name: c_type, dtype: int64

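The filter itself works, but this behavior seems unusual to me. For example, one might use this filter to exclude columns from a later pivot_table call, which then needs an additional filter when the column is a category.

Is this the expected behavior?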

2017-02-16 Flavien Lambert

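This is expected behavior, as described in the categorical docs:

Series methods like Series.value_counts() will use all categories, even if some categories are not present in the data: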

In [100]: s = pd.Series(pd.Categorical(["a","b","c","c"], categories=["c","a","b","d"])) 

In [101]: s.value_counts() 
Out[101]: 
c 2 
b 1 
a 1 
d 0 
dtype: int64
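
To make the link to filtering explicit, here is a minimal sketch built on the docs example above. The variable names `filtered` and `counts` are my own, and the exact printed layout may differ between pandas versions. The point is that the categories belong to the dtype rather than to the data, so a boolean filter removes rows but leaves the category list untouched.

```python
import pandas as pd

# Same Series as in the docs example: "d" is a declared category with no data.
s = pd.Series(pd.Categorical(["a", "b", "c", "c"], categories=["c", "a", "b", "d"]))

# Filtering removes rows, but the categorical dtype keeps all four categories.
filtered = s[s.isin(["a", "b"])]
print(list(filtered.cat.categories))

# value_counts() therefore still reports "c" and "d", each with a count of 0.
counts = filtered.value_counts()
print(counts)

# One way to hide the empty categories is to keep only the non-zero counts.
print(counts[counts > 0])
```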

So if you filter by 5 (a value that does not exist), you get 0 for every category:

print(df[df.c_type.isin([5])].c_type.value_counts())
3 0 
2 0 
1 0 
Name: c_type, dtype: int64
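
If the zero-count entries are unwanted, one option is to drop the unused categories after filtering with `Series.cat.remove_unused_categories()`. The following is a sketch under assumptions, not part of the original question: it uses a small fixed array instead of the random data so the result is reproducible, and the name `subset` is hypothetical.

```python
import numpy as np
import pandas as pd

# Small deterministic stand-in for the random data in the question.
values = np.array([1, 2, 3, 1, 2, 2])
df = pd.DataFrame({"c_type": values, "i_type": values})
df["c_type"] = df["c_type"].astype("category")

# Filter as in the question, then take a copy so the column can be reassigned safely.
subset = df[df["c_type"].isin([1, 2])].copy()

# Drop categories that no longer occur in the filtered data (here: 3).
subset["c_type"] = subset["c_type"].cat.remove_unused_categories()

counts = subset["c_type"].value_counts()
print(counts)
```

After this step the category 3 is gone from the dtype, so it no longer appears in `value_counts()` or in anything else that enumerates the categories of the filtered column.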

2017-02-16 08:16:20 jezrael

I see. Thanks for pointing this out.
