2014-01-22

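I'm analysing a DataFrame of timings and want to count how many fall into specific buckets (0-10 seconds, 10-30 seconds, etc.). How can I make sure that pd.cut also gives me labels for buckets with a count of zero?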

Here is a simple example:

import pandas as pd 

filter_values = [0, 10, 20, 30] # Bucket Values for pd.cut 

#Sample Times 
df1 = pd.DataFrame([1, 3, 8, 20], columns = ['filtercol']) 

#Use cut to get counts for each bucket 
out = pd.cut(df1.filtercol, bins = filter_values) 
counts = pd.value_counts(out) 
print(counts) 

The above prints:

(0, 10]  3 
(10, 20] 1 
dtype: int64 

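You'll notice that it doesn't show anything for (20, 30]. That is my problem: I need that bucket to appear with a count of zero. I can handle it with the following code: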

bucket1 = bucket2 = bucket3 = 0 
if '(0, 10]' in counts: 
    bucket1 = counts['(0, 10]'] 
if '(10, 20]' in counts: 
    bucket2 = counts['(10, 20]'] 
if '(20, 30]' in counts: 
    bucket3 = counts['(20, 30]'] 
print(bucket1, bucket2, bucket3) 

But I'd like a simpler, more concise approach that lets me just write:

print(counts['(0, 10]'], counts['(10, 20]'], counts['(20, 30]']) 

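Ideally, what gets printed would be driven by filter_values, so the bucket edges live in only one place in the code. Yes, I know I could change the print to use filter_values[0]...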
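One way to keep the bucket edges in a single place is to build the label strings from `filter_values` and pass them to `pd.cut` through its `labels` argument, then reindex the counts by those same labels. This is a minimal sketch, assuming labels of the form `(lo, hi]`; the `labels` list comprehension is illustrative and not part of the original code.

```python
import pandas as pd

filter_values = [0, 10, 20, 30]  # bucket edges, defined once

# Illustrative: derive "(lo, hi]" labels from consecutive pairs of edges.
labels = [f"({lo}, {hi}]" for lo, hi in zip(filter_values[:-1], filter_values[1:])]

df1 = pd.DataFrame({"filtercol": [1, 3, 8, 20]})
out = pd.cut(df1["filtercol"], bins=filter_values, labels=labels)

# Reindex by the labels so every bucket appears, with 0 for empty ones.
counts = out.value_counts(sort=False).reindex(labels, fill_value=0)

for label in labels:
    print(label, counts[label])
```

Because both the bins and the printed labels come from `filter_values`, changing the edges in that one list updates everything downstream.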

Finally, is there a way to specify infinity when using cut, so that the last bucket holds every value greater than, say, 60?
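`pd.cut` accepts `np.inf` as a bin edge, so an open-ended last bucket can be sketched as below. The edges 0, 10, 30, 60 follow the buckets mentioned in the question; the sample times are made up for illustration.

```python
import numpy as np
import pandas as pd

# np.inf as the final edge makes the last bucket (60, inf].
filter_values = [0, 10, 30, 60, np.inf]

df1 = pd.DataFrame({"filtercol": [1, 3, 8, 20, 45, 75, 300]})
out = pd.cut(df1["filtercol"], bins=filter_values)

# sort=False keeps the buckets in bin order rather than by frequency.
counts = out.value_counts(sort=False)
print(counts)
```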

Cheers, Stephen

Answer


You can reindex by the levels of the Categorical:

In [11]: pd.value_counts(out).reindex(out.levels, fill_value=0) 
Out[11]: 
(0, 10]  3 
(10, 20] 1 
(20, 30] 0 
dtype: int64
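Later pandas versions renamed `Categorical.levels` to `categories`, and `pd.cut` on a Series returns a categorical Series, so the same reindexing idea might be written as follows. This is a sketch assuming a reasonably recent pandas; there, `value_counts` on a categorical already reports empty categories, so the explicit `reindex` mainly pins the output to bin order and states the intent.

```python
import pandas as pd

filter_values = [0, 10, 20, 30]
df1 = pd.DataFrame({"filtercol": [1, 3, 8, 20]})
out = pd.cut(df1["filtercol"], bins=filter_values)

# Reindex by every bin (the categories) so empty bins show a count of 0.
counts = out.value_counts().reindex(out.cat.categories, fill_value=0)
print(counts)
```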