Pandas: get the count of a single value in a DataFrame column

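Using pandas, I would like to get the count of a specific value in a column. I know that df.somecolumn.ravel() will give me all the unique values and their counts, but how do I get the count of one specific value?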

Expected:

To get the count of 1:

    In [6]: df.somecalulation(1)
    Out[6]: 5

To get the count of 2:

    In [6]: df.somecalulation(2)
    Out[6]: 3

Source

2016-03-17 Randhawa

Are you optimizing this for multiple queries, or for a small (or single) query? –

A single small query, then. – Randhawa

See the answer. –

You can try value_counts:

# Store the result under a new name so that df itself is not overwritten
counts = df['col'].value_counts().reset_index()
counts.columns = ['col', 'count']
print(counts)
   col  count
0    1      5
1    2      3
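
For anyone who wants to reproduce the output above, here is a minimal sketch. The sample data is an assumption (the question does not show the DataFrame); it is chosen so that 1 appears five times and 2 appears three times, matching the expected counts in the question.

```python
import pandas as pd

# Hypothetical sample data: five 1s and three 2s in a column named 'col'
df = pd.DataFrame({'col': [1, 1, 1, 1, 1, 2, 2, 2]})

# value_counts returns a Series indexed by the distinct values,
# sorted by count in descending order
counts = df['col'].value_counts()

# Look up the count of a specific value by its label
count_of_1 = int(counts[1])
count_of_2 = int(counts[2])
print(count_of_1, count_of_2)
```

With this data the lookups give 5 and 3, the same numbers the question asks for.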

Edit:

print((df['col'] == 1).sum())
5

Or:

def somecalulation(x):
    # Compare every row to x and count the matches
    return (df['col'] == x).sum()

print(somecalulation(1))
5
print(somecalulation(2))
3

Or:

# Compute all counts once, then look up individual values
ser = df['col'].value_counts()

def somecalulation(s, x):
    return s[x]

print(somecalulation(ser, 1))
5
print(somecalulation(ser, 2))
3
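
One caveat with the lookup above: indexing the value_counts result with a value that never occurs in the column raises a KeyError. A possible workaround, sketched below, is Series.get with a default of 0. The helper name count_or_zero and the sample data are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: five 1s and three 2s
df = pd.DataFrame({'col': [1, 1, 1, 1, 1, 2, 2, 2]})
ser = df['col'].value_counts()

def count_or_zero(s, x):
    # Series.get returns the default instead of raising KeyError
    # when x is not among the counted values
    return int(s.get(x, 0))

present = count_or_zero(ser, 1)
absent = count_or_zero(ser, 7)
print(present, absent)
```

The boolean-mask version, (df['col'] == x).sum(), does not need this guard, since a value that never matches simply sums to 0.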

EDIT2:

If you need something very fast, use numpy.in1d:

import pandas as pd
import numpy as np

a = pd.Series([1, 1, 1, 1, 2, 2])

# for testing: len(a) = 6000
a = pd.concat([a]*1000).reset_index(drop=True)

print(np.in1d(a, 1).sum())
4000
print((a == 1).sum())
4000
print(np.sum(a == 1))
4000

Timings:

len(a)=6:

In [131]: %timeit np.in1d(a,1).sum() 
The slowest run took 9.17 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 29.9 µs per loop 

In [132]: %timeit np.sum(a == 1) 
10000 loops, best of 3: 196 µs per loop 

In [133]: %timeit (a == 1).sum() 
1000 loops, best of 3: 180 µs per loop

len(a)=6000:

In [135]: %timeit np.in1d(a,1).sum() 
The slowest run took 7.29 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 48.5 µs per loop 

In [136]: %timeit np.sum(a == 1) 
The slowest run took 5.23 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 273 µs per loop 

In [137]: %timeit (a == 1).sum() 
1000 loops, best of 3: 271 µs per loop
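
The timings above come from an IPython session in 2016 and will differ by machine and library version. As a rough way to rerun the comparison outside IPython, here is a sketch using the standard timeit module. It swaps in np.isin for np.in1d, because recent NumPy releases deprecate np.in1d in favor of np.isin; the dictionary of candidates and the loop count are arbitrary choices for illustration.

```python
import timeit

import numpy as np
import pandas as pd

# Same test data as in the answer: 6000 elements, 4000 of them equal to 1
a = pd.concat([pd.Series([1, 1, 1, 1, 2, 2])] * 1000).reset_index(drop=True)

candidates = {
    'np.isin(a, 1).sum()': lambda: np.isin(a, 1).sum(),
    'np.sum(a == 1)': lambda: np.sum(a == 1),
    '(a == 1).sum()': lambda: (a == 1).sum(),
}

loops = 200  # arbitrary; raise it for more stable numbers
results = {}
for label, func in candidates.items():
    # Record the count so the three approaches can be checked against each other
    results[label] = int(func())
    seconds = timeit.timeit(func, number=loops)
    print(f'{label}: {seconds / loops * 1e6:.1f} µs per call')
```

All three expressions should agree on the count; only the measured time per call is expected to vary.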

Source

2016-03-17 17:39:53 jezrael

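Sorry, there was an error in the question. I have edited it. Please look at it now. – Randhawa

If you need to count a single item, np.in1d is faster than the accepted solution. See EDIT2 and the timings. Thanks. – jezrael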

If you keep the result of value_counts, you can query multiple values:

import pandas as pd

a = pd.Series([1, 1, 1, 1, 2, 2])
counts = a.value_counts()
print(counts[1], counts[2])
4 2

However, to count only a single item, it is faster to use

import numpy as np
np.sum(a == 1)

Source

2016-03-17 17:44:20
