After grouping by State and Year and taking the mean,
means = contracts.groupby(['State', 'Year'])['$'].mean()
you can group by State alone and use filter to keep only the desired groups (here, the States that have a mean for every year in years):
result = means.groupby(level='State').filter(lambda x: len(x)>=len(years))
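To make the filter criterion concrete, here is a minimal sketch on a tiny hand-built DataFrame (the data is hypothetical, chosen only for illustration). Because means has exactly one row per (State, Year) pair, len(x) inside the lambda is the number of distinct years a State appears in, so a State survives only if it covers every year in years.

```python
import pandas as pd

# Hypothetical toy data: NY has contracts in 2009, 2010 and 2011;
# NJ is missing 2011.
years = range(2009, 2012)
contracts = pd.DataFrame({
    'State': ['NY', 'NY', 'NY', 'NJ', 'NJ', 'NY'],
    'Year':  [2009, 2010, 2011, 2009, 2010, 2009],
    '$':     [4, 6, 8, 1, 3, 2],
})

# One value per (State, Year) pair: the mean contract amount.
means = contracts.groupby(['State', 'Year'])['$'].mean()

# Each State group has one row per distinct year, so len(x) counts years.
# Keep only the States that have a mean for every year in `years`.
result = means.groupby(level='State').filter(lambda x: len(x) >= len(years))
print(result)
```

Here NJ is dropped because it only has two of the three years, and NY keeps all three of its rows (its 2009 value is the mean of 4 and 2).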
For example,
import numpy as np
import pandas as pd
np.random.seed(2015)
N = 15
states = ['NY','NJ','DE']
years = range(2009, 2013)
contracts = pd.DataFrame({
'State': np.random.choice(states, size=N),
'Year': np.random.choice(years, size=N),
'$': np.random.randint(10, size=N)})
means = contracts.groupby(['State', 'Year'])['$'].mean()
result = means.groupby(level='State').filter(lambda x: len(x)>=len(years))
print(result)
yields
State  Year
DE     2009    8
       2010    5
       2011    3
       2012    6
NY     2009    2
       2010    1
       2011    5
       2012    9
Name: $, dtype: int64
Alternatively, you could filter first and then take the mean:
filtered = contracts.groupby(['State']).filter(lambda x: x['Year'].nunique() >= len(years))
result = filtered.groupby(['State', 'Year'])['$'].mean()
but timing this on various examples suggests it is usually slower than taking the mean first and then filtering.
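A rough way to compare the two orderings yourself is a timeit sketch like the one below. The dataset size, the 1000 made-up state codes and the helper names mean_then_filter and filter_then_mean are all assumptions for illustration, not part of the original answer. Timings will vary by machine and pandas version, so no numbers are claimed here. The sketch also checks that both orderings return the same Series.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed setup: many hypothetical states with few rows each, so that a
# good share of states is missing at least one year and gets filtered out.
rng = np.random.RandomState(2015)
N = 5000
states = np.array([f'S{i}' for i in range(1000)])
years = range(2009, 2013)
contracts = pd.DataFrame({
    'State': rng.choice(states, size=N),
    'Year': rng.choice(list(years), size=N),
    '$': rng.randint(10, size=N)})


def mean_then_filter(df, years):
    # Aggregate first, then filter the (much smaller) Series of means.
    means = df.groupby(['State', 'Year'])['$'].mean()
    return means.groupby(level='State').filter(lambda x: len(x) >= len(years))


def filter_then_mean(df, years):
    # Filter the full DataFrame first, then aggregate what is left.
    filtered = df.groupby('State').filter(lambda x: x['Year'].nunique() >= len(years))
    return filtered.groupby(['State', 'Year'])['$'].mean()


# Both orderings should select the same (State, Year) means.
assert mean_then_filter(contracts, years).equals(filter_then_mean(contracts, years))

t_mean_first = timeit.timeit(lambda: mean_then_filter(contracts, years), number=5)
t_filter_first = timeit.timeit(lambda: filter_then_mean(contracts, years), number=5)
print(f'mean then filter: {t_mean_first:.3f} s for 5 runs')
print(f'filter then mean: {t_filter_first:.3f} s for 5 runs')
```

The Series filter only has to look at one row per (State, Year) pair, while the DataFrame filter passes every raw row of each State to the lambda. That is a plausible reason for the first ordering tending to be faster, but measure on your own data before relying on it.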