Pandas groupby with multiple columns: selecting rows with a full range of values

I am working with a pandas DataFrame.

I have a pandas groupby object with two group levels: State and Year.

State  Year   $
NY     2009   5
       2010  10
       2011   5
       2012  15
NJ     2009   2
       2012  12
DE     2009   1
       2010   2
       2011   3
       2012   6

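I only want to see the states for which I have data for every year (i.e. NY and DE, but not NJ, since it is missing 2010 and 2011). Is there a way to drop the nested groups that do not contain the full set of years?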
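To make the "complete group" condition concrete, the table above can be rebuilt as a Series with a (State, Year) MultiIndex, and the number of years per state counted. This is a sketch assuming the `$` values are exactly the ones shown in the table; the variable name `s` is only illustrative.

```python
import pandas as pd

# Rebuild the question's State/Year/$ table as a Series with a two-level index.
# Values are copied from the table; the name `s` is just for illustration.
index = pd.MultiIndex.from_tuples(
    [('NY', 2009), ('NY', 2010), ('NY', 2011), ('NY', 2012),
     ('NJ', 2009), ('NJ', 2012),
     ('DE', 2009), ('DE', 2010), ('DE', 2011), ('DE', 2012)],
    names=['State', 'Year'])
s = pd.Series([5, 10, 5, 15, 2, 12, 1, 2, 3, 6], index=index, name='$')

# Count how many (State, Year) entries each state has;
# a "complete" state has one entry for each of the 4 years.
years_per_state = s.groupby(level='State').size()
print(years_per_state)
```

Here NJ shows up with only 2 entries, while NY and DE have 4, which is the distinction the question wants to filter on.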


2015-08-13 Johannes Wachs

After grouping by State and Year and taking the mean,

means = contracts.groupby(['State', 'Year'])['$'].mean()

you can then group by State alone and use filter to keep only the groups you want:

result = means.groupby(level='State').filter(lambda x: len(x)>=len(years))
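An alternative to filter, swapped in here rather than taken from the answer, is a boolean mask built with transform: each state's entry count is broadcast back onto its own rows and compared against the number of years. The small means Series below is hypothetical sample data in the same shape as the one produced above.

```python
import pandas as pd

# Hypothetical per-(State, Year) means, shaped like `means` in the answer.
means = pd.Series(
    [5.0, 10.0, 5.0, 15.0, 2.0, 12.0],
    index=pd.MultiIndex.from_tuples(
        [('NY', 2009), ('NY', 2010), ('NY', 2011), ('NY', 2012),
         ('NJ', 2009), ('NJ', 2012)],
        names=['State', 'Year']),
    name='$')
years = range(2009, 2013)

# transform('count') returns, for every row, the number of non-null entries
# in that row's State group, aligned with the original index.
group_counts = means.groupby(level='State').transform('count')

# Keep only rows whose state has an entry for every year.
result = means[group_counts >= len(years)]
print(result)
```

One design difference: 'count' ignores NaN values, so a year whose mean is NaN would not count toward completeness, whereas len(x) in the filter version counts every row.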

For example,

import numpy as np
import pandas as pd

# Small random dataset of contracts, reproducible via the seed
np.random.seed(2015)
N = 15

states = ['NY', 'NJ', 'DE']
years = range(2009, 2013)   # 2009, 2010, 2011, 2012
contracts = pd.DataFrame({
    'State': np.random.choice(states, size=N),
    'Year': np.random.choice(years, size=N),
    '$': np.random.randint(10, size=N)})

# Mean of '$' per (State, Year), then keep only states with an entry for every year
means = contracts.groupby(['State', 'Year'])['$'].mean()
result = means.groupby(level='State').filter(lambda x: len(x) >= len(years))

print(result)

yields

State Year 
DE  2009 8 
     2010 5 
     2011 3 
     2012 6 
NY  2009 2 
     2010 1 
     2011 5 
     2012 9 
Name: $, dtype: int64

Alternatively, you can filter first and then take the mean:

filtered = contracts.groupby(['State']).filter(lambda x: x['Year'].nunique() >= len(years)) 
result = filtered.groupby(['State', 'Year'])['$'].mean()

However, trying a few different examples suggests this is usually slower than taking the mean first and then filtering.
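To check both orderings on your own data, a rough sketch like the following first confirms that they select the same means and then times each one. The data here is an assumption of this sketch, not from the question: NY and DE get many random rows, so they almost surely cover every year, while NJ only has 2009 and 2012. The function names mean_then_filter and filter_then_mean are hypothetical.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (assumed): many random NY/DE rows plus an incomplete NJ.
np.random.seed(2015)
N = 1000
years = range(2009, 2013)
contracts = pd.concat([
    pd.DataFrame({
        'State': np.random.choice(['NY', 'DE'], size=N),
        'Year': np.random.choice(years, size=N),
        '$': np.random.randint(10, size=N)}),
    pd.DataFrame({'State': ['NJ', 'NJ'], 'Year': [2009, 2012], '$': [2, 12]}),
], ignore_index=True)


def mean_then_filter():
    # Aggregate first, then drop states with too few (State, Year) groups.
    means = contracts.groupby(['State', 'Year'])['$'].mean()
    return means.groupby(level='State').filter(lambda x: len(x) >= len(years))


def filter_then_mean():
    # Drop incomplete states from the raw rows, then aggregate.
    filtered = contracts.groupby('State').filter(
        lambda x: x['Year'].nunique() >= len(years))
    return filtered.groupby(['State', 'Year'])['$'].mean()


# Both orderings should keep the same states and produce the same means.
a = mean_then_filter()
b = filter_then_mean()
pd.testing.assert_series_equal(a, b)

# Rough timings only; results depend on data size, pandas version and machine.
for f in (mean_then_filter, filter_then_mean):
    print(f.__name__, timeit.timeit(f, number=20))
```

The aggregate-first version runs its Python-level lambda over one small Series per state, while the filter-first version runs it over each state's raw rows, which is one plausible reason for the speed difference reported above.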


2015-08-13 17:48:03 unutbu
