2017-09-25 16 views

pandas: what does groupby('date_x')['outcome'].mean() mean?

https://www.kaggle.com/anokas/time-travel-eda

What does the following code do? I could not find groupby('date_x')['outcome'].mean() in the sklearn documentation.

date_x['Class probability'] = df_train.groupby('date_x')['outcome'].mean() 
date_x['Frequency'] = df_train.groupby('date_x')['outcome'].size() 
date_x.plot(secondary_y='Frequency',figsize=(22, 10)) 

Thanks!

You can find it in the `pandas` documentation, not in sklearn. The pandas tutorial on grouping may help: https://pandas.pydata.org/pandas-docs/stable/groupby.html

Answers

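I think it is better to use DataFrameGroupBy.agg to compute both aggregates in one call: size gives the number of rows in each group, and mean gives the average of outcome per group, with the groups defined by the date_x column: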

d = {'mean':'Class probability','size':'Frequency'} 
df = df_train.groupby('date_x')['outcome'].agg(['mean','size']).rename(columns=d) 

df.plot(secondary_y='Frequency',figsize=(22, 10)) 

For more information, see "applying multiple functions at once" in the pandas groupby documentation.
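To make the relationship between the two versions concrete, here is a minimal sketch that runs the question's two-step approach next to the single agg call. The contents of df_train and the line date_x = pd.DataFrame() are assumptions for illustration, since the question does not show how either was created.

```python
import pandas as pd

# Hypothetical stand-in for df_train; the real data is not shown in the question.
df_train = pd.DataFrame({
    'date_x': pd.to_datetime(['2015-01-01', '2015-01-01', '2015-01-01',
                              '2015-01-02', '2015-01-02']),
    'outcome': [20, 30, 40, 50, 60],
})

grouped = df_train.groupby('date_x')['outcome']

# Question's approach: one aggregation per column, collected in a new frame.
# Assumption: date_x starts out as an empty DataFrame.
date_x = pd.DataFrame()
date_x['Class probability'] = grouped.mean()
date_x['Frequency'] = grouped.size()

# Answer's approach: both aggregations in a single agg call, then rename.
d = {'mean': 'Class probability', 'size': 'Frequency'}
df = grouped.agg(['mean', 'size']).rename(columns=d)

print(date_x)
print(df)
```

Both frames should hold the same per-date mean and row count; the agg version just avoids grouping twice and building the frame by hand.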

Sample:

import pandas as pd

d = {'date_x': pd.to_datetime(['2015-01-01', '2015-01-01', '2015-01-01',
                               '2015-01-02', '2015-01-02']),
     'outcome': [20, 30, 40, 50, 60]}
df_train = pd.DataFrame(d)
print(df_train)
      date_x  outcome
0 2015-01-01       20  -> group 1
1 2015-01-01       30  -> group 1
2 2015-01-01       40  -> group 1
3 2015-01-02       50  -> group 2
4 2015-01-02       60  -> group 2

d = {'mean': 'Class probability', 'size': 'Frequency'}
df = df_train.groupby('date_x')['outcome'].agg(['mean', 'size']).rename(columns=d)
print(df)
            Class probability  Frequency
date_x
2015-01-01                 30          3
2015-01-02                 55          2
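The sample uses arbitrary numbers, so the "Class probability" label may look odd there. The name makes sense if outcome is a 0/1 label, which is an assumption about the original data rather than something stated in this thread: the mean of a 0/1 column is the fraction of rows equal to 1. A small sketch with made-up values and hypothetical group keys 'a' and 'b':

```python
import pandas as pd

# Made-up binary outcomes; 'a' and 'b' stand in for two dates.
binary = pd.DataFrame({
    'date_x': ['a', 'a', 'a', 'a', 'b', 'b'],
    'outcome': [1, 0, 1, 1, 0, 1],
})

# Mean of a 0/1 column per group = share of rows with outcome == 1.
prob = binary.groupby('date_x')['outcome'].mean()
print(prob)  # a -> 0.75 (3 of 4 rows are 1), b -> 0.5 (1 of 2 rows is 1)
```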

Sure! One last thing, could you please check: why is the class probability 30 and 55 in the end? Shouldn't it be 40 and 60?

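No, because there are 2 groups. The first 3 rows share the date 2015-01-01, so their mean is (20 + 30 + 40) / 3 = 30, and the last 2 rows share the date 2015-01-02, so their mean is (50 + 60) / 2 = 55. – jezrael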
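The same arithmetic can be checked by looping over the groups by hand. This is only a sketch to show what mean and size compute for each date, reusing the sample data from the answer; the manual dictionary is a name introduced here for illustration.

```python
import pandas as pd

df_train = pd.DataFrame({
    'date_x': pd.to_datetime(['2015-01-01', '2015-01-01', '2015-01-01',
                              '2015-01-02', '2015-01-02']),
    'outcome': [20, 30, 40, 50, 60],
})

# Iterate over the groups and compute mean and size without agg.
manual = {}
for date, group in df_train.groupby('date_x'):
    values = group['outcome'].tolist()
    manual[date] = (sum(values) / len(values), len(values))
    print(date.date(), values, manual[date])
# 2015-01-01 [20, 30, 40] (30.0, 3)
# 2015-01-02 [50, 60] (55.0, 2)
```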