Creating a totals row for a pandas DataFrame

2016-03-04 228 views

2

I created a DataFrame as follows:

filtered_data.groupby('weekday').agg({'airing': np.sum, 'uplift': [np.sum,np.mean]}) 

It produces this table:

        airing uplift
           sum    sum      mean
weekday
1           11     20  1.818182
2           24     46  1.916667
...

What I would like is to include a final row containing the total of each column.

Thanks in advance!

+2

Wouldn't `df.append(pd.Series(name='total', data=df.sum()))` do what you want? – EdChum

+1

@mr-sk If you use the pd.pivot_table function instead, you can pass margins=True to get the totals. Here are the docs: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.pivot_table.html –
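For reference, a minimal sketch of that pivot_table route; the toy frame below is only a stand-in for the question's filtered_data, whose column names are assumed from the post:

import numpy as np
import pandas as pd

# Toy stand-in for the question's filtered_data.
filtered_data = pd.DataFrame({'weekday': [1, 1, 2, 2],
                              'airing':  [5, 6, 10, 14],
                              'uplift':  [9, 11, 20, 26]})

# margins=True appends an 'All' row holding the column-wise aggregates.
table = pd.pivot_table(filtered_data,
                       index='weekday',
                       values=['airing', 'uplift'],
                       aggfunc=np.sum,
                       margins=True)
print(table)  # the last row is labelled 'All'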

+0

Presumably you don't want to sum the means... – Alexander

Answers

1

You can use .loc to achieve this:

df.loc[len(df)] = [df[col].sum() for col in df.columns] 
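As a usage sketch (the frame below is only a stand-in for the aggregated result in the question), the same idea with a string label keeps the extra row clearly marked as a total:

import pandas as pd

# Stand-in for the aggregated frame from the question (flattened column names).
df = pd.DataFrame({'airing_sum': [11, 24],
                   'uplift_sum': [20, 46],
                   'uplift_mean': [1.818182, 1.916667]},
                  index=pd.Index([1, 2], name='weekday'))

# Append a row holding each column's total; labelling it 'Total' instead of
# len(df) makes it easy to spot and to drop again later.
df.loc['Total'] = [df[col].sum() for col in df.columns]
print(df)

As Alexander notes in the comments, summing the mean column is usually not meaningful, so you may want to drop or overwrite that entry.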
1

In that case, you should build a Series that keeps track of your summary statistics. If you need it for display purposes, you can then concatenate it onto the frame.

summary = pd.Series([filtered_data.airing.sum(),
                     filtered_data.uplift.sum(),
                     filtered_data.uplift.mean()],
                    name='summary')
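If the totals are only needed for display, one possible follow-up (a sketch; filtered_data and its column names are assumed from the question, with a toy frame standing in for it) is to give the Series the same index as the aggregated columns and concatenate it on as a final row:

import numpy as np
import pandas as pd

# Toy stand-in for the question's filtered_data.
filtered_data = pd.DataFrame({'weekday': [1, 1, 2],
                              'airing':  [5, 6, 24],
                              'uplift':  [9, 11, 46]})

agg = filtered_data.groupby('weekday').agg({'airing': np.sum,
                                            'uplift': [np.sum, np.mean]})

# Build the summary with an index matching the aggregated columns,
# then concat it on as a row labelled 'summary'.
summary = pd.Series([filtered_data.airing.sum(),
                     filtered_data.uplift.sum(),
                     filtered_data.uplift.mean()],
                    index=agg.columns, name='summary')
print(pd.concat([agg, summary.to_frame().T]))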
0

For this I created an aggregation tool that behaves like GROUPING SETS in SQL. You supply the columns to group by and an aggregation function, and get back an aggregated DataFrame.

import itertools as it 
import pandas as pd 

from pandas.util.testing import assert_frame_equal 

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return it.chain.from_iterable(it.combinations(s, r) for r in range(len(s) + 1))

def grouper(df, grpby, aggfunc):
    ''' produces aggregate DataFrame from DataFrames for non-redundant groupings
        `workingdf` is used to avoid modifying original DataFrame
    '''
    uniqcols = set(col for col in grpby if len(df[col].unique()) == 1)
    subset = set()
    for col in uniqcols:
        for grp in powerset(grpby):
            if col in grp:
                subset.add(grp)  # add level of aggregation only when non-redundant

    if len(subset) == 0:
        for grp in powerset(grpby):
            subset.add(grp)

    workingdf = df.copy()
    for idx, i in enumerate(subset):
        if i != ():
            tmp = aggfunc(workingdf.groupby(i))
        else:
            # hack to get output to be a DataFrameGroupBy object:
            # insert dummy column on which to group by
            dummycolname = hash(tuple(workingdf.columns.tolist()))
            workingdf[dummycolname] = ''
            tmp = aggfunc(workingdf.groupby(dummycolname))

        # drop the index and add it back
        if i == ():
            tmp.reset_index(drop=True, inplace=True)
        else:
            tmp.reset_index(inplace=True)

        for j in grpby:
            if j not in tmp:  # if column is not in DataFrame add it
                tmp[j] = '(All)'

        # new list with all columns including aggregate ones; do this only once
        if idx == 0:
            finalcols = grpby[:]
            addlcols = [k for k in tmp if k not in grpby]  # aggregate columns
            finalcols.extend(addlcols)

        # reorder columns
        tmp = tmp[finalcols]

        if idx == 0:
            final = tmp; del tmp
        else:
            final = pd.concat([final, tmp]); del tmp

    del workingdf

    final.sort_values(finalcols, inplace=True)
    final.reset_index(drop=True, inplace=True)

    return final

def agg(grpbyobj):
    ''' the purpose of this function is to:
        specify aggregate operation(s) you wish to perform,
        name the resulting column(s) in the final DataFrame.
    '''
    tmp = pd.DataFrame()
    tmp['Total (n)'] = grpbyobj['Total'].sum()
    return tmp

if __name__ == '__main__':
    df = pd.DataFrame({'Area': ['a', 'a', 'b'],
                       'Year': [2014, 2014, 2014],
                       'Month': [1, 2, 3],
                       'Total': [4, 5, 6]})
    final = grouper(df, grpby=['Area', 'Year'], aggfunc=agg)

    # test against expected result
    expected = pd.DataFrame({u'Year': {0: 2014, 1: 2014, 2: 2014},
                             u'Total (n)': {0: 15, 1: 9, 2: 6},
                             u'Area': {0: u'(All)', 1: u'a', 2: u'b'}})
    expected = expected[final.columns.tolist()]
    try:
        # check_names kwarg True: compare indexes and columns
        assert_frame_equal(final, expected, check_names=True)
    except AssertionError as e:
        raise
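To adapt this to the question's frame, the aggregation function could be written along the following lines (a sketch only; filtered_data and its column names are assumed from the post, with a toy frame standing in for them). The '(All)' row that grouper emits then plays the role of the totals row:

import pandas as pd

# Toy stand-in for the question's filtered_data.
filtered_data = pd.DataFrame({'weekday': [1, 1, 2],
                              'airing':  [5, 6, 24],
                              'uplift':  [9, 11, 46]})

def question_agg(grpbyobj):
    '''Aggregates matching the question: airing sum, uplift sum and uplift mean.'''
    tmp = pd.DataFrame()
    tmp['airing (sum)'] = grpbyobj['airing'].sum()
    tmp['uplift (sum)'] = grpbyobj['uplift'].sum()
    tmp['uplift (mean)'] = grpbyobj['uplift'].mean()
    return tmp

# Per-weekday aggregates; passing question_agg as aggfunc to
# grouper(filtered_data, grpby=['weekday'], aggfunc=question_agg)
# is how the answer's tool would add the '(All)' totals row.
print(question_agg(filtered_data.groupby('weekday')))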