2014-03-27 218 views
1

Summarizing rows in a pandas DataFrame

I have the following rows:

ColumnID MenuID QuestionID ResponseCount  RowID SourceColumnID SourceRowID SourceVariationID 
22  -2  -2 319276487    28 3049400354  3049400356 3049400365   3049400365 
23  -2  -2 319276487    31 3049400354  3049400356 3049400365   3049400365 
24  -2  -2 319276487    37 3049400354  3049400356 3049400365   3049400365 
25  -2  -2 319276487    28 3049400353  3049400357 3049400365   3049400365 
26  -2  -2 319276487    45 3049400353  3049400357 3049400365   3049400365 
27  -2  -2 319276487    46 3049400353  3049400357 3049400365   3049400365 
28  -2  -2 319276487    26 3049400353  3049400358 3049400365   3049400365 
29  -2  -2 319276487    33 3049400353  3049400358 3049400365   3049400365 
30  -2  -2 319276487    39 3049400353  3049400358 3049400365   3049400365 
31  -2  -2 319276487    26 3049400353  3049400359 3049400365   3049400365 

I would like to squash this DataFrame so that ResponseCount is summed by RowID and SourceVariationID.

For example:

ColumnID MenuID QuestionID ResponseCount  RowID SourceColumnID SourceRowID SourceVariationID 
22  -2  -2 319276487    96 3049400354  3049400356 3049400365   3049400365 
23  -2  -2 319276487    243 3049400353  3049400356 3049400365 

This is what I have come up with so far:

(Pdb) new_df = df.groupby(['RowID', 'SourceVariationID', 'SourceRowID']).sum()                   
(Pdb) new_df['ColumnID'] = -2 
(Pdb) new_df['MenuID'] = -2 
(Pdb) pp new_df 
              ColumnID MenuID QuestionID ResponseCount SourceColumnID 
RowID  SourceVariationID SourceRowID                
3031434948 3031434943  3031434943   -2  -2 3805083612   141  36377219262 
      3031434945  3031434945   -2  -2 4439264214   237  42440089136 

[2 rows x 5 columns] 

Answers

0

Assuming your other columns are integers:

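# Every column except the one being summed and the grouping key
columns = df.columns.tolist()
columns.remove('ResponseCount')
columns.remove('RowID')
# Sum ResponseCount per RowID
tempDf = df.groupby(['RowID'])[['ResponseCount']].sum()
# Take the minimum of each remaining column per RowID
tempDf = tempDf.join(df.groupby(['RowID'])[columns].min())
# Copy RowID from the index back into a regular column
tempDf['RowID'] = tempDf.index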

A quick solution, not a great one! Hope this helps.
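The same idea can probably be written as a single `agg` call. The following is a minimal sketch, not a definitive implementation: it assumes a cut-down copy of the question's data (only six of the columns), groups by both `RowID` and `SourceVariationID` as the question asks, and, like the answer above, takes `min` of every column that is not summed. The names `keys`, `how` and `result` are illustrative only.

```python
import pandas as pd

# Cut-down copy of the question's rows (assumption: only these six columns).
df = pd.DataFrame({
    "ColumnID": [-2] * 10,
    "MenuID": [-2] * 10,
    "QuestionID": [319276487] * 10,
    "ResponseCount": [28, 31, 37, 28, 45, 46, 26, 33, 39, 26],
    "RowID": [3049400354] * 3 + [3049400353] * 7,
    "SourceVariationID": [3049400365] * 10,
})

keys = ["RowID", "SourceVariationID"]

# Map every non-key column to "min", then override ResponseCount with "sum".
how = {col: "min" for col in df.columns if col not in keys}
how["ResponseCount"] = "sum"

# as_index=False keeps the grouping keys as ordinary columns.
result = df.groupby(keys, as_index=False).agg(how)
print(result)
```

Because the keys stay as columns, there is no need to copy the index back into `RowID` afterwards.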

2

You can do something like the following:

print(df)
    ColumnID MenuID QuestionID ResponseCount  RowID SourceVariationID 
0  -2  -2 319276487    28 3049400354   3049400365 
1  -2  -2 319276487    31 3049400354   3049400365 
2  -2  -2 319276487    37 3049400354   3049400365 
3  -2  -2 319276487    28 3049400353   3049400365 
4  -2  -2 319276487    45 3049400353   3049400365 
5  -2  -2 319276487    46 3049400353   3049400365 
6  -2  -2 319276487    26 3049400353   3049400365 
7  -2  -2 319276487    33 3049400353   3049400365 
8  -2  -2 319276487    39 3049400353   3049400365 
9  -2  -2 319276487    26 3049400353   3049400365 


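def squash(group):
    # Take the group's first row as a template; every group has at least one row
    x = group.iloc[0, :].drop(['RowID', 'SourceVariationID'])
    # Replace its count with the total for the whole group
    x['ResponseCount'] = group['ResponseCount'].sum()
    return x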

print(df.groupby(['RowID', 'SourceVariationID']).apply(squash))

          ColumnID MenuID QuestionID ResponseCount 
RowID  SourceVariationID            
3049400353 3049400365    -2  -2 319276487   243 
3049400354 3049400365    -2  -2 319276487    96
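If a flat result with the original column order is preferred, a similar "one representative row per group" effect may be achievable without `apply`. The sketch below is one possible variant under stated assumptions, not the answer's own method: it rebuilds the ten sample rows shown above, writes each group's total into `ResponseCount` with `transform("sum")`, and then keeps the first row of each group with `drop_duplicates`. The names `keys` and `squashed` are illustrative.

```python
import pandas as pd

# The ten sample rows printed above, rebuilt by hand.
df = pd.DataFrame({
    "ColumnID": [-2] * 10,
    "MenuID": [-2] * 10,
    "QuestionID": [319276487] * 10,
    "ResponseCount": [28, 31, 37, 28, 45, 46, 26, 33, 39, 26],
    "RowID": [3049400354] * 3 + [3049400353] * 7,
    "SourceVariationID": [3049400365] * 10,
})

keys = ["RowID", "SourceVariationID"]

squashed = df.copy()
# Overwrite every row's count with the total of its group.
squashed["ResponseCount"] = squashed.groupby(keys)["ResponseCount"].transform("sum")
# Keep only the first row of each group and renumber the index.
squashed = squashed.drop_duplicates(subset=keys).reset_index(drop=True)
print(squashed)
```

Groups appear in order of first occurrence rather than sorted by key, and the non-summed columns come from each group's first row.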