2013-07-31 27 views

pandas dataframe aggregate - why does it return the column names?

Here is a simple DataFrame:

Acid Balance_1 CustID Balance_2 
0  1 0.082627  1  NaN 
1  2 0.397579  1 0.459942 
2  3 0.201596  2 0.596573 
3  4 0.616448  3 0.705697 
4  5 0.844865  3 0.483279 
5  6  NaN  4 0.360260 

I have been experimenting with the aggregate function after grouping by customer ID.

groupby_obj = time_series.groupby(["CustID"])
df = groupby_obj.agg(set)

This returns

           Acid \ 
CustID            
1  set([Balance_1, Balance_2, Acid, CustID]) 
2  set([Balance_1, Balance_2, Acid, CustID]) 
3  set([Balance_1, Balance_2, Acid, CustID]) 
4  set([Balance_1, Balance_2, Acid, CustID]) 

             Balance_1 \ 
CustID            
1  set([Balance_1, Balance_2, Acid, CustID]) 
2  set([Balance_1, Balance_2, Acid, CustID]) 
3  set([Balance_1, Balance_2, Acid, CustID]) 
4  set([Balance_1, Balance_2, Acid, CustID]) 

             Balance_2 
CustID            
1  set([Balance_1, Balance_2, Acid, CustID]) 
2  set([Balance_1, Balance_2, Acid, CustID]) 
3  set([Balance_1, Balance_2, Acid, CustID]) 
4  set([Balance_1, Balance_2, Acid, CustID]) 

instead of what I thought it would do:

 Acid   Balance_1     Balance_2 
CustID        
1  set([1,2]) set([0.082627, 0.397579]) set([NaN, 0.459942]) 
    etc for the other CustIDs... 

Why does the aggregate fill the DataFrame with sets of all the column headers?

Thanks, Anne

Answer


Here is your frame

In [29]: df 
Out[29]: 
    Acid Balance_1 CustID Balance_2 
0  1 0.082627  1  NaN 
1  2 0.397579  1 0.459942 
2  3 0.201596  2 0.596573 
3  4 0.616448  3 0.705697 
4  5 0.844865  3 0.483279 
5  6  NaN  4 0.360260 

Here are the groups you create

In [24]: df.groupby(['CustID']).groups 
Out[24]: {1: [0, 1], 2: [2], 3: [3, 4], 4: [5]} 

Here is a way to look at what gets passed to the function (one sub-frame per group)

In [25]: df.iloc[[0,1]] 
Out[25]: 
    Acid Balance_1 CustID Balance_2 
0  1 0.082627  1  NaN 
1  2 0.397579  1 0.459942 

In [26]: df.iloc[[2]] 
Out[26]: 
    Acid Balance_1 CustID Balance_2 
2  3 0.201596  2 0.596573 
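
Rather than slicing by hand with iloc, the same sub-frames can be seen by iterating over the groupby object. This is only a sketch: the frame is rebuilt from the values in the question, and the name group_sizes is made up for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
df = pd.DataFrame({
    "Acid": [1, 2, 3, 4, 5, 6],
    "Balance_1": [0.082627, 0.397579, 0.201596, 0.616448, 0.844865, np.nan],
    "CustID": [1, 1, 2, 3, 3, 4],
    "Balance_2": [np.nan, 0.459942, 0.596573, 0.705697, 0.483279, 0.360260],
})

# Iterating a groupby yields (key, sub-frame) pairs, one per CustID.
group_sizes = {}  # hypothetical helper dict: CustID -> number of rows
for cust_id, sub_frame in df.groupby("CustID"):
    group_sizes[int(cust_id)] = len(sub_frame)
    print(cust_id, sub_frame.index.tolist())
```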

And here is what set does to a frame: you get back the set of its column names. It is not a very interesting or useful operation

In [27]: set(df.iloc[[2]]) 
Out[27]: set(['Balance_1', 'Balance_2', 'Acid', 'CustID']) 
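
The reason is that iterating over a DataFrame walks its column labels, not its values, so set() only ever sees the labels. A small sketch to contrast that with calling set() on a single column (the names labels and acid_values are just for illustration):

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
df = pd.DataFrame({
    "Acid": [1, 2, 3, 4, 5, 6],
    "Balance_1": [0.082627, 0.397579, 0.201596, 0.616448, 0.844865, np.nan],
    "CustID": [1, 1, 2, 3, 3, 4],
    "Balance_2": [np.nan, 0.459942, 0.596573, 0.705697, 0.483279, 0.360260],
})

# Iterating a DataFrame yields its column labels.
labels = set(df.iloc[[2]])

# Iterating a Series (one column) yields its values.
acid_values = set(df["Acid"].iloc[[0, 1]])
```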

The point of agg is to aggregate the passed frame into, say, a Series. Your operation should reduce the dimensionality of the input

In [28]: df.groupby(['CustID']).agg(lambda x: x.sum()) 
Out[28]: 
     Acid Balance_1 Balance_2 
CustID        
1   3 0.480206 0.459942 
2   3 0.201596 0.596573 
3   9 1.461313 1.188976 
4   6  NaN 0.360260 
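
The same reduction can also be written with the groupby's own sum method. One caveat, as far as I know: on recent pandas versions a group whose values are all NaN sums to 0.0 by default, not the NaN shown above, so this sketch passes min_count=1 to keep the NaN. The name totals is only for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
df = pd.DataFrame({
    "Acid": [1, 2, 3, 4, 5, 6],
    "Balance_1": [0.082627, 0.397579, 0.201596, 0.616448, 0.844865, np.nan],
    "CustID": [1, 1, 2, 3, 3, 4],
    "Balance_2": [np.nan, 0.459942, 0.596573, 0.705697, 0.483279, 0.360260],
})

# Each column of each group is reduced to a single number.
# min_count=1 makes a group with no valid values come out as NaN.
totals = df.groupby("CustID").sum(min_count=1)
print(totals)
```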

What are you trying to accomplish?
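
If the goal is the per-column sets sketched in the question, one possible approach is to apply set to each column of the groupby separately, so that it receives a Series of values and not a whole frame. This is a sketch under that assumption, not necessarily how you would want to store the data; value_sets and value_columns are made-up names.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
df = pd.DataFrame({
    "Acid": [1, 2, 3, 4, 5, 6],
    "Balance_1": [0.082627, 0.397579, 0.201596, 0.616448, 0.844865, np.nan],
    "CustID": [1, 1, 2, 3, 3, 4],
    "Balance_2": [np.nan, 0.459942, 0.596573, 0.705697, 0.483279, 0.360260],
})

grouped = df.groupby("CustID")
value_columns = ["Acid", "Balance_1", "Balance_2"]

# Selecting one column gives a Series per group, so set() sees the values.
value_sets = pd.DataFrame(
    {col: grouped[col].apply(set) for col in value_columns}
)
print(value_sets)
```

Each cell then holds a Python set, which pandas stores as a generic object, so the usual vectorised operations will not apply to these columns.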


Thanks Jeff! I'm not really trying to accomplish anything, just working on improving my understanding of how pandas works... – Anne
