2017-03-02 62 views

How can I change the format of a DataFrame? I have a DataFrame df in the following format:

df = 
MONTH WEEKDAY EVAL 
1  0   1 
1  0   0 
1  0   0 
1  1   1 
1  1   0 
2  0   0 
2  0   0 
2  1   1 

I group the data as follows:

result = df.groupby(['MONTH','WEEKDAY','EVAL']).size().reset_index() 
result 

The output it produces is laid out differently from what I want:

MONTH WEEKDAY EVAL 0 
1  0  0  400 
1  0  1  20 
1  1  0  300 
1  1  1  20 
2  0  0  200 
2  0  1  35 
2  1  0  450 
2  1  1  26 

I would like to change the format of result to this:

WEEKDAY EVAL_0 EVAL_1 
0   400  20 
0   200  35 
1   300  20 
1   450  26 

How can I do this?

Answers


I think you need to reshape with unstack, and then some data cleaning is necessary:

# Use [0] instead of ['0'] if the count column is labelled with the integer 0
df = df.set_index(['MONTH','WEEKDAY','EVAL'])['0'].unstack()

# If this raises "ValueError: Index contains duplicate entries, cannot reshape",
# the key columns contain duplicates and the data must be aggregated first (mean, sum, ...):
#df = df.groupby(['MONTH','WEEKDAY','EVAL'])['0'].mean().unstack()
#df = df.pivot_table(index=['MONTH','WEEKDAY'], columns='EVAL', values='0', aggfunc='mean')

print(df)
EVAL             0   1
MONTH WEEKDAY
1     0        400  20
      1        300  20
2     0        200  35
      1        450  26

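# The chain is wrapped in parentheses so that it can span several lines
df = (df.sort_index(level=[1,0])            # sort by WEEKDAY, then MONTH
        .reset_index(level=0, drop=True)    # drop the MONTH level
        .add_prefix('EVAL_')
        .reset_index()
        .rename_axis(None, axis=1))         # remove the 'EVAL' columns name
print(df)
   WEEKDAY  EVAL_0  EVAL_1
0        0     400      20
1        0     200      35
2        1     300      20
3        1     450      26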
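
For reference, the steps above can be put together as one self-contained sketch. It rebuilds the grouped counts from the question by hand and names the count column 'COUNT', which is an assumption made here for readability (in the question the column is labelled 0); the variable name wide is also only illustrative.

```python
import pandas as pd

# Grouped counts as shown in the question. The column name 'COUNT' is an
# assumption for this sketch; in the question the column is labelled 0.
result = pd.DataFrame({
    'MONTH':   [1, 1, 1, 1, 2, 2, 2, 2],
    'WEEKDAY': [0, 0, 1, 1, 0, 0, 1, 1],
    'EVAL':    [0, 1, 0, 1, 0, 1, 0, 1],
    'COUNT':   [400, 20, 300, 20, 200, 35, 450, 26],
})

wide = (
    result.set_index(['MONTH', 'WEEKDAY', 'EVAL'])['COUNT']
          .unstack()                         # EVAL values become columns
          .sort_index(level=[1, 0])          # sort by WEEKDAY, then MONTH
          .reset_index(level=0, drop=True)   # drop the MONTH level
          .add_prefix('EVAL_')               # 0, 1 -> EVAL_0, EVAL_1
          .reset_index()                     # WEEKDAY back to a column
          .rename_axis(None, axis=1)         # remove the 'EVAL' columns name
)
print(wide)
```

Giving the count column an explicit name avoids having to remember whether it is the integer 0 or the string '0'.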

Sample with duplicates:

print(df)
   MONTH  WEEKDAY  EVAL    0
0      1        0     0  400
1      1        0     1   20
2      1        1     0  300
3      1        1     1   20
4      2        0     0  200
5      2        0     1   35
6      2        1     0  450
7      2        1     1   26
8      2        1     1  100  <- duplicate

# Aggregate the duplicated keys with mean before reshaping
df = df.groupby(['MONTH','WEEKDAY','EVAL'])['0'].mean().unstack()

# The chain is wrapped in parentheses so that it can span several lines
df = (df.sort_index(level=[1,0])
        .reset_index(level=0, drop=True)
        .add_prefix('EVAL_')
        .reset_index()
        .rename_axis(None, axis=1))
print(df)
   WEEKDAY  EVAL_0  EVAL_1
0        0     400      20
1        0     200      35
2        1     300      20
3        1     450      63  <- value is the mean, (100 + 26)/2
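
The pivot_table alternative mentioned in the comments of the first snippet can be sketched the same way on the sample with duplicates. This is only an illustration under assumptions: the value column is taken to be the string '0' as in the answer, and the names wide and raised are made up for this example.

```python
import pandas as pd

# Sample with one duplicated key (MONTH=2, WEEKDAY=1, EVAL=1).
# The value column is assumed to be the string '0', as in the answer.
df = pd.DataFrame({
    'MONTH':   [1, 1, 1, 1, 2, 2, 2, 2, 2],
    'WEEKDAY': [0, 0, 1, 1, 0, 0, 1, 1, 1],
    'EVAL':    [0, 1, 0, 1, 0, 1, 0, 1, 1],
    '0':       [400, 20, 300, 20, 200, 35, 450, 26, 100],
})

# A plain unstack cannot reshape duplicated keys and raises ValueError.
raised = False
try:
    df.set_index(['MONTH', 'WEEKDAY', 'EVAL'])['0'].unstack()
except ValueError:
    raised = True

# pivot_table aggregates the duplicates (here with mean) while reshaping.
wide = (
    df.pivot_table(index=['MONTH', 'WEEKDAY'], columns='EVAL',
                   values='0', aggfunc='mean')
      .sort_index(level=[1, 0])
      .reset_index(level=0, drop=True)
      .add_prefix('EVAL_')
      .reset_index()
      .rename_axis(None, axis=1)
)
print(wide)
```

Which aggregation is appropriate (mean, sum, ...) depends on what the duplicated rows represent, so aggfunc='mean' here simply mirrors the answer.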

It worked for me when I changed ['0'].unstack() to [0].unstack(). Thanks. – Dinosaurius


I forgot about that, sorry. You can give the 0 column a name with result = df.groupby(['MONTH','WEEKDAY','EVAL']).size().reset_index(name='COL'). Thanks. – jezrael
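
A minimal sketch of the reset_index(name=...) suggestion, using the eight sample rows from the question (so the counts are small and do not match the 400/20/... figures shown there). The name 'COL' comes from the comment above; everything else is illustrative.

```python
import pandas as pd

# The eight raw rows shown at the top of the question.
df = pd.DataFrame({
    'MONTH':   [1, 1, 1, 1, 1, 2, 2, 2],
    'WEEKDAY': [0, 0, 0, 1, 1, 0, 0, 1],
    'EVAL':    [1, 0, 0, 1, 0, 0, 0, 1],
})

# Without name=..., the count column is labelled with the integer 0.
default = df.groupby(['MONTH', 'WEEKDAY', 'EVAL']).size().reset_index()

# With name='COL', the count column gets an explicit string label.
result = df.groupby(['MONTH', 'WEEKDAY', 'EVAL']).size().reset_index(name='COL')
print(result)
```

With the named column, the reshaping step can select ['COL'] and the question of [0] versus ['0'] does not arise.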