2017-04-09

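I have a DataFrame that is a list of observations, grouped by the 'name' column, and I'm having trouble converting it into MultiIndex form. How can I convert a Pandas DataFrame into MultiIndexed form for a clustermap?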

I have something like:

name | ratio | DayOfWeek | HourOfDay
foo  | 0.7   | Mon       | 0
foo  | 0.2   | Mon       | 1
foo  | 0.11  | Mon       | 2
foo  | 0.45  | Mon       | 3
...
foo  | 0.2   | Mon       | 23
foo  | 0.1   | Tue       | 0
foo  | 0.6   | Tue       | 1
foo  | 0.2   | Tue       | 2
...
foo  | 0.1   | Sun       | 23
bar  | 0.2   | Mon       | 0
bar  | 0.11  | Mon       | 1
...

and so on.

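What I want is something I can use with seaborn clustermaps to show how the 'ratio' of each 'name' correlates, both for each day as a whole and for specific hours within a day.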

For example, I need something like this (not sure if it's correct, but this is what I tried):

                    | foo | bar | ...
DayOfWeek HourOfDay |
Mon       0         | 0.7 | 0.2 | ...
          1         | ...
          2         | ...
...
Tue       0         | 0.1 | ...
          1         | ...
          2         | ...
...

Once I have that, I'd like to be able to use xs() to cut it into slices that seaborn's heatmap/clustermap can use.
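A minimal sketch of the xs() slicing step, assuming the frame already has a (DayOfWeek, HourOfDay) row index and one column per name. It is built by hand here from a few of the values above (np.nan where no value is given; the names wide, monday and hour0 are illustrative) so the slicing can be seen on its own. Each slice is a plain 2-D frame of the kind seaborn.heatmap takes; clustermap's clustering step generally needs missing values filled or dropped first.

```python
import numpy as np
import pandas as pd

# Hand-built frame in the layout sketched above (illustrative values taken
# from the question; np.nan where the question gives no value).
index = pd.MultiIndex.from_tuples(
    [("Mon", 0), ("Mon", 1), ("Mon", 2), ("Tue", 0), ("Tue", 1)],
    names=["DayOfWeek", "HourOfDay"],
)
wide = pd.DataFrame(
    {"foo": [0.7, 0.2, 0.11, 0.1, 0.6],
     "bar": [0.2, 0.11, np.nan, np.nan, np.nan]},
    index=index,
)

# One day: keep DayOfWeek == "Mon" and drop that level,
# leaving an HourOfDay x name frame.
monday = wide.xs("Mon", level="DayOfWeek")
print(monday)

# One hour across all days: keep HourOfDay == 0 and drop that level,
# leaving a DayOfWeek x name frame.
hour0 = wide.xs(0, level="HourOfDay")
print(hour0)
```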

Answer


You can use set_index and unstack:

df = df.set_index(['DayOfWeek','HourOfDay','name'])['ratio'].unstack() 
print (df) 
name                 bar   foo
DayOfWeek HourOfDay
Mon       0         0.20  0.70
          1         0.11  0.20
          2          NaN  0.11
          3          NaN  0.45
          23         NaN  0.20
Sun       23         NaN  0.10
Tue       0          NaN  0.10
          1          NaN  0.60
          2          NaN  0.20
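Why the set_index/unstack route needs an alternative: unstack() requires every (DayOfWeek, HourOfDay, name) key to be unique. A small sketch with a hypothetical duplicated Mon/0 row for foo shows that unstack() raises a ValueError in that case instead of choosing a value.

```python
import pandas as pd

# Hypothetical data: two "foo" rows share the key (Mon, 0).
df = pd.DataFrame({
    "name": ["foo", "foo", "bar"],
    "ratio": [0.7, 0.9, 0.2],
    "DayOfWeek": ["Mon", "Mon", "Mon"],
    "HourOfDay": [0, 0, 0],
})

s = df.set_index(["DayOfWeek", "HourOfDay", "name"])["ratio"]
print(s.index.is_unique)  # False: (Mon, 0, foo) appears twice

try:
    s.unstack()
    failed = False
except ValueError as err:
    # unstack() cannot decide which duplicate value belongs in the cell
    failed = True
    print("unstack failed:", err)
```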

But if there are duplicates, you need pivot_table with some aggregate function such as mean, sum, ...:

print (df) 
   name  ratio DayOfWeek  HourOfDay
0   foo   0.70       Mon          0  <- duplicate for same name, DayOfWeek and HourOfDay - 0.7
1   foo   0.90       Mon          0  <- duplicate for same name, DayOfWeek and HourOfDay - 0.9
2   foo   0.20       Mon          1
3   foo   0.11       Mon          2
4   foo   0.45       Mon          3
5   foo   0.20       Mon         23
6   foo   0.10       Tue          0
7   foo   0.60       Tue          1
8   foo   0.20       Tue          2
9   foo   0.10       Sun         23
10  bar   0.20       Mon          0
11  bar   0.11       Mon          1


df = df.pivot_table(index=['DayOfWeek','HourOfDay'], 
        columns='name', 
        values='ratio', 
        aggfunc='mean') 
print (df) 

name                 bar   foo
DayOfWeek HourOfDay
Mon       0         0.20  0.80  <- (0.7 + 0.9)/2 = 0.8
          1         0.11  0.20
          2          NaN  0.11
          3          NaN  0.45
          23         NaN  0.20
Sun       23         NaN  0.10
Tue       0          NaN  0.10
          1          NaN  0.60
          2          NaN  0.20
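The question also asks about each day as a whole. One option, sketched here under the assumption that "as a whole" means the average over hours, is to collapse the HourOfDay level of the pivoted frame with groupby(level=...). The data is a small subset of the duplicate example above; hourly and daily are illustrative names.

```python
import pandas as pd

# Subset of the example rows, including the duplicated (foo, Mon, 0) entry.
df = pd.DataFrame({
    "name": ["foo", "foo", "foo", "foo", "bar", "bar"],
    "ratio": [0.7, 0.9, 0.2, 0.1, 0.2, 0.11],
    "DayOfWeek": ["Mon", "Mon", "Mon", "Tue", "Mon", "Mon"],
    "HourOfDay": [0, 0, 1, 0, 0, 1],
})

# Hourly table: duplicates averaged, as in the answer.
hourly = df.pivot_table(index=["DayOfWeek", "HourOfDay"],
                        columns="name", values="ratio", aggfunc="mean")

# Collapse the HourOfDay level: one row per day, one column per name.
daily = hourly.groupby(level="DayOfWeek").mean()
print(daily)
```

This averages the hourly means, so the duplicated Mon/0 rows count once, as their mean 0.8. To average the raw rows instead, pivot with index='DayOfWeek' alone.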

Alternatively, with groupby:

df = df.groupby(['DayOfWeek','HourOfDay','name'])['ratio'].mean().unstack() 
print (df) 
name                 bar   foo
DayOfWeek HourOfDay
Mon       0         0.20  0.80  <- (0.7 + 0.9)/2 = 0.8
          1         0.11  0.20
          2          NaN  0.11
          3          NaN  0.45
          23         NaN  0.20
Sun       23         NaN  0.10
Tue       0          NaN  0.10
          1          NaN  0.60
          2          NaN  0.20
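To check that the pivot_table and groupby versions agree, a quick sketch on a small hypothetical dataset (with one duplicated key and one missing cell) compares both results with DataFrame.equals, which treats NaN in matching positions as equal.

```python
import pandas as pd

# Small hypothetical dataset with a duplicated (foo, Mon, 0) key.
df = pd.DataFrame({
    "name": ["foo", "foo", "foo", "foo", "bar", "bar"],
    "ratio": [0.7, 0.9, 0.2, 0.1, 0.2, 0.11],
    "DayOfWeek": ["Mon", "Mon", "Mon", "Tue", "Mon", "Mon"],
    "HourOfDay": [0, 0, 1, 0, 0, 1],
})

via_pivot = df.pivot_table(index=["DayOfWeek", "HourOfDay"],
                           columns="name", values="ratio", aggfunc="mean")
via_groupby = (df.groupby(["DayOfWeek", "HourOfDay", "name"])["ratio"]
                 .mean()
                 .unstack())

# Same shape, labels and values (NaN for bar at Tue/0 in both).
same = via_pivot.equals(via_groupby)
print(same)
```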