2016-09-06 59 views

Setting multiple columns as the index: problem with index names

This is how I create my table with multi-level columns:

import pandas as pd

whatFields = ['mean', 'mom_2', 'n']
groupbyFields = ['foo', 'bar'] 
topFields = ['desc']*len(groupbyFields) 
topFields += ['price']*len(whatFields) 
topFields += ['units']*len(whatFields) 
bottomFields = groupbyFields + whatFields + whatFields 
resultsDf = pd.DataFrame(columns=pd.MultiIndex.from_arrays([topFields, bottomFields])) 
indexFields = [('desc', field) for field in groupbyFields] 
resultsDf.set_index(indexFields, inplace=True) 

Here is the empty result:

Empty DataFrame 
Columns: [(price, mean), (price, mom_2), (price, n), (units, mean), (units, mom_2), (units, n)] 
Index: [] 

>>> resultsDf.index 
Out[2]: 
MultiIndex(levels=[[], []], 
      labels=[[], []], 
      names=[('desc', 'foo'), ('desc', 'bar')]) 

However, once it has been filled, it looks like this:

                        price           units
                         mean mom_2   n  mean mom_2   n
(desc, foo) (desc, bar)
1500002071  4292          NaN   NaN NaN   NaN   NaN NaN
            4246          NaN   NaN NaN   NaN   NaN NaN
            342           NaN   NaN NaN   NaN   NaN NaN
            104           NaN   NaN NaN   NaN   NaN NaN
            4218         2.59     0   1   NaN   NaN NaN

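The problem is that the index levels have these strange tuple-shaped names, whereas the columns have "proper" names now that they are in multi-level form.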

You might think this is because they are the index. No:

  (desc, foo) (desc, bar) price           units
                           mean mom_2   n  mean mom_2   n
0  1500002071        4292   NaN   NaN NaN   NaN   NaN NaN
1  1500002071        4246   NaN   NaN NaN   NaN   NaN NaN
2  1500002071         342   NaN   NaN NaN   NaN   NaN NaN
3  1500002071         104   NaN   NaN NaN   NaN   NaN NaN
4  1500002071        4218  2.59     0   1   NaN   NaN NaN

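Why doesn't the index follow the same multi-level layout as the columns? In the end, I want to access the index through foo and bar (or through a real MultiIndex, at least not these pseudo-tuples).

How can I do that? Is there a better way to generate my empty df in the first place?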

Answer


Is this what you are looking for? I'm not sure how you want the main index to be set up.

Two ways:

In [1]: import numpy as np 

In [2]: import pandas as pd

In [3]: import itertools as it

In [4]: whatFields = ['mean', 'mom_2', 'n'] 
    ...: groupbyFields = ['foo', 'bar'] 
    ...: topFields= ['price', 'units'] 
    ...: descriptions = [11, 22, 33, 44] 
    ...: 
    ...: top_index = list(it.product(topFields, whatFields)) 
    ...: 
    ...: main_index = list(it.product(descriptions, groupbyFields)) 
    ...: main_index 
Out[4]: 
[(11, 'foo'), 
(11, 'bar'), 
(22, 'foo'), 
(22, 'bar'), 
(33, 'foo'), 
(33, 'bar'), 
(44, 'foo'), 
(44, 'bar')] 

In [5]: top_index 
Out[5]: 
[('price', 'mean'), 
('price', 'mom_2'), 
('price', 'n'), 
('units', 'mean'), 
('units', 'mom_2'), 
('units', 'n')] 

In [6]: resultsDf = pd.DataFrame(
   ...:     index=pd.MultiIndex.from_tuples(main_index).set_names(['desc', 'something']),
   ...:     columns=pd.MultiIndex.from_tuples(top_index),
   ...:     data=np.random.rand(len(main_index), len(top_index)),
   ...: ).sort_index()

In [7]: resultsDf 
Out[7]: 
                   price                         units
                    mean     mom_2         n      mean     mom_2         n
desc something
11   bar        0.415331  0.153503  0.750690  0.505439  0.781057  0.102450
     foo        0.444163  0.921779  0.587966  0.988859  0.747277  0.645065
22   bar        0.205548  0.835086  0.630778  0.936277  0.587607  0.644636
     foo        0.907772  0.927121  0.457286  0.881467  0.091484  0.217839
33   bar        0.207454  0.670291  0.609697  0.024396  0.808362  0.738188
     foo        0.838015  0.058354  0.804375  0.704137  0.760060  0.638933
44   bar        0.577411  0.085774  0.394033  0.798052  0.107777  0.852888
     foo        0.528873  0.902225  0.098982  0.611146  0.122890  0.887364
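
As a side note, the same frame can probably be built without itertools. The following is only a sketch under a few assumptions: it swaps in `pd.MultiIndex.from_product` with its `names` argument for the `from_tuples(...).set_names(...)` call above, the snake_case variable names are mine, and the seeded generator is there only to make the run repeatable.

```python
import numpy as np
import pandas as pd

what_fields = ['mean', 'mom_2', 'n']
groupby_fields = ['foo', 'bar']
descriptions = [11, 22, 33, 44]

# Row index: every (desc, something) pair, with the level names set up front.
row_index = pd.MultiIndex.from_product(
    [descriptions, groupby_fields], names=['desc', 'something'])

# Column index: every (top, bottom) pair, e.g. ('price', 'mean').
col_index = pd.MultiIndex.from_product([['price', 'units'], what_fields])

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
results_df = pd.DataFrame(
    rng.random((len(row_index), len(col_index))),
    index=row_index,
    columns=col_index,
).sort_index()

# With named levels, rows can be selected by level name instead of by tuple.
foo_rows = results_df.xs('foo', level='something')
```

Here `foo_rows` keeps only the `desc` level in its index, because `xs` drops the level it selects on by default.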

Or:

In [10]: resultsDf = pd.DataFrame(columns=pd.MultiIndex.from_tuples(top_index),
    ...:                          data=np.random.rand(len(main_index), len(top_index)))
    ...:
    ...: resultsDf['desc'], resultsDf['something'] = zip(*main_index)
    ...:
    ...: resultsDf = resultsDf.set_index(['desc', 'something']).sort_index()

In [11]: resultsDf 
Out[11]: 
                   price                         units
                    mean     mom_2         n      mean     mom_2         n
desc something
11   foo        0.205574  0.673159  0.772009  0.598809  0.070022  0.332420
     bar        0.844376  0.602825  0.433186  0.420408  0.299380  0.354098
22   foo        0.341226  0.489068  0.784226  0.721386  0.866248  0.113838
     bar        0.729578  0.209731  0.533399  0.993587  0.340383  0.895143
33   foo        0.629427  0.285344  0.634120  0.940294  0.378314  0.416081
     bar        0.251746  0.022984  0.415058  0.322093  0.719954  0.251906
44   foo        0.247829  0.085609  0.680114  0.760157  0.493465  0.659629
     bar        0.667425  0.749589  0.578318  0.190334  0.131337  0.090083

In [13]: resultsDf.loc[(22, "bar")] 
Out[13]: 
price  mean     0.729578
       mom_2    0.209731
       n        0.533399
units  mean     0.993587
       mom_2    0.340383
       n        0.895143
Name: (22, bar), dtype: float64

In [14]: resultsDf.loc[(22, "bar"), "units"] 
Out[14]: 
mean     0.993587
mom_2    0.340383
n        0.895143
Name: (22, bar), dtype: float64
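
On the original question of where the tuple names come from: `set_index` reuses each column label as the name of the resulting index level, and with MultiIndex columns every label is a tuple such as `('desc', 'foo')`. Below is a minimal sketch of two possible ways around that, reusing the field lists from the question. The snake_case variable names are mine, and neither option is claimed to be the only or the intended approach.

```python
import pandas as pd

what_fields = ['mean', 'mom_2', 'n']
groupby_fields = ['foo', 'bar']

# Option 1: keep the question's construction, then overwrite the level names.
top = ['desc'] * len(groupby_fields) + ['price'] * 3 + ['units'] * 3
bottom = groupby_fields + what_fields + what_fields
renamed_df = pd.DataFrame(columns=pd.MultiIndex.from_arrays([top, bottom]))
renamed_df = renamed_df.set_index([('desc', f) for f in groupby_fields])
renamed_df.index.names = groupby_fields  # ('desc', 'foo') -> 'foo', etc.

# Option 2: never put the key fields into the columns at all; start from an
# empty two-level row index that already carries plain string names.
empty_index = pd.MultiIndex.from_arrays([[], []], names=groupby_fields)
value_columns = pd.MultiIndex.from_product([['price', 'units'], what_fields])
empty_df = pd.DataFrame(index=empty_index, columns=value_columns)
```

Either way the index levels end up named `foo` and `bar`, so they can be addressed by name, for example through the `level` argument of `xs`.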