2016-11-07 44 views

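I am working with a large MultiIndex DataFrame that has several index levels, e.g. segment, period and classification, and several result columns, e.g. Results1 and Results2. The DataFrame consolidated_df should store all of my calculation results: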

import pandas as pd 
import numpy as np 

segments = ['A', 'B', 'C'] 
periods = [1, 2] 
classification = ['x', 'y'] 

index_constr = pd.MultiIndex.from_product(
    [segments, periods, classification], 
    names=['Segment', 'Period', 'Classification']) 

consolidated_df = pd.DataFrame(np.nan, index=index_constr, 
             columns=['Results1', 'Results2']) 

print(consolidated_df) 

The structure of the (large) DataFrame is as follows:

                                Results1   Results2
Segment Period Classification
A       1      x                     NaN        NaN
               y                     NaN        NaN
        2      x                     NaN        NaN
               y                     NaN        NaN
B       1      x                     NaN        NaN
               y                     NaN        NaN
        2      x                     NaN        NaN
               y                     NaN        NaN
C       1      x                     NaN        NaN
               y                     NaN        NaN
        2      x                     NaN        NaN
               y                     NaN        NaN

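I run a loop over all of my segments (A, B, C) and compute the results, which are stored in the DataFrame's columns, using a separate function calc_function.

The function returns a DataFrame with exactly the same format as the consolidated DataFrame, except that it reports only one segment at a time (i.e. it is a slice of the consolidated DataFrame).

Example: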

index_result = pd.MultiIndex.from_product(
    [['A'], periods, classification], 
    names=['Segment', 'Period', 'Classification']) 

result_calc = pd.DataFrame(np.random.randn(4,2), index=index_result, 
    columns=['Results1', 'Results2']) 

print(result_calc) 

                                Results1   Results2
Segment Period Classification
A       1      x               -1.568351   0.386250
               y                0.679170   1.552551
        2      x               -1.190928  -0.765319
               y                3.254929   1.436295

I tried the following approach to store each result in the consolidated DataFrame, but without success:

idx = pd.IndexSlice  # assumed definition; idx is not defined in the original snippet

for segment in segments:
    # calc_function returns a DataFrame that has the same structure as consolidated_df
    consolidated_df.loc[idx[segment, :, :], :] = calc_function(segment)

Is there an easy way to integrate the smaller DataFrames into the consolidated one?

Is 'calc_function' the same for all rows? If so, you could compute it first and then merge it into the DataFrame – maxymoo

The rows are exactly the same for every DataFrame returned by calc_function (each one is a subset of the DataFrame that reports all results) – Andreas

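I tried to edit your example where index_result is constructed, but the change was too few characters to submit: it should read '[['A'], periods, classification]' (not '['A', periods, classification]'), because from_product takes lists. –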

Answers


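Taking the example above, how about consolidated_df.ix['A'] = result_calc?

(This is the same as consolidated_df.ix['A', :, :] = result_calc. Note that .ix was removed in pandas 1.0; on current versions .loc is the label-based replacement.)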

print(consolidated_df) 

                                Results1   Results2
Segment Period Classification
A       1      x                1.290466   0.228978
               y               -0.276959   0.735192
        2      x                0.757339  -0.787502
               y               -0.609848   0.805773
B       1      x                     NaN        NaN
               y                     NaN        NaN
        2      x                     NaN        NaN
               y                     NaN        NaN
C       1      x                     NaN        NaN
               y                     NaN        NaN
        2      x                     NaN        NaN
               y                     NaN        NaN
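
For readers on a current pandas version, the same idea can be extended to the whole loop roughly as follows. This is only a minimal sketch under stated assumptions: calc_function here is a hypothetical stand-in that returns random numbers (the real function is not shown in the question), .loc is used in place of .ix, and the values are assigned as a plain NumPy array, which assumes each result's rows are in the same order as the target rows. The pd.concat line at the end is an alternative added for comparison, not part of the original answer; it assumes every segment returns a complete block.

```python
import numpy as np
import pandas as pd

segments = ['A', 'B', 'C']
periods = [1, 2]
classification = ['x', 'y']
names = ['Segment', 'Period', 'Classification']
columns = ['Results1', 'Results2']


def calc_function(segment):
    """Hypothetical stand-in: random results for a single segment."""
    index = pd.MultiIndex.from_product(
        [[segment], periods, classification], names=names)
    return pd.DataFrame(np.random.randn(len(index), len(columns)),
                        index=index, columns=columns)


# Empty (all-NaN) frame covering every segment, as in the question.
consolidated_df = pd.DataFrame(
    np.nan,
    index=pd.MultiIndex.from_product(
        [segments, periods, classification], names=names),
    columns=columns)

# Compute each segment once and keep the pieces for later comparison.
results = {segment: calc_function(segment) for segment in segments}

for segment, result in results.items():
    # Assign raw values so no index alignment is involved. This relies on
    # `result` having its rows in the same order as the rows that
    # .loc[segment] selects in consolidated_df.
    consolidated_df.loc[segment] = result.to_numpy()

# Alternative: skip the pre-allocated frame and stack the pieces directly.
combined = pd.concat([results[segment] for segment in segments])

print(consolidated_df)
```

If the per-segment frames always cover their full block, the pd.concat variant avoids pre-allocating a NaN frame at all; the loop variant is closer to what the question asks for and leaves untouched segments as NaN.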