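Consolidating DataFrames

I am working with a large MultiIndex DataFrame that has several index levels, e.g. segment, period and classification, and a few result columns, e.g. Results1 and Results2. The DataFrame consolidated_df is meant to store all of my calculation results: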

import pandas as pd 
import numpy as np 

segments = ['A', 'B', 'C'] 
periods = [1, 2] 
classification = ['x', 'y'] 

index_constr = pd.MultiIndex.from_product(
    [segments, periods, classification], 
    names=['Segment', 'Period', 'Classification']) 

consolidated_df = pd.DataFrame(np.nan, index=index_constr, 
             columns=['Results1', 'Results2']) 

print(consolidated_df)

The structure of the (large) DataFrame is as follows:

                               Results1  Results2
Segment Period Classification
A       1      x                    NaN       NaN
               y                    NaN       NaN
        2      x                    NaN       NaN
               y                    NaN       NaN
B       1      x                    NaN       NaN
               y                    NaN       NaN
        2      x                    NaN       NaN
               y                    NaN       NaN
C       1      x                    NaN       NaN
               y                    NaN       NaN
        2      x                    NaN       NaN
               y                    NaN       NaN

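I run a loop over all my segments (A, B and C) and compute the results with a separate function, calc_function. The results are to be stored in the columns of the DataFrame.

The function returns a DataFrame with exactly the same format as the consolidated DataFrame, except that it reports only one segment at a time (i.e., it is a slice of the consolidated DataFrame).

Example: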

index_result = pd.MultiIndex.from_product(
    [['A'], periods, classification], 
    names=['Segment', 'Period', 'Classification']) 

result_calc = pd.DataFrame(np.random.randn(4,2), index=index_result, 
    columns=['Results1', 'Results2']) 

print(result_calc) 

                                Results1   Results2
Segment Period Classification
A       1      x               -1.568351   0.386250
               y                0.679170   1.552551
        2      x               -1.190928  -0.765319
               y                3.254929   1.436295

I tried the following to store each result in the consolidated DataFrame, but without success:

idx = pd.IndexSlice  # assumed: idx is not defined in the original snippet

for segment in segments:
    # calc_function returns a DataFrame that has the same structure as consolidated_df
    consolidated_df.loc[idx[segment, :, :], :] = calc_function(segment)

Is there an easy way to consolidate the smaller DataFrames into the consolidated one?

2016-11-07 Andreas

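Is 'calc_function' the same for all rows? If so, you could compute it first and then merge it into the DataFrame. – maxymoo

The rows are exactly the same for all DataFrames returned by calc_function (they are subsets of the DataFrame that reports all results). – Andreas

I tried to edit your example where index_result is constructed, but the change was too few characters: it should read '[['A'], periods, classification]' (not '['A', periods, classification]'), because from_product takes lists. –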
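The "compute first, then merge" idea from the comment above could be sketched with pd.concat, which stacks the per-segment results row-wise into a single DataFrame. This is only a sketch: the calc_function below is a hypothetical stand-in that returns random numbers (the real function is not shown in the question), and the names variable is introduced here for illustration.

```python
import numpy as np
import pandas as pd

segments = ['A', 'B', 'C']
periods = [1, 2]
classification = ['x', 'y']
names = ['Segment', 'Period', 'Classification']


def calc_function(segment):
    """Hypothetical stand-in: returns random results for one segment."""
    index = pd.MultiIndex.from_product(
        [[segment], periods, classification], names=names)
    values = np.random.randn(len(index), 2)
    return pd.DataFrame(values, index=index,
                        columns=['Results1', 'Results2'])


# Compute every per-segment result first, then stack them row-wise.
parts = [calc_function(segment) for segment in segments]
consolidated_df = pd.concat(parts)

print(consolidated_df)
```

With this approach the NaN-filled consolidated_df does not need to be created up front, since the concatenated result already carries the full three-level index.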

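Taking the example above, how about consolidated_df.ix['A'] = result_calc?

(This is the same as consolidated_df.ix['A', :, :] = result_calc. Note that .ix was removed in pandas 1.0; .loc is the label-based indexer to use instead.)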

print(consolidated_df) 

                                Results1   Results2
Segment Period Classification
A       1      x                1.290466   0.228978
               y               -0.276959   0.735192
        2      x                0.757339  -0.787502
               y               -0.609848   0.805773
B       1      x                     NaN        NaN
               y                     NaN        NaN
        2      x                     NaN        NaN
               y                     NaN        NaN
C       1      x                     NaN        NaN
               y                     NaN        NaN
        2      x                     NaN        NaN
               y                     NaN        NaN
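Extended to all segments, the same assignment might look like the sketch below. It is written with .loc and selects each segment by the first index level. The calc_function here is again a hypothetical stand-in returning random numbers, and the results dict exists only so the filled values can be compared afterwards. The sketch assumes each returned DataFrame carries the full three-level index for its segment, as in the question.

```python
import numpy as np
import pandas as pd

segments = ['A', 'B', 'C']
periods = [1, 2]
classification = ['x', 'y']
names = ['Segment', 'Period', 'Classification']
columns = ['Results1', 'Results2']

index_constr = pd.MultiIndex.from_product(
    [segments, periods, classification], names=names)
consolidated_df = pd.DataFrame(np.nan, index=index_constr, columns=columns)


def calc_function(segment):
    """Hypothetical stand-in: returns random results for one segment."""
    index = pd.MultiIndex.from_product(
        [[segment], periods, classification], names=names)
    return pd.DataFrame(np.random.randn(len(index), 2),
                        index=index, columns=columns)


results = {}
for segment in segments:
    results[segment] = calc_function(segment)
    # Selecting by the first index level picks every row of this segment;
    # the right-hand side has the same rows under the same index labels.
    consolidated_df.loc[segment] = results[segment]

print(consolidated_df)
```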

2016-11-07 23:25:22
