2017-02-10

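Python pandas - problem appending/concatenating two MultiIndexed DataFrames

I want to merge two MultiIndexed DataFrames. My code is below. As you can see in the output, the problem is that the "DATE" index is duplicated, whereas I want all the values (OPEN_INT, PX_LAST) to sit under the same date index... Any ideas? I've tried both append and concat, but they give me similar results.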

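# Presumably runs once per (ticker t, field f); bbg_historicaldata is the
# asker's own Bloomberg helper returning a date-indexed DataFrame with column f.
if df.empty:
    df = bbg_historicaldata(t, f, startDate, endDate)
    # Build a (TICKER, DATE) MultiIndex by pairing the ticker with each date
    datesArray = list(df.index)
    tArray = [t for i in range(len(datesArray))]
    arrays = [tArray, datesArray]
    tuples = list(zip(*arrays))
    index = pd.MultiIndex.from_tuples(tuples, names=['TICKER', 'DATE'])
    df = pd.DataFrame({f: df[f].values}, index=index)

else:
    temp = bbg_historicaldata(t, f, startDate, endDate)
    datesArray = list(temp.index)
    tArray = [t for i in range(len(datesArray))]
    arrays = [tArray, datesArray]
    tuples = list(zip(*arrays))
    index = pd.MultiIndex.from_tuples(tuples, names=['TICKER', 'DATE'])

    temp = pd.DataFrame({f: temp[f].values}, index=index)

    # df = df.append(temp, ignore_index=True)
    # Stacks temp's rows under df's (axis=0 is the default), then sorts by index.
    # sort_index() replaces sortlevel(), which has since been removed from pandas.
    df = pd.concat([df, temp]).sort_index()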
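The index-building steps above (repeat the ticker once per date, zip, then from_tuples) can be collapsed into a single pd.MultiIndex.from_product call. A minimal sketch, assuming the Bloomberg helper returns a date-indexed DataFrame with one column named after the field; fake_history and with_ticker_index are hypothetical names, and the numbers are copied from the output below.

```python
import pandas as pd


def fake_history(field):
    # Hypothetical stand-in for bbg_historicaldata: a DataFrame indexed by
    # date with a single column named after the requested field.
    dates = pd.to_datetime(['2017-02-01', '2017-02-02', '2017-02-03'])
    values = {'PX_LAST': [98.365, 98.370, 98.360],
              'OPEN_INT': [1008044.0, 1009994.0, 1019181.0]}[field]
    return pd.DataFrame({field: values}, index=dates)


def with_ticker_index(raw, ticker, field):
    # The Cartesian product [ticker] x dates yields the same (TICKER, DATE)
    # tuples as zipping a repeated ticker list with the dates.
    index = pd.MultiIndex.from_product([[ticker], raw.index],
                                       names=['TICKER', 'DATE'])
    return pd.DataFrame({field: raw[field].values}, index=index)


px_last = with_ticker_index(fake_history('PX_LAST'), 'EDH8 COMDTY', 'PX_LAST')
print(px_last)
```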

And the result:

                         OPEN_INT  PX_LAST
TICKER      DATE
EDH8 COMDTY 2017-02-01        NaN   98.365
            2017-02-01  1008044.0      NaN
            2017-02-02        NaN   98.370
            2017-02-02  1009994.0      NaN
            2017-02-03        NaN   98.360
            2017-02-03  1019181.0      NaN
            2017-02-06        NaN   98.405
            2017-02-06  1023863.0      NaN
            2017-02-07        NaN   98.410
            2017-02-07  1024609.0      NaN
            2017-02-08        NaN   98.435
            2017-02-08  1046258.0      NaN
            2017-02-09        NaN   98.395

Essentially, I want to get it so there are no NaNs!
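One way to remove the NaNs while keeping the row-wise concat is to collapse the duplicated (TICKER, DATE) rows afterwards with groupby(...).first(), which takes the first non-null value of each column within a group. A minimal sketch, assuming each field has at most one non-null value per (TICKER, DATE); the two-day frames reuse values from the output above.

```python
import pandas as pd

# Two single-field frames for one ticker, mirroring the question's layout.
index = pd.MultiIndex.from_product(
    [['EDH8 COMDTY'], pd.to_datetime(['2017-02-01', '2017-02-02'])],
    names=['TICKER', 'DATE'])
px_last = pd.DataFrame({'PX_LAST': [98.365, 98.370]}, index=index)
open_int = pd.DataFrame({'OPEN_INT': [1008044.0, 1009994.0]}, index=index)

# Row-wise concat (the default axis=0) repeats every (TICKER, DATE) pair,
# with NaN in whichever column the source frame did not have.
stacked = pd.concat([px_last, open_int]).sort_index()

# Collapse the duplicates: first() keeps the first non-null value per column
# within each (TICKER, DATE) group.
collapsed = stacked.groupby(level=['TICKER', 'DATE']).first()
print(collapsed)
```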

EDIT: Adding axis=1 to the concat gives the following (my fault for not including the extra output in the first place):

                        PX_LAST   OPEN_INT  PX_LAST   OPEN_INT  PX_LAST  \
TICKER      DATE
EDH8 COMDTY 2017-02-01   98.365  1008044.0      NaN        NaN      NaN
            2017-02-02   98.370  1009994.0      NaN        NaN      NaN
            2017-02-03   98.360  1019181.0      NaN        NaN      NaN
            2017-02-06   98.405  1023863.0      NaN        NaN      NaN
            2017-02-07   98.410  1024609.0      NaN        NaN      NaN
            2017-02-08   98.435  1046258.0      NaN        NaN      NaN
            2017-02-09   98.395  1050291.0      NaN        NaN      NaN
EDM8 COMDTY 2017-02-01      NaN        NaN   98.245   726739.0      NaN
            2017-02-02      NaN        NaN   98.250   715081.0      NaN
            2017-02-03      NaN        NaN   98.235   723936.0      NaN
            2017-02-06      NaN        NaN   98.285   729324.0      NaN
            2017-02-07      NaN        NaN   98.295   728673.0      NaN
            2017-02-08      NaN        NaN   98.325   728520.0      NaN
            2017-02-09      NaN        NaN   98.280   741840.0      NaN
EDU8 COMDTY 2017-02-01      NaN        NaN      NaN        NaN   98.130
            2017-02-02      NaN        NaN      NaN        NaN   98.135
            2017-02-03      NaN        NaN      NaN        NaN   98.120
            2017-02-06      NaN        NaN      NaN        NaN   98.180
            2017-02-07      NaN        NaN      NaN        NaN   98.190
            2017-02-08      NaN        NaN      NaN        NaN   98.225
            2017-02-09      NaN        NaN      NaN        NaN   98.175

Thanks!
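The repeated column names in the axis=1 output come from concatenating every (ticker, field) frame sideways: a new ticker's frame shares no (TICKER, DATE) rows with what is already there, so it adds its own columns instead of filling the existing ones. One way around this, sketched here on the assumption that all fields for a ticker can be fetched before moving on to the next ticker, is to join the fields along axis=1 per ticker and then stack the tickers along axis=0. The data dict is hypothetical, filled with values from the output above.

```python
import pandas as pd

# Hypothetical input: first two dates of two contracts from the output above.
dates = pd.to_datetime(['2017-02-01', '2017-02-02'])
data = {
    'EDH8 COMDTY': {'PX_LAST': [98.365, 98.370],
                    'OPEN_INT': [1008044.0, 1009994.0]},
    'EDM8 COMDTY': {'PX_LAST': [98.245, 98.250],
                    'OPEN_INT': [726739.0, 715081.0]},
}

per_ticker = []
for ticker, fields in data.items():
    index = pd.MultiIndex.from_product([[ticker], dates],
                                       names=['TICKER', 'DATE'])
    # Fields of the same ticker share an index, so join them side by side.
    field_frames = [pd.DataFrame({f: v}, index=index) for f, v in fields.items()]
    per_ticker.append(pd.concat(field_frames, axis=1))

# Different tickers have disjoint index values, so stack them as rows.
df = pd.concat(per_ticker).sort_index()
print(df)
```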

Answers


It's not clear what the input format is.

I assume OPEN_INT looks like this:

import datetime 
import pandas as pd 


open_int = pd.DataFrame(
    [ 
     (datetime.date(2017, 2, 1), 1008044.0), 
     (datetime.date(2017, 2, 2), 1009994.0), 
     (datetime.date(2017, 2, 3), 1019181.0), 
     (datetime.date(2017, 2, 6), 1023863.0), 
     (datetime.date(2017, 2, 7), 1024609.0), 
     (datetime.date(2017, 2, 8), 1046258.0), 
    ], 
    columns=['DATE', 'OPEN_INT'] 
) 
open_int['TICKER'] = 'EDH8 COMDTY' 
open_int.set_index(['TICKER', 'DATE'], inplace=True) 

print(open_int) 
#                         OPEN_INT
# TICKER      DATE
# EDH8 COMDTY 2017-02-01  1008044.0
#             2017-02-02  1009994.0
#             2017-02-03  1019181.0
#             2017-02-06  1023863.0
#             2017-02-07  1024609.0
#             2017-02-08  1046258.0

And PX_LAST looks like this:

px_last = pd.DataFrame(
    [ 
     (datetime.date(2017, 2, 1), 98.365), 
     (datetime.date(2017, 2, 2), 98.370), 
     (datetime.date(2017, 2, 3), 98.360), 
     (datetime.date(2017, 2, 6), 98.405), 
     (datetime.date(2017, 2, 7), 98.410), 
     (datetime.date(2017, 2, 8), 98.435), 
     (datetime.date(2017, 2, 9), 98.395), 
    ], 
    columns=['DATE', 'PX_LAST'] 
) 
px_last['TICKER'] = 'EDH8 COMDTY' 
px_last.set_index(['TICKER', 'DATE'], inplace=True) 

print(px_last) 
#                        PX_LAST
# TICKER      DATE
# EDH8 COMDTY 2017-02-01   98.365
#             2017-02-02   98.370
#             2017-02-03   98.360
#             2017-02-06   98.405
#             2017-02-07   98.410
#             2017-02-08   98.435
#             2017-02-09   98.395

Then you concat them along axis=1 and get what you want:

df = pd.concat([open_int, px_last], axis=1) 
print(df) 
#                         OPEN_INT  PX_LAST
# TICKER      DATE
# EDH8 COMDTY 2017-02-01  1008044.0   98.365
#             2017-02-02  1009994.0   98.370
#             2017-02-03  1019181.0   98.360
#             2017-02-06  1023863.0   98.405
#             2017-02-07  1024609.0   98.410
#             2017-02-08  1046258.0   98.435
#             2017-02-09        NaN   98.395

Hi, thanks for the reply. Unfortunately this leads to another problem; see the edit above. – keynesiancross


You need to concatenate along the other axis:

pd.concat([df, temp], axis=1) 

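By default, pandas concatenates along axis=0 (stacking rows) and aligns the frames on their columns, which produces the result you are seeing.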
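To make the difference between the two axes concrete, here is a small sketch with two single-field frames for one ticker, named df and temp as in the question and filled with values from the question's output.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [['EDH8 COMDTY'], pd.to_datetime(['2017-02-01', '2017-02-02'])],
    names=['TICKER', 'DATE'])
df = pd.DataFrame({'PX_LAST': [98.365, 98.370]}, index=index)
temp = pd.DataFrame({'OPEN_INT': [1008044.0, 1009994.0]}, index=index)

# axis=0 (default): rows are appended and columns are aligned by name,
# so each frame contributes NaN for the column it lacks.
rows = pd.concat([df, temp])

# axis=1: columns are appended and rows are aligned on the (TICKER, DATE) index.
cols = pd.concat([df, temp], axis=1)

print(rows.shape, int(rows.isna().sum().sum()))  # (4, 2) 4
print(cols.shape, int(cols.isna().sum().sum()))  # (2, 2) 0
```

Note that axis=1 only lines the values up when the frames share index values; frames for different tickers share none, which is why the per-ticker loop in the question still ends up with separate columns.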


Hi, thanks for the reply. Unfortunately this leads to another problem; see the edit above. – keynesiancross