2015-08-31

Python pandas: merge or concat DataFrames

I have a series of CSVs that I load into DataFrames and store in a list (dataframesArray). The list and DataFrames look like this:

dataframesArray [    
    BBG.XAMS.UL.S_pnl_pos_cost 
     date         
     2015-03-23     0.000000 
     2015-03-24     0.000000 
     2015-03-25     -0.674717 
     2015-03-26     69.140999 
     2015-03-27     -70.128728,    
    BBG.XAMS.UNA.S_pnl_pos_cost 
     date         
     2015-03-23     -0.674929 
     2015-03-24     -15.138444 
     2015-03-25     90.830662 
     2015-03-26     21.446129 
     2015-03-27     -2.554376,    
    BBG.XAMS.UL.S_pnl_pos_cost 
     date         
     2014-10-20     -15.220730 
     2014-10-21     3031.610010 
     2014-10-22     1976.815412 
     2014-10-23    -2974.037294 
     2014-10-24     796.775000, 
    BBG.XAMS.UNA.S_pnl_pos_cost 
     date         
     2014-10-20     -4.140378 
     2014-10-21     618.064066 
     2014-10-22     -71.104800 
     2014-10-23     828.063647 
     2014-10-24      0.000000] 

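The data covers 2 products (BBG.XAMS.UL.S_pnl_pos_cost and BBG.XAMS.UNA.S_pnl_pos_cost) by date. More products will be added in the future. I want to concat or merge (I'm not sure which) the list of DataFrames into one DataFrame (called result) so that it looks like this: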

BBG.XAMS.UL.S_pnl_pos_cost BBG.XAMS.UNA.S_pnl_pos_cost
date
2014-10-20     -15.220730      -4.140378 
2014-10-21    3031.610010     618.064066 
2014-10-22    1976.815412     -71.104800 
2014-10-23    -2974.037294     828.063647 
2014-10-24     796.775000      0.000000 
2015-03-23     0.000000     -0.674929 
2015-03-24     0.000000     -15.138444 
2015-03-25     -0.674717     90.830662 
2015-03-26     69.140999     21.446129 
2015-03-27     -70.128728     -2.554376 

I tried to do this with the following:

result = pd.concat(dataframesArray,axis=1) 

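with axis=1, intending to line the frames up on the date. The data does seem to be aligned by date, but I'm missing the data for the week starting 2015-03-23. My current concatenated result DataFrame looks like this: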

BBG.XAMS.UL.S_pnl_pos_cost BBG.XAMS.UNA.S_pnl_pos_cost 
date                 
2014-10-20     -15.220730     -4.140378 
2014-10-21     3031.610010     618.064066 
2014-10-22     1976.815412     -71.104800 
2014-10-23    -2974.037294     828.063647 
2014-10-24     796.775000      0.000000 
2015-03-23       NaN       NaN 
2015-03-24       NaN       NaN 
2015-03-25       NaN       NaN 
2015-03-26       NaN       NaN 
2015-03-27       NaN       NaN 
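A minimal reproduction may make the behaviour clearer. It uses a two-day subset of the values shown above and a small helper, `frame`, which is hypothetical and only stands in for one loaded CSV. `pd.concat(..., axis=1)` aligns the frames on the date index but keeps every input's column, so each product name ends up as two separate columns, and each copy is NaN for the week that its file did not cover.

```python
import pandas as pd

UL = "BBG.XAMS.UL.S_pnl_pos_cost"
UNA = "BBG.XAMS.UNA.S_pnl_pos_cost"

def frame(dates, col, values):
    # Hypothetical helper: one single-column DataFrame per CSV, indexed by date
    return pd.DataFrame({col: values}, index=pd.DatetimeIndex(dates, name="date"))

march = ["2015-03-23", "2015-03-24"]
october = ["2014-10-20", "2014-10-21"]
dataframesArray = [
    frame(march, UL, [0.0, 0.0]),
    frame(march, UNA, [-0.674929, -15.138444]),
    frame(october, UL, [-15.220730, 3031.610010]),
    frame(october, UNA, [-4.140378, 618.064066]),
]

result = pd.concat(dataframesArray, axis=1)
print(result.columns.tolist())  # UL and UNA each appear twice
print(result.shape)             # 4 dates x 4 columns
```

Selecting `result[UL]` therefore returns a two-column DataFrame rather than a single series.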

My current code is:

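import pandas as pd

dataframesArray = []
for f, row in csvFiles:  # hypothetical: (CSV path, product name) pairs found in the directories
    # Read the date column (index) and the price column only
    stockPricesDf = pd.read_csv(f, engine='c', header=0, index_col=0, parse_dates=True, usecols=(0, 3))
    # Name the price column after the product
    stockPricesDf.rename(columns={'adjusted_last_acc': row}, inplace=True)
    dataframesArray.append(stockPricesDf)

result = pd.concat(dataframesArray, axis=1)  # concatenate once, after the loop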

I loop through some directories to get the product data stored in the CSV files.

Could someone please tell me what I'm doing wrong and how to fix it?

Many thanks
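Another option is to group the frames by product before concatenating. This sketch assumes that each CSV yields a single-column frame named after its product, as the rename in the loop does, and that a given product never has the same date in two files. Each product's pieces are stacked with axis=0, then the products are placed side by side with axis=1. The `frame` helper and sample data are hypothetical stand-ins for the loaded CSVs.

```python
import pandas as pd

UL = "BBG.XAMS.UL.S_pnl_pos_cost"
UNA = "BBG.XAMS.UNA.S_pnl_pos_cost"

def frame(dates, col, values):
    # Hypothetical helper: one single-column DataFrame per CSV, indexed by date
    return pd.DataFrame({col: values}, index=pd.DatetimeIndex(dates, name="date"))

march = ["2015-03-23", "2015-03-24"]
october = ["2014-10-20", "2014-10-21"]
dataframesArray = [
    frame(march, UL, [0.0, 0.0]),
    frame(march, UNA, [-0.674929, -15.138444]),
    frame(october, UL, [-15.220730, 3031.610010]),
    frame(october, UNA, [-4.140378, 618.064066]),
]

# Collect the frames belonging to each product (each frame has exactly one column)
by_product = {}
for df in dataframesArray:
    by_product.setdefault(df.columns[0], []).append(df)

# Stack each product's weeks on top of each other (axis=0) ...
per_product = [pd.concat(parts, axis=0) for parts in by_product.values()]
# ... then line the products up side by side on the date index (axis=1)
result = pd.concat(per_product, axis=1).sort_index()
print(result)
```

Because every product becomes exactly one column, adding more products later only adds more keys to `by_product`.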


Try using axis=0. If every DataFrame has the same column names, this should concatenate them, matching the columns up by name. – Maximus
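On its own, axis=0 stacks the frames vertically and matches columns by name. However, every date then appears once per product, with NaN in the other product's column. The sketch below adds a collapsing step using `groupby(level=0).first()`, which is a choice made here and not part of the comment. The `frame` helper and sample data are hypothetical stand-ins for the loaded CSVs.

```python
import pandas as pd

UL = "BBG.XAMS.UL.S_pnl_pos_cost"
UNA = "BBG.XAMS.UNA.S_pnl_pos_cost"

def frame(dates, col, values):
    # Hypothetical helper: one single-column DataFrame per CSV, indexed by date
    return pd.DataFrame({col: values}, index=pd.DatetimeIndex(dates, name="date"))

march = ["2015-03-23", "2015-03-24"]
october = ["2014-10-20", "2014-10-21"]
dataframesArray = [
    frame(march, UL, [0.0, 0.0]),
    frame(march, UNA, [-0.674929, -15.138444]),
    frame(october, UL, [-15.220730, 3031.610010]),
    frame(october, UNA, [-4.140378, 618.064066]),
]

# Stack all rows; each row has a value for one product and NaN for the other
stacked = pd.concat(dataframesArray, axis=0)

# Collapse repeated dates: first() takes the first non-null value per column,
# and groupby sorts the dates by default
result = stacked.groupby(level=0).first()
print(result)
```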


Possible duplicate of [Pandas join/merge/concat two dataframes](http://stackoverflow.com/questions/11637384/pandas-join-merge-concat-two-dataframes) –

Answer


Try this:

result = pd.concat(dataframesArray, axis=1) # like you did 
result = result.groupby(result.columns, axis=1).sum() 

As you can see, the first step produces this (with made-up numbers):

                  UL       UNA        UL       UNA
2015-03-23  2.169534  0.294107       NaN       NaN
2015-03-24 -0.077550 -0.758760       NaN       NaN
2015-03-25  0.159659 -3.167541       NaN       NaN
2015-03-26  0.895535  0.944644       NaN       NaN
2015-03-27 -0.385408 -0.005069       NaN       NaN
2015-10-20       NaN       NaN  1.855446 -0.229635
2015-10-21       NaN       NaN -0.400450 -0.237323
2015-10-22       NaN       NaN  1.103165  0.718134
2015-10-23       NaN       NaN -0.157415  1.119828
2015-10-24       NaN       NaN -0.016321 -0.371061

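The second step groups the columns that share a name into a single column by summing them. The sum skips NaN, so each date keeps its one real value: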

                  UL       UNA
2015-03-23  2.169534  0.294107
2015-03-24 -0.077550 -0.758760
2015-03-25  0.159659 -3.167541
2015-03-26  0.895535  0.944644
2015-03-27 -0.385408 -0.005069
2015-10-20  1.855446 -0.229635
2015-10-21 -0.400450 -0.237323
2015-10-22  1.103165  0.718134
2015-10-23 -0.157415  1.119828
2015-10-24 -0.016321 -0.371061
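In pandas 2.1 and later, `DataFrame.groupby(..., axis=1)` is deprecated, and the deprecation notice suggests grouping the transposed frame instead. The sketch below does the same two steps on a two-day subset of the question's values. The `frame` helper is hypothetical. `min_count=1` is added because current pandas returns 0 for the sum of an all-NaN group, and with this setting a date missing for a product stays NaN instead.

```python
import pandas as pd

UL = "BBG.XAMS.UL.S_pnl_pos_cost"
UNA = "BBG.XAMS.UNA.S_pnl_pos_cost"

def frame(dates, col, values):
    # Hypothetical helper: one single-column DataFrame per CSV, indexed by date
    return pd.DataFrame({col: values}, index=pd.DatetimeIndex(dates, name="date"))

march = ["2015-03-23", "2015-03-24"]
october = ["2014-10-20", "2014-10-21"]
dataframesArray = [
    frame(march, UL, [0.0, 0.0]),
    frame(march, UNA, [-0.674929, -15.138444]),
    frame(october, UL, [-15.220730, 3031.610010]),
    frame(october, UNA, [-4.140378, 618.064066]),
]

# Step 1, as in the answer: duplicate column labels, NaN blocks
combined = pd.concat(dataframesArray, axis=1)

# Step 2 without axis=1: transpose so the column labels become the row index,
# sum the rows that share a label (NaN skipped), then transpose back
result = combined.T.groupby(level=0).sum(min_count=1).T.sort_index()
print(result)
```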

Thanks Ian, that did the trick – Stacey