2017-09-01 83 views

Merging a large dataframe based on the values of every other column

How can I do this? I have a .csv file containing the following dataset:

+------------+----------------+------------+---------------+------------+----------------+------------+----------------+------------+---------------+
| Date       | NBDG LN Equity | Date       | P2P LN Equity | Date       | HWSL LN Equity | Date       | BPCR LN Equity | Date       | AXI LN Equity |
+------------+----------------+------------+---------------+------------+----------------+------------+----------------+------------+---------------+
| 09-08-2017 | 78,5           | 09-08-2017 | 877,061       | 09-08-2017 | 107,082        | 09-08-2017 | 1,0981         | 08-08-2017 | 94            |
| 08-08-2017 | 78,5           | 08-08-2017 | 878,7899      | 08-08-2017 | 106,5          | 08-08-2017 | 1,1021         | 07-08-2017 | 94            |
| 03-08-2017 | 78,5           | 07-08-2017 | 879,709       | 07-08-2017 | 106,2          | 07-08-2017 | 1,0945         | 02-08-2017 | 98,2472       |
| 01-08-2017 | 78,5           | 04-08-2017 | 879,6708      | 04-08-2017 | 105,4882       | 04-08-2017 | 1,0932         | 27-07-2017 | 98,5          |
+------------+----------------+------------+---------------+------------+----------------+------------+----------------+------------+---------------+

which I want to "merge" into this format:

+------------+----------------+---------------+----------------+----------------+---------------+
| Date       | NBDG LN Equity | P2P LN Equity | HWSL LN Equity | BPCR LN Equity | AXI LN Equity |
+------------+----------------+---------------+----------------+----------------+---------------+
| 09-08-2017 | 78,5           | 877,061       | 107,082        | 1,0981         | NA            |
| 08-08-2017 | 78,5           | 878,7899      | 106,5          | 1,1021         | 94            |
| 07-08-2017 | NA             | 879,709       | 106,2          | 1,0945         | 94            |
| 04-08-2017 | NA             | 879,6708      | 105,4882       | 1,0932         | NA            |
| 03-08-2017 | 78,5           | NA            | NA             | NA             | NA            |
| 02-08-2017 | NA             | NA            | NA             | NA             | 98,2472       |
| 01-08-2017 | 78,5           | NA            | NA             | NA             | NA            |
| 27-07-2017 | NA             | NA            | NA             | NA             | 98,5          |
+------------+----------------+---------------+----------------+----------------+---------------+

How can I do this without hard-coding too much? I started with

dfData = local_csv('Data.csv', timezone='DK', sep=';') 
lDateColumns = [col for col in dfData.columns if 'Date' in col] 
dfData[dfData[lDateColumns].apply(pd.Series.nunique, axis=1)==1] 

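until I noticed that the rows are sometimes offset relative to each other, so filtering for rows in which all the date columns agree leaves only 4 rows.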

Thanks


What have you tried so far? Please post your code. – James

Answer


I split the dataframe into pieces (more precisely, two columns at a time) and then merge everything back together:

In [103]: df 
Out[103]: 
     Date NBDG LN Equity  Date.1 P2P LN Equity  Date.2 \ 
0 09-08-2017   78,5 09-08-2017  877,061 09-08-2017 
1 08-08-2017   78,5 08-08-2017  878,7899 08-08-2017 
2 03-08-2017   78,5 07-08-2017  879,709 07-08-2017 
3 01-08-2017   78,5 04-08-2017  879,6708 04-08-2017 

    HWSL LN Equity  Date.3 BPCR LN Equity  Date.4 AXI LN Equity 
0  107,082 09-08-2017   1,0981 08-08-2017   94 
1   106,5 08-08-2017   1,1021 07-08-2017   94 
2   106,2 07-08-2017   1,0945 02-08-2017  98,2472 
3  105,4882 04-08-2017   1,0932 27-07-2017   98,5 

In [114]: res = [] 

In [115]: for i in range(5): 
    ...:  df_temp = pd.concat([df.iloc[:, 2*i], df.iloc[:, 2*i+1]], axis=1) 
    ...:  df_temp.columns = ['Date', df_temp.columns[1]] 
    ...:  res.append(df_temp) 
    ...:  
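The loop above hard-codes five pairs. As a minimal sketch, the same split can take the number of pairs from the width of the frame instead, assuming the columns always alternate date, value. The helper name `split_pairs` and the two-pair sample frame are made up for illustration:

```python
import pandas as pd

def split_pairs(df):
    """Split a frame of alternating (date, value) columns into two-column frames."""
    pieces = []
    # Step through the columns two at a time; a trailing unpaired column is ignored.
    for i in range(0, df.shape[1] - 1, 2):
        piece = df.iloc[:, [i, i + 1]].copy()
        # Give every date column the same name so the pieces can be merged on it.
        piece.columns = ['Date', piece.columns[1]]
        pieces.append(piece)
    return pieces

# Hypothetical two-pair sample using the column names pandas produces
# for duplicate headers (Date, Date.1, ...).
sample = pd.DataFrame({
    'Date': ['09-08-2017', '08-08-2017'],
    'NBDG LN Equity': ['78,5', '78,5'],
    'Date.1': ['08-08-2017', '07-08-2017'],
    'AXI LN Equity': ['94', '94'],
})
res = split_pairs(sample)
```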

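We now have a list of dataframes in which the first column is always the date (named 'Date') and the second column is the corresponding metric. We merge them all using functools.reduce: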

In [117]: from functools import reduce 

In [120]: reduce(lambda df1,df2: df1.merge(df2, on='Date', how='outer'), res) 
Out[120]: 
     Date NBDG LN Equity P2P LN Equity HWSL LN Equity BPCR LN Equity \ 
0 09-08-2017   78,5  877,061  107,082   1,0981 
1 08-08-2017   78,5  878,7899   106,5   1,1021 
2 03-08-2017   78,5   NaN   NaN   NaN 
3 01-08-2017   78,5   NaN   NaN   NaN 
4 07-08-2017   NaN  879,709   106,2   1,0945 
5 04-08-2017   NaN  879,6708  105,4882   1,0932 
6 02-08-2017   NaN   NaN   NaN   NaN 
7 27-07-2017   NaN   NaN   NaN   NaN 

    AXI LN Equity 
0   NaN 
1   94 
2   NaN 
3   NaN 
4   94 
5   NaN 
6  98,2472 
7   98,5 
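
The merged result keeps the dates in order of first appearance and the values are still strings with decimal commas, whereas the requested table is sorted by date, newest first. Here is a sketch of one way to finish the job. It assumes the dates are day-month-year and the comma is a decimal separator; both are inferred from the sample rather than stated in the question. The small `merged` frame is a stand-in for the output above:

```python
import pandas as pd

# Stand-in for the merged result above (a few rows only, values as strings).
merged = pd.DataFrame({
    'Date': ['09-08-2017', '03-08-2017', '07-08-2017', '27-07-2017'],
    'NBDG LN Equity': ['78,5', '78,5', None, None],
    'AXI LN Equity': [None, None, '94', '98,5'],
})

# Parse day-month-year strings into real dates so sorting is chronological
# (assumption: the format is %d-%m-%Y).
merged['Date'] = pd.to_datetime(merged['Date'], format='%d-%m-%Y')

# Convert decimal-comma strings to floats; missing values stay NaN
# (assumption: the comma is a decimal separator, not a thousands separator).
for col in merged.columns.drop('Date'):
    merged[col] = pd.to_numeric(merged[col].str.replace(',', '.', regex=False))

# Newest date first, as in the requested layout.
merged = merged.sort_values('Date', ascending=False).reset_index(drop=True)
```

Sorting on the raw strings would order by day of month first, which is why the dates are parsed before the sort.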