2015-05-22 56 views

Combining two pandas DataFrames with a MultiIndex

The DataFrame below is the original data:

Week_No item_Number  Inside__Outside 
4 1.2014 3164018114707537 INSIDE 
6 1.2014 50010EJ654990  INSIDE 
19 1.2014 304400JE130142  INSIDE 
29 1.2014 3164018114725810 INSIDE 
31 1.2014 3164018114711298 INSIDE 
35 1.2014 3164018114707546 OUTSIDE 
36 1.2014 3164018114711299 OUTSIDE 
41 1.2014 3164018114727381 INSIDE 
54 1.2014 50010EJ655470  OUTSIDE 
145 1.2014 304400TS135379  INSIDE 

I grouped it like this:

df = df.groupby(['Week_No','Inside__Outside']).agg(['count']) 

After that, the grouped DataFrame looks like this:

      item_Number 
           count 
Week_No Inside__Outside 
1.2014   INSIDE   51 
       OUTSIDE   8 
2.2014   INSIDE   91 
       OUTSIDE   16 
3.2014   INSIDE   92 
       OUTSIDE   7 
4.2014   INSIDE   76 
       OUTSIDE   5 
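
For anyone who wants to reproduce this, here is a minimal sketch built from the ten sample rows shown above. It assumes Week_No and item_Number are stored as strings (the question does not say), and the name `raw` is only illustrative. It shows that `.agg(['count'])` with a list produces two-level columns on top of the two-level row index.

```python
import pandas as pd

# Sample rows copied from the question; string dtypes are an assumption.
raw = pd.DataFrame(
    {
        "Week_No": ["1.2014"] * 10,
        "item_Number": [
            "3164018114707537", "50010EJ654990", "304400JE130142",
            "3164018114725810", "3164018114711298", "3164018114707546",
            "3164018114711299", "3164018114727381", "50010EJ655470",
            "304400TS135379",
        ],
        "Inside__Outside": [
            "INSIDE", "INSIDE", "INSIDE", "INSIDE", "INSIDE",
            "OUTSIDE", "OUTSIDE", "INSIDE", "OUTSIDE", "INSIDE",
        ],
    },
    index=[4, 6, 19, 29, 31, 35, 36, 41, 54, 145],
)

# Same call as in the question: count rows per (week, inside/outside) pair.
grouped = raw.groupby(["Week_No", "Inside__Outside"]).agg(["count"])

# Rows are indexed by two levels; the single column is ("item_Number", "count").
print(grouped.index.names)
print(grouped.columns.tolist())
print(grouped)
```

Passing the string `'count'` instead of the list `['count']` would keep the columns flat, which can make later selection simpler.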

Now I have two DataFrames:

df1         
          item_Number 
           count 
Week_No  Inside__Outside  
1.2015  INSIDE    18 
2.2015  INSIDE    48 
3.2015  INSIDE    87 
4.2015  INSIDE    54 
5.2015  INSIDE    61 
6.2015  INSIDE    46 
7.2015  INSIDE    83 
8.2015  INSIDE    41 
9.2015  INSIDE    34 

and

df2         
           item_Number 
            count 
    Week_No  Inside__Outside  
    1.2015  OUTSIDE     8 
    2.2015  OUTSIDE     4 
    3.2015  OUTSIDE     7 
    4.2015  OUTSIDE     4 
    5.2015  OUTSIDE     1 
    6.2015  OUTSIDE     6 
    7.2015  OUTSIDE     8 
    8.2015  OUTSIDE     4 
    9.2015  OUTSIDE     3 

Now I want to sum the two DataFrames by week, i.e. get output like this:

Week_No  total 
    1.2015  18 
    2.2015  48 
    3.2015  87 
    4.2015  54 
    5.2015  61 
    6.2015  46 
    7.2015  83 
    8.2015  41 
    9.2015  34 

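I thought of first selecting the data and then adding the values manually, but that does not seem efficient. Also, because this is a MultiIndex, I am unable to select data by Week_No. Please ignore the absolute numbers in the columns. My question is about how to operate on DataFrames with a MultiIndex.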
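
On selecting by Week_No: a minimal sketch, assuming Week_No is stored as a string and the frame has the two-level row index shown in the grouped output. The frame `df` is built by hand here for illustration, using the first four counts from that output.

```python
import pandas as pd

# Hand-built frame shaped like the grouped output (an illustrative assumption).
idx = pd.MultiIndex.from_tuples(
    [("1.2014", "INSIDE"), ("1.2014", "OUTSIDE"),
     ("2.2014", "INSIDE"), ("2.2014", "OUTSIDE")],
    names=["Week_No", "Inside__Outside"],
)
cols = pd.MultiIndex.from_tuples([("item_Number", "count")])
df = pd.DataFrame([[51], [8], [91], [16]], index=idx, columns=cols)

# All rows for one week: .loc matches on the first index level.
week_1 = df.loc["1.2014"]

# Same selection by level name; xs drops the selected level by default.
week_1_xs = df.xs("1.2014", level="Week_No")

# One row per week for a single category on the second level.
inside_only = df.xs("INSIDE", level="Inside__Outside")

print(week_1)
print(inside_only)
```

If Week_No is actually a float in the real data, the same calls should work with a numeric key instead of a string.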


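Can you post code and raw input data that reproduce your dfs? Your df is also not that informative, since you only have one value for the total anyway. You can do 'df.groupby(level=0).sum()' – EdChum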


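Hi @EdChum, I have added the code, the original DataFrame, and the output. Please ignore the absolute values in the columns, since this is only an example. I want to know how to operate on pandas DataFrames with a MultiIndex. I have added that as well. –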


You could try 'df1.add(df2, level=0)' – EdChum

Answers

0

Just append them together and group by the first index level:

In [118]: df1 
Out[118]: 
         item_Number 
           count 
Week_No Inside__Outside    
1.2015 INSIDE     18 
2.2015 INSIDE     48 
3.2015 INSIDE     87 
4.2015 INSIDE     54 
5.2015 INSIDE     61 
6.2015 INSIDE     46 
7.2015 INSIDE     83 
8.2015 INSIDE     41 
9.2015 INSIDE     34 

In [119]: df2 
Out[119]: 
         item_Number 
           count 
Week_No Inside__Outside    
1.2015 OUTSIDE     8 
2.2015 OUTSIDE     4 
3.2015 OUTSIDE     7 
4.2015 OUTSIDE     4 
5.2015 OUTSIDE     1 
6.2015 OUTSIDE     6 
7.2015 OUTSIDE     8 
8.2015 OUTSIDE     4 
9.2015 OUTSIDE     3 

In [120]: df1.append(df2).groupby(level=0).sum() 
Out[120]: 
     item_Number 
       count 
Week_No    
1.2015   26 
2.2015   52 
3.2015   94 
4.2015   58 
5.2015   62 
6.2015   52 
7.2015   91 
8.2015   45 
9.2015   37 
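
A note for newer pandas versions: `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so the `In [120]` line fails there. Below is a sketch of the same idea using `pd.concat`. The helper `make_frame` is hypothetical and only rebuilds df1 and df2 from the printed values, assuming Week_No is a string.

```python
import pandas as pd

def make_frame(label, counts):
    """Rebuild a frame shaped like df1/df2 from the printed counts."""
    weeks = [f"{i}.2015" for i in range(1, len(counts) + 1)]
    index = pd.MultiIndex.from_product(
        [weeks, [label]], names=["Week_No", "Inside__Outside"]
    )
    columns = pd.MultiIndex.from_tuples([("item_Number", "count")])
    return pd.DataFrame([[c] for c in counts], index=index, columns=columns)

df1 = make_frame("INSIDE", [18, 48, 87, 54, 61, 46, 83, 41, 34])
df2 = make_frame("OUTSIDE", [8, 4, 7, 4, 1, 6, 8, 4, 3])

# Stack the two frames, then sum over the first index level (Week_No).
total = pd.concat([df1, df2]).groupby(level=0).sum()
print(total)
```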
0

You have to remove the Inside__Outside column from your index, because you are not using it to join the two tables.


Let's start with the two DataFrames you gave in your example:

data_1_df 
Out[35]: 
         item_Number count 
Week_No Inside__Outside     
1.2015 INSIDE       18 
2.2015 INSIDE       48 
3.2015 INSIDE       87 
4.2015 INSIDE       54 
5.2015 INSIDE       61 
6.2015 INSIDE       46 
7.2015 INSIDE       83 
8.2015 INSIDE       41 
9.2015 INSIDE       34 

data_2_df 
Out[36]: 
         item_Number count 
Week_No Inside__Outside     
1.2015 OUTSIDE       8 
2.2015 OUTSIDE       4 
3.2015 OUTSIDE       7 
4.2015 OUTSIDE       4 
5.2015 OUTSIDE       1 
6.2015 OUTSIDE       6 
7.2015 OUTSIDE       8 
8.2015 OUTSIDE       4 
9.2015 OUTSIDE       3 

You can stack one on top of the other, group by Week_No, and sum item_Number count:

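import pandas as pd

# Stack both frames, turn the index levels into columns, then total per week
data_3_df = (
    pd.concat([data_1_df, data_2_df])
    .reset_index()
    .groupby('Week_No')
    .agg({'item_Number count': 'sum'})
)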

This gives the total for each week, across both INSIDE and OUTSIDE:

data_3_df 
Out[52]: 
     item_Number count 
Week_No     
1.2015     26 
2.2015     52 
3.2015     94 
4.2015     58 
5.2015     62 
6.2015     52 
7.2015     91 
8.2015     45 
9.2015     37
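
As an alternative sketch of the "remove Inside__Outside from the index" idea, the level can be dropped from each frame and the two frames added directly, without concatenating. This assumes the flat column name `item_Number count` used in this answer and string week labels. The helper `make_frame` is hypothetical and only rebuilds the two frames from the printed values.

```python
import pandas as pd

weeks = [f"{i}.2015" for i in range(1, 10)]

def make_frame(label, counts):
    """Rebuild a frame shaped like data_1_df/data_2_df from the printed counts."""
    index = pd.MultiIndex.from_product(
        [weeks, [label]], names=["Week_No", "Inside__Outside"]
    )
    return pd.DataFrame({"item_Number count": counts}, index=index)

data_1_df = make_frame("INSIDE", [18, 48, 87, 54, 61, 46, 83, 41, 34])
data_2_df = make_frame("OUTSIDE", [8, 4, 7, 4, 1, 6, 8, 4, 3])

# Drop the Inside__Outside level so both frames are indexed by Week_No only,
# then add them. fill_value=0 keeps weeks that appear in only one frame.
total = data_1_df.droplevel("Inside__Outside").add(
    data_2_df.droplevel("Inside__Outside"), fill_value=0
)
print(total)
```

This relies on each week appearing at most once per frame. With repeated weeks, the concat-and-groupby approach is the safer choice.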