2015-10-05 15 views

How should I subtract two DataFrames in Pandas and display the required output?

My table looks like this:

In [82]:df.head() 
Out[82]: 
      MatDoc MatYr MvT Material Plnt SLoc  Batch Customer AmountLC Amount ... PO MatYr.1 MatDoc.1 Order ProfitCtr SLED/BBD PstngDate EntryDate  Time Username 
    0 4912693062 2015 551 100062 HDC2 0001 5G30MC1A11  NaN  9.03 9.06 ... NaN  NaN  NaN NaN IN1165B085 26.01.2016 01.08.2015 01.08.2015 01:13:16 O33462 
    1 4912693063 2015 501  166 HDC2 0004   NaN  NaN  0.00 0.00 ... NaN  NaN  NaN NaN IN1165B085   NaN 01.08.2015 01.08.2015 01:13:17 O33462 
    2 4912693320 2015 551 101343 HDC2 0001 5G28MC1A11  NaN  53.73 53.72 ... NaN  NaN  NaN NaN IN1165B085 25.01.2016 01.08.2015 01.08.2015 01:16:30 O33462 

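Here I need to group the data by the Order column and sum only the AmountLC column. Then I need to check the values in the Order column: an Order should be present in both MvT101group and MvT102group. If an Order appears in both groups, I need to subtract its MvT102group sum from its MvT101group sum, and display: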

Order|Plnt|Material|Batch|Sum101=SumofMvt101ofAmountLC|Sum102=SumofMvt102ofAmountLC|(Sum101-Sum102)/100 

What I did first was create two new DataFrames containing only the 101 and 102 movement types, MvT101 and MvT102:

MvT101 = df.loc[df['MvT'] == 101]

MvT102 = df.loc[df['MvT'] == 102]

Then I grouped each by Order and took the sum of the column values:

MvT101group = MvT101.groupby('Order', sort=True)

In [76]: 
MvT101group[['AmountLC']].sum() 
Out[76]: 
Order   AmountLC 
1127828  16348566.88 
1127829  22237710.38 
1127830  29803745.65 
1127831  30621381.06 
1127832  33926352.51 

MvT102group = MvT102.groupby('Order', sort=True)

In [77]: 
MvT102group[['AmountLC']].sum() 
Out[77]: 
Order   AmountLC 
1127830  53221.70 
1127831  651475.13 
1127834  67442.16 
1127835  2477494.17 
1128622  218743.14 

After this, I can't work out how to write the rest of the query. Please ask if you need any further details. Here is the CSV file I am working with: Link

Answer


Hopefully I have understood the question correctly. After grouping the two subsets, as you did, sum only the AmountLC column:

MvT101group = MvT101.groupby('Order', sort=True)[['AmountLC']].sum()
MvT102group = MvT102.groupby('Order', sort=True)[['AmountLC']].sum()

You can then rename the columns of both groups so they stay distinguishable after merging:

MvT101group.columns = MvT101group.columns.map(lambda x: str(x) + '_101') 
MvT102group.columns = MvT102group.columns.map(lambda x: str(x) + '_102') 
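The same renaming can also be written with `DataFrame.add_suffix`. A minimal sketch, assuming the grouped frame holds only the AmountLC sums; the two values are taken from the question's Out[76]:

```python
import pandas as pd

# Two of the per-Order sums shown in the question's Out[76].
MvT101group = pd.DataFrame(
    {"AmountLC": [29803745.65, 30621381.06]},
    index=pd.Index([1127830, 1127831], name="Order"),
)

# add_suffix appends the suffix to every column label; the index is untouched.
renamed = MvT101group.add_suffix("_101")
print(renamed.columns.tolist())
```

Because Order is the index after the groupby, it is not renamed, and the summed column becomes AmountLC_101.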

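Then merge both grouped tables into the main table, so that the main table carries the 101 and 102 sums alongside its own columns: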

df = df.merge(MvT101group, left_on=['Order'], right_index=True, how='left') 
df = df.merge(MvT102group, left_on=['Order'], right_index=True, how='left') 
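Note that a left merge keeps every row of df, so Orders missing from one group get NaN in that group's columns. If only the Orders present in both groups are wanted, as the question asks, an inner join on the two sets of sums is one option. The sketch below uses the sums from the question's Out[76] and Out[77]; the names `sum101`, `sum102`, and `both` are illustrative, not from the original post:

```python
import pandas as pd

# Per-Order sums copied from the question's Out[76] and Out[77].
sum101 = pd.Series(
    [16348566.88, 22237710.38, 29803745.65, 30621381.06, 33926352.51],
    index=pd.Index([1127828, 1127829, 1127830, 1127831, 1127832], name="Order"),
    name="Sum101",
)
sum102 = pd.Series(
    [53221.70, 651475.13, 67442.16, 2477494.17, 218743.14],
    index=pd.Index([1127830, 1127831, 1127834, 1127835, 1128622], name="Order"),
    name="Sum102",
)

# join="inner" keeps only the Orders that appear in both Series.
both = pd.concat([sum101, sum102], axis=1, join="inner")
both["calc"] = (both["Sum101"] - both["Sum102"]) / 100
print(both)
```

With the rows shown in the question, only Orders 1127830 and 1127831 appear in both outputs, so only those two survive the join.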

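Then you can add the calculated column. Since Order is the index of the grouped tables, the summed columns are named AmountLC_101 and AmountLC_102: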

df['calc'] = (df['AmountLC_101'] - df['AmountLC_102']) / 100
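For the one-row-per-Order layout the question describes (Order|Plnt|Material|Batch|Sum101|Sum102|difference), a groupby followed by `unstack` may be more direct than merging back into the main table. This is only a sketch: the rows are made-up toy data, only the column names come from the question, and it assumes Plnt, Material, and Batch are constant within an Order.

```python
import pandas as pd

# Hypothetical toy rows; only the column names come from the question.
df = pd.DataFrame({
    "Order":    [1, 1, 1, 2, 2, 3],
    "MvT":      [101, 101, 102, 101, 102, 101],
    "Plnt":     ["HDC2"] * 6,
    "Material": [100062, 100062, 100062, 101343, 101343, 166],
    "Batch":    ["A", "A", "A", "B", "B", "C"],
    "AmountLC": [500.0, 300.0, 100.0, 400.0, 150.0, 900.0],
})

keys = ["Order", "Plnt", "Material", "Batch"]

# Sum AmountLC per key and movement type, then spread MvT into columns.
sums = (
    df[df["MvT"].isin([101, 102])]
    .groupby(keys + ["MvT"])["AmountLC"]
    .sum()
    .unstack("MvT")
)

# Keep only keys that have both a 101 and a 102 sum, then rename.
result = (
    sums.dropna(subset=[101, 102])
    .rename(columns={101: "Sum101", 102: "Sum102"})
    .reset_index()
)
result.columns.name = None
result["calc"] = (result["Sum101"] - result["Sum102"]) / 100
print(result)
```

In the toy data, Order 3 has no 102 rows and is dropped. Note that groupby skips rows whose key columns are NaN by default, so rows with a missing Batch would need separate handling.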