2017-06-12 102 views
2

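pandas: calculating the difference between two DataFrames with different indexes

I have two DataFrames with different indexes but a matching column. How can I calculate the difference between them?

For example: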

import pandas as pd

df1 = pd.DataFrame({'a': (188, 750, 1330, 1385, 188, 750, 1330, 1385),
                    'b': (51.12, 51.45, 74.49, 29.21, 39.98, 3.98, 14.46, 16.51),
                    'c': pd.Categorical(['R', 'R', 'R', 'R', 'F', 'F', 'F', 'F'])})
df1 = df1.set_index(['a'])

          b  c
a
188   51.12  R
750   51.45  R
1330  74.49  R
1385  29.21  R
188   39.98  F
750    3.98  F
1330  14.46  F
1385  16.51  F


df2 = pd.DataFrame({'x': (20, 50),
                    'c': pd.Categorical(['R', 'F'])})
df2 = df2.set_index(['c'])

    x
c
R  20
F  50

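I want to subtract x in df2 from column b in df1, matching each row's value of column c in df1 against the index c of df2.

The result would look like this: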

          b  c   diff
a
188   51.12  R  31.12
750   51.45  R  31.45
1330  74.49  R  54.49
1385  29.21  R   9.21
188   39.98  F -10.02
750    3.98  F -46.02
1330  14.46  F -35.54
1385  16.51  F -33.49

Answers

2

You can use join or map. With join, x is looked up by matching column c of df1 against the index of df2:

df1['diff'] = df1['b'] - df1.join(df2, on='c')['x'] 
print(df1)
          b  c   diff
a
188   51.12  R  31.12
750   51.45  R  31.45
1330  74.49  R  54.49
1385  29.21  R   9.21
188   39.98  F -10.02
750    3.98  F -46.02
1330  14.46  F -35.54
1385  16.51  F -33.49

Or, with map:

df1['diff'] = df1['b'] - df1['c'].map(df2['x']) 
print(df1)
          b  c   diff
a
188   51.12  R  31.12
750   51.45  R  31.45
1330  74.49  R  54.49
1385  29.21  R   9.21
188   39.98  F -10.02
750    3.98  F -46.02
1330  14.46  F -35.54
1385  16.51  F -33.49
+0

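Do these methods also work with a Series, for example if df2 were a Series rather than a DataFrame? Converting a Series to a DataFrame and giving it a column name is easy, but I am asking this as a further clarification. – PedroA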

+1

Yes, it is easy. Each column is a Series, e.g. df2['x'] is a Series. – jezrael
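To make the point from the comments above concrete, here is a minimal sketch of both approaches when the lookup table is a Series rather than a DataFrame. The name s2 is hypothetical (it stands in for df2), and column c is built from plain strings instead of pd.Categorical, which is an assumption made to sidestep any version-specific handling of categorical columns.

```python
import pandas as pd

df1 = pd.DataFrame({
    'a': (188, 750, 1330, 1385, 188, 750, 1330, 1385),
    'b': (51.12, 51.45, 74.49, 29.21, 39.98, 3.98, 14.46, 16.51),
    'c': ['R', 'R', 'R', 'R', 'F', 'F', 'F', 'F'],  # plain strings (assumption)
}).set_index('a')

# Hypothetical Series standing in for df2: the values are x, the index is c.
s2 = pd.Series([20, 50], index=pd.Index(['R', 'F'], name='c'), name='x')

# map looks up each value of df1['c'] in the index of s2.
df1['diff'] = df1['b'] - df1['c'].map(s2)

# join also accepts a Series, provided the Series has a name.
diff_join = df1['b'] - df1.join(s2, on='c')['x']

print(df1)
```

With map, a Series can be passed directly, so no conversion to a DataFrame is needed. With join, the Series name becomes the name of the joined column.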

2
df1.assign(diff = df1['b'] - df1['c'].map(df2.squeeze())) 

Output:

          b  c   diff
a
188   51.12  R  31.12
750   51.45  R  31.45
1330  74.49  R  54.49
1385  29.21  R   9.21
188   39.98  F -10.02
750    3.98  F -46.02
1330  14.46  F -35.54
1385  16.51  F -33.49
1
df1["diff"] = df1.apply(lambda x: x.b - df2.loc[x.c].values[0], axis=1)
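
As a rough cross-check, the sketch below compares the row-wise apply approach with the map approach from the earlier answer on the question's data. It assumes plain string values for c instead of pd.Categorical, and the names vectorized, row_wise and max_gap are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'a': (188, 750, 1330, 1385, 188, 750, 1330, 1385),
    'b': (51.12, 51.45, 74.49, 29.21, 39.98, 3.98, 14.46, 16.51),
    'c': ['R', 'R', 'R', 'R', 'F', 'F', 'F', 'F'],  # plain strings (assumption)
}).set_index('a')

df2 = pd.DataFrame({'x': (20, 50), 'c': ['R', 'F']}).set_index('c')

# Vectorized lookup: map each value of c to the matching x, then subtract.
vectorized = df1['b'] - df1['c'].map(df2['x'])

# Row-wise lookup: one Python function call per row of df1.
row_wise = df1.apply(lambda row: row['b'] - df2.loc[row['c'], 'x'], axis=1)

# Largest absolute difference between the two results, compared by position.
max_gap = float(abs(vectorized.to_numpy() - row_wise.to_numpy()).max())
print(max_gap)
```

Both approaches should give the same values here. Because apply with axis=1 calls a Python function once per row, the map or join versions are generally the better fit for large frames.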