2016-03-08

I have a pandas DataFrame in Python and want to combine groupby and apply on it. It looks like this:

df: 

       cell COMBINATION_ID  PREDICTION  SYNERGY_SCORE
0    BT-549     ADAM17.AKT    2.188390       7.398240
1   CAL-148     ADAM17.AKT   10.030628      12.686340
2     HCC38     ADAM17.AKT    9.217011      -4.351590
3   DU-4475    ADAM17.FGFR   -2.130943     -14.398730
4   HCC1187    ADAM17.FGFR   -1.103040      -6.400371
5     HCC70    ADAM17.FGFR   -2.076458     -14.909000
6  Hs-578-T    ADAM17.FGFR    3.831822      -7.859544

I want to group by COMBINATION_ID and, for each group, get the correlation between PREDICTION and SYNERGY_SCORE.

The result would look like this:

ADAM17.AKT  cor([2.188390, 10.030628, 9.217011], [7.398240, 12.686340, -4.351590])
ADAM17.FGFR cor([-2.130943, -1.103040, -2.076458, 3.831822], [-14.398730, -6.400371, -14.909000, -7.859544])

I can use:

df2 = df.groupby('COMBINATION_ID').apply(f) 

but I don't know how to define f().

Thanks

Answer


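Consider using pandas' corr() inside a function you define, assuming you have the scipy package installed alongside pandas. You can specify the method: pearson (the default), kendall, or spearman.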

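def f(group):
    # group is the sub-DataFrame holding all rows of one COMBINATION_ID
    group = group.copy()  # avoid modifying the object pandas passes in
    group['CORRELATION'] = group['PREDICTION'].corr(group['SYNERGY_SCORE'], method='spearman')
    return group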

df2 = df.groupby('COMBINATION_ID').apply(f) 
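
The function above repeats the correlation on every row of a group, while the question asks for one value per COMBINATION_ID. A minimal sketch of that variant follows, assuming a reasonably recent pandas; the names group_corr and per_group are made up for illustration, and the default pearson method is used so that scipy is not needed.

```python
import pandas as pd

# The sample data from the question.
df = pd.DataFrame({
    'cell': ['BT-549', 'CAL-148', 'HCC38', 'DU-4475',
             'HCC1187', 'HCC70', 'Hs-578-T'],
    'COMBINATION_ID': ['ADAM17.AKT'] * 3 + ['ADAM17.FGFR'] * 4,
    'PREDICTION': [2.188390, 10.030628, 9.217011, -2.130943,
                   -1.103040, -2.076458, 3.831822],
    'SYNERGY_SCORE': [7.398240, 12.686340, -4.351590, -14.398730,
                      -6.400371, -14.909000, -7.859544],
})


def group_corr(group, method='pearson'):
    # Hypothetical helper: returns a single float for one group.
    return group['PREDICTION'].corr(group['SYNERGY_SCORE'], method=method)


# Selecting the two value columns first keeps the grouping column
# out of the frames handed to group_corr.
per_group = (
    df.groupby('COMBINATION_ID')[['PREDICTION', 'SYNERGY_SCORE']]
      .apply(group_corr)
)
print(per_group)
```

Because group_corr returns a scalar, per_group is a Series indexed by COMBINATION_ID with one correlation per group.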

You can check the new column against the actual numbers above:

from scipy.stats import spearmanr

# ADAM17.AKT 
print(spearmanr([2.188390,10.030628,9.217011], 
       [7.398240,12.686340,-4.351590])) 
# ADAM17.FGFR 
print(spearmanr([-2.130943,-1.103040, -2.076458 ,3.831822], 
       [-14.398730,-6.400371,-14.909000,-7.859544]))
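
If scipy is not available, the same check can be sketched with pandas alone, since Spearman's coefficient is Pearson's correlation computed on ranks. The helper name spearman_by_rank is hypothetical; for these values the results should come out at about 0.5 for ADAM17.AKT and 0.6 for ADAM17.FGFR.

```python
import pandas as pd

akt = pd.DataFrame({
    'PREDICTION': [2.188390, 10.030628, 9.217011],
    'SYNERGY_SCORE': [7.398240, 12.686340, -4.351590],
})
fgfr = pd.DataFrame({
    'PREDICTION': [-2.130943, -1.103040, -2.076458, 3.831822],
    'SYNERGY_SCORE': [-14.398730, -6.400371, -14.909000, -7.859544],
})


def spearman_by_rank(frame):
    # Rank each column, then take the default (Pearson) correlation
    # of the ranks; this equals Spearman's coefficient.
    ranks = frame.rank()
    return ranks['PREDICTION'].corr(ranks['SYNERGY_SCORE'])


rho_akt = spearman_by_rank(akt)
rho_fgfr = spearman_by_rank(fgfr)
print(rho_akt)   # about 0.5
print(rho_fgfr)  # about 0.6
```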