2017-11-25 179 views

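I want to take the column names from a DataFrame (df) and associate them with the result arrays produced by the spearmanr correlation function. Specifically, I need each column name (a-j) paired with its correlation value (spearman) and its p-value (spearman_pvalue). Is there an intuitive way to do this?

Tags: python, scipy, spearman correlation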

from scipy.stats import spearmanr
import numpy as np
import pandas as pd

# 100 rows of random integers in [0, 100), columns named a-j
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 10)), columns=list('abcdefghij'))

def binary(row):
    # Map a value to 1 if it is at least 50, otherwise 0
    if row >= 50:
        return 1
    else:
        return 0

df['target'] = df.a.apply(binary)

# Correlate the columns a-j together with the binary target
spearman, spearman_pvalue = spearmanr(df.drop(['target'], axis=1), df.target)
print(spearman)
print(spearman_pvalue)

Answer


It looks like you need:

import numpy as np
import pandas as pd
from scipy.stats import spearmanr

df = pd.DataFrame(np.random.randint(0, 100, size=(100, 10)), columns=list('abcdefghij'))
# print(df)

# Vectorized comparison; faster than apply for building the binary column
df['target'] = (df['a'] >= 50).astype(int)
# print(df)

# With the target passed as the second argument, the results are 11x11 matrices (a-j plus target)
spearman, spearman_pvalue = spearmanr(df.drop(['target'], axis=1), df.target)

df1 = pd.DataFrame(spearman.reshape(-1, 11), columns=df.columns)
# print(df1)

df2 = pd.DataFrame(spearman_pvalue.reshape(-1, 11), columns=df.columns)
# print(df2)

# Kyle, we can also set the index to the column names so the full matrix is labeled on both axes:
df2 = df2.set_index(df.columns)
df1 = df1.set_index(df.columns)

Or:

df1 = pd.DataFrame(spearman.reshape(-1, 11),
                   columns=df.columns,
                   index=df.columns)
df2 = pd.DataFrame(spearman_pvalue.reshape(-1, 11),
                   columns=df.columns,
                   index=df.columns)
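
If only the correlation of each column a-j with the target is needed, rather than the full 11x11 matrix, one possible sketch is to slice out the last column of each result matrix and label it with the feature names. The fixed seed and the names `features`, `rho`, `pval`, and `result` are assumptions made for illustration; they are not part of the original code.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

# Fixed seed so the example is reproducible (an assumption, not in the original)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 10)), columns=list('abcdefghij'))
df['target'] = (df['a'] >= 50).astype(int)

features = df.drop(columns='target')

# The target is appended as the 11th variable, so both results are 11x11 arrays
rho, pval = spearmanr(features, df['target'])

# Last column = every variable vs. target; drop the final target-vs-target entry
result = pd.DataFrame(
    {'spearman': rho[:-1, -1], 'spearman_pvalue': pval[:-1, -1]},
    index=features.columns,
)
print(result)
```

Each row of `result` is then one of the columns a-j, with its correlation and p-value against the target side by side.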

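Hi Jezrael, I tried to implement this with df['target'] added back in, but it fails at the reshape. Could you adjust the code so that spearmanr is called as follows: spearman, spearman_pvalue = spearmanr(df.drop(['target'], axis=1), df.target)? I need this to relate the statistics to a binary target using Spearman correlation; otherwise I would just use Pearson (discrete vs. continuous). – Kyle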


Oops, I forgot the 'target' column. It should work now. – jezrael
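
As a cross-check on the labeling, pandas can produce an already-labeled Spearman matrix through `DataFrame.corr(method='spearman')`, though without p-values. The sketch below compares it with the scipy result; the fixed seed and the names `labeled` and `rho` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

# Fixed seed so the example is reproducible (an assumption, not in the original)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 10)), columns=list('abcdefghij'))
df['target'] = (df['a'] >= 50).astype(int)

# pandas labels rows and columns with the column names, but returns no p-values
labeled = df.corr(method='spearman')

# scipy returns unlabeled arrays in the same column order (a-j, then target)
rho, _ = spearmanr(df.drop(columns='target'), df['target'])

# Both should agree up to floating-point tolerance
matches = np.allclose(labeled.to_numpy(), rho)
print(matches)
```

If the two matrices agree, the scipy output can be labeled with `df.columns` on both axes, as in the answer above.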