2016-03-06 154 views

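Converting a vertical (long-format) table to a correlation matrix in Python

I used pd.DataFrame.corr() to create a correlation matrix from my DataFrame, did some processing, cut out certain values, and ended up with a table like DF_interactions below. Now I want to bring it back into correlation-matrix form, like DF_corr below.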

What is the most efficient way, using pandas, numpy, sklearn, or scipy, to convert a table of interactions into a correlation-style matrix?

I've included my naive approach to filling this DataFrame...

import numpy as np
import pandas as pd

# Create table of interactions
DF_interactions=pd.DataFrame([["A","B",0.1], 
           ["A","C",0.4], 
           ["B","C",0.3], 
           ["A","D",0.4]],columns=["var1","var2","corr"]) 
# var1 var2 corr 
# 0 A B 0.1 
# 1 A C 0.4 
# 2 B C 0.3 
# 3 A D 0.4 
n, m = DF_interactions.shape
# n = 4 rows, m = 3 columns
# Show which labels would be in correlation matrix for rows/columns
nodes = set(DF_interactions["var1"]) | set(DF_interactions["var2"])
# {'A', 'B', 'C', 'D'} (set order is arbitrary)

#Create empty DataFrame to fill 
DF_corr = pd.DataFrame(np.zeros((len(nodes),len(nodes))), columns = sorted(nodes),index=sorted(nodes)) 
#      A    B    C    D
# A  0.0  0.0  0.0  0.0
# B  0.0  0.0  0.0  0.0
# C  0.0  0.0  0.0  0.0
# D  0.0  0.0  0.0  0.0

#Naive way to fill it 
for i in range(n): 
    var1 = DF_interactions.iloc[i,0] 
    var2 = DF_interactions.iloc[i,1] 
    corr = DF_interactions.iloc[i,2] 
    DF_corr.loc[var1,var2] = corr 
    DF_corr.loc[var2,var1] = corr 
#      A    B    C    D
# A  0.0  0.1  0.4  0.4
# B  0.1  0.0  0.3  0.0
# C  0.4  0.3  0.0  0.0
# D  0.4  0.0  0.0  0.0

Answer


Assuming your interaction table contains only half of the correlations, i.e. each pair is listed once (if unsure, add .drop_duplicates()):

corr = pd.concat([DF_interactions, DF_interactions.rename(columns={'var1': 'var2', 'var2': 'var1'})]) 
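To see what this step produces, here is a minimal sketch that rebuilds the question's DF_interactions and prints the combined table. The names mirrored and both are only illustrative, and ignore_index=True is an optional addition that gives the result a clean 0..7 index.

```python
import pandas as pd

# Same interaction table as in the question.
DF_interactions = pd.DataFrame(
    [["A", "B", 0.1], ["A", "C", 0.4], ["B", "C", 0.3], ["A", "D", 0.4]],
    columns=["var1", "var2", "corr"],
)

# Swap the two label columns, so each (var1, var2) row is also
# available as (var2, var1) with the same correlation value.
mirrored = DF_interactions.rename(columns={"var1": "var2", "var2": "var1"})

# concat aligns on column names, so the swapped labels end up
# under the correct headers in the stacked result.
both = pd.concat([DF_interactions, mirrored], ignore_index=True)
print(both)
```

The stacked table has twice as many rows as the original: the first four are the original pairs and the last four are the same pairs with the labels reversed.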

Then use .pivot():

corr = corr.pivot(index='var1', columns='var2', values='corr') 

var2    A    B    C    D
var1
A     NaN  0.1  0.4  0.4
B     0.1  NaN  0.3  NaN
C     0.4  0.3  NaN  NaN
D     0.4  NaN  NaN  NaN

If you want 0 instead of NaN for the missing interactions, use .fillna(0).
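Putting the steps together, one possible end-to-end sketch looks like the following. The helper name interactions_to_matrix and its fill_value parameter are hypothetical, not part of pandas. The reindex call is an extra precaution so that rows and columns share the same sorted label order.

```python
import pandas as pd


def interactions_to_matrix(df, fill_value=0.0):
    """Hypothetical helper: long-format pairs -> symmetric square matrix.

    Assumes df has the columns 'var1', 'var2' and 'corr' from the question.
    """
    # Mirror every pair so that (a, b) and (b, a) are both present.
    mirrored = df.rename(columns={"var1": "var2", "var2": "var1"})
    # drop_duplicates guards against pairs that were already listed both ways.
    both = pd.concat([df, mirrored], ignore_index=True).drop_duplicates()
    matrix = both.pivot(index="var1", columns="var2", values="corr")
    # Use one sorted label list for both axes, then fill absent pairs.
    labels = sorted(set(both["var1"]) | set(both["var2"]))
    return matrix.reindex(index=labels, columns=labels).fillna(fill_value)


DF_interactions = pd.DataFrame(
    [["A", "B", 0.1], ["A", "C", 0.4], ["B", "C", 0.3], ["A", "D", 0.4]],
    columns=["var1", "var2", "corr"],
)
result = interactions_to_matrix(DF_interactions)
print(result)
```

Note that .pivot() raises a ValueError if the same pair appears with two different values, which is probably the behaviour you want for inconsistent input.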


What is happening in .rename(columns={'var1': 'var2', 'var2': 'var1'})? I know you're renaming, but why is this a necessary step? –


.rename and pd.concat together help ensure that the resulting matrix is symmetric. – Stefan
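A small sketch of the difference, using the question's data (the names one_way and two_way are only illustrative): pivoting the original table directly yields a rectangular table with only the listed direction of each pair, while pivoting the mirrored table yields a square, symmetric one.

```python
import numpy as np
import pandas as pd

DF_interactions = pd.DataFrame(
    [["A", "B", 0.1], ["A", "C", 0.4], ["B", "C", 0.3], ["A", "D", 0.4]],
    columns=["var1", "var2", "corr"],
)

# Without mirroring: rows come only from var1 (A, B) and
# columns only from var2 (B, C, D), so the result is not square.
one_way = DF_interactions.pivot(index="var1", columns="var2", values="corr")
print(one_way.shape)  # (2, 3)

# With mirroring: every label appears on both axes.
mirrored = DF_interactions.rename(columns={"var1": "var2", "var2": "var1"})
two_way = pd.concat([DF_interactions, mirrored]).pivot(
    index="var1", columns="var2", values="corr"
)
print(two_way.shape)  # (4, 4)

# The mirrored result equals its own transpose (NaN positions included).
print(np.allclose(two_way.values, two_way.values.T, equal_nan=True))  # True
```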