
I am trying to implement PCA, and the intermediate results (such as eigenvalues and eigenvectors) come out fine. However, when I try to project the data (3-dimensional) into the 2D principal-component space, the result is wrong. I have spent a lot of time comparing my code against other implementations, for example: Python PCA - projecting into lower-dimensional space

http://sebastianraschka.com/Articles/2014_pca_step_by_step.html

However, after a long time there is still no progress and I cannot find the error. Since the intermediate results are correct, I assume the problem is a simple coding mistake. Thanks in advance to everyone who actually reads this question, and thanks to those who leave helpful comments/answers.

My code is as follows:

import numpy as np 

class PCA():
    def __init__(self, X):
        # center the data
        X = X - X.mean(axis=0)
        # calculate the covariance matrix of X, where data points are rows
        C = np.cov(X, rowvar=False)
        # get eigenvectors and eigenvalues
        d, u = np.linalg.eigh(C)
        # sort eigenvectors and eigenvalues in descending order of eigenvalue;
        # np.linalg.eigh returns them in ascending order, so both are reversed
        self.U = np.asarray(u).T[::-1]
        self.D = d[::-1]

**problem starts here**  

    def project(self, X, m):
        # use the m eigenvectors with the highest eigenvalues as the transformation matrix
        Z = np.dot(X, np.asmatrix(self.U[:m]).T)
        return Z
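
Presumably the output below was produced with a call along these lines (my reconstruction; the post does not show the actual invocation):

pca = PCA(X)
myresult = pca.project(X, 2)  # note: this projects the raw, uncentered X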

The result of my code is:

myresult
([[ 0.03463706, -2.65447128],
  [-1.52656731,  0.20025725],
  [-3.82672364,  0.88865609],
  [ 2.22969475,  0.05126909],
  [-1.56296316, -2.22932369],
  [ 1.59059825,  0.63988429],
  [ 0.62786254, -0.61449831],
  [ 0.59657118,  0.51004927]])

Correct result (e.g. from sklearn.decomposition.PCA):
([[ 0.26424835, -2.25344912],
  [-1.29695602,  0.60127941],
  [-3.59711235,  1.28967825],
  [ 2.45930604,  0.45229125],
  [-1.33335186, -1.82830153],
  [ 1.82020954,  1.04090645],
  [ 0.85747383, -0.21347615],
  [ 0.82618248,  0.91107143]])

The input is defined as follows: 
X = np.array([ 
[-2.133268233289599,0.903819474847349,2.217823388231679,-0.444779660856219,-0.661480010318842,-0.163814281248453,-0.608167714051449, 0.949391996219125], 
[-1.273486742804804,-1.270450725314960,-2.873297536940942, 1.819616794091556,-2.617784834189455, 1.706200163080549,0.196983250752276,0.501491995499840], 
[-0.935406638147949,0.298594472836292,1.520579082270122,-1.390457671168661,-1.180253547776717,-0.194988736923602,-0.645052874385757,-1.400566775105519]]).T 
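
For reference, the sklearn numbers above can be reproduced with something like the following sketch (assuming the X defined above; in general, component signs may differ between implementations):

from sklearn.decomposition import PCA as SklearnPCA

Z_ref = SklearnPCA(n_components=2).fit_transform(X)  # X as defined above
print(Z_ref)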

Answer


You need to subtract the mean to center the data before projecting it onto the new basis:

mu = X.mean(0)                    # per-feature (column) mean
C = np.cov(X - mu, rowvar=False)  # covariance of the centered data
d, u = np.linalg.eigh(C)          # eigenvalues ascending, eigenvectors as columns
U = u.T[::-1]                     # rows = eigenvectors, descending eigenvalue order
Z = np.dot(X - mu, U[:2].T)       # project the *centered* data onto the top 2 PCs

print(Z) 
# [[ 0.26424835 -2.25344912] 
# [-1.29695602 0.60127941] 
# [-3.59711235 1.28967825] 
# [ 2.45930604 0.45229125] 
# [-1.33335186 -1.82830153] 
# [ 1.82020954 1.04090645] 
# [ 0.85747383 -0.21347615] 
# [ 0.82618248 0.91107143]]
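
One way to fold this fix back into the original class is to store the training mean in __init__ and subtract it again in project. A minimal sketch based on the answer's snippet (not the poster's exact code):

import numpy as np

class PCA():
    def __init__(self, X):
        self.mu = X.mean(axis=0)               # remember the training mean
        C = np.cov(X - self.mu, rowvar=False)  # covariance of the centered data
        d, u = np.linalg.eigh(C)               # eigenvalues ascending, eigenvectors as columns
        self.U = u.T[::-1]                     # rows = eigenvectors, descending eigenvalue order
        self.D = d[::-1]

    def project(self, X, m):
        # center with the stored mean, then project onto the top m components
        return np.dot(X - self.mu, self.U[:m].T)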