2016-09-20

I'm misunderstanding something. Here is my code: PCA in sklearn and in numpy gives different results.

sklearn

import numpy as np 
import matplotlib.pyplot as plt 
from mpl_toolkits.mplot3d import Axes3D 
from sklearn import decomposition 
from sklearn import datasets 
from sklearn.preprocessing import StandardScaler 

pca = decomposition.PCA(n_components=3) 

x = np.array([ 
     [0.387,4878, 5.42], 
     [0.723,12104,5.25], 
     [1,12756,5.52], 
     [1.524,6787,3.94], 
    ]) 
pca.fit_transform(x) 

Output:

array([[ -4.25324997e+03, -8.41288672e-01, -8.37858943e-03], 
    [ 2.97275001e+03, -1.25977271e-01, 1.82476780e-01], 
    [ 3.62475003e+03, -1.56843494e-01, -1.65224286e-01], 
    [ -2.34425007e+03, 1.12410944e+00, -8.87390454e-03]]) 
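For comparison, sklearn's PCA centers the input (without scaling it) and then works from a singular value decomposition of the centered data rather than from an explicit covariance matrix. The sketch below follows that route in plain numpy, assuming the same x as above. It illustrates the technique and is not sklearn's actual code; the names xm, scores and explained_variance are hypothetical. Up to the sign of each column, scores should match the sklearn output above.

```python
import numpy as np

# Same 4x3 data as in the question.
x = np.array([
    [0.387, 4878, 5.42],
    [0.723, 12104, 5.25],
    [1, 12756, 5.52],
    [1.524, 6787, 3.94],
])

# Center each column (subtract its mean); no scaling to unit variance.
xm = x - x.mean(axis=0)

# Thin SVD of the centered data: xm = U @ diag(s) @ Vt.
# Rows of Vt are the principal axes, ordered by decreasing singular value.
U, s, Vt = np.linalg.svd(xm, full_matrices=False)

# Projections (scores) of each sample onto the principal axes; equals U * s.
scores = xm @ Vt.T

# Variance along each axis, using the same n - 1 normalization as np.cov.
explained_variance = s**2 / (x.shape[0] - 1)

print(scores)
print(explained_variance)
```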

Using numpy:

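x_std = StandardScaler().fit_transform(x)  # standardize each column (zero mean, unit variance)
cov = np.cov(x_std.T)                      # 3x3 covariance matrix of the standardized features
ev, eig = np.linalg.eig(cov)               # eigenvalues; eigenvectors are the columns of eig
a = eig.dot(x_std.T)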

Output:

array([[ 1.38252552, -1.25240764, 0.2133338 ], 
     [-0.53279935, -0.44541231, -0.77988021], 
     [-0.45230635, 0.21983192, -1.23796328], 
     [-0.39741982, 1.47798804, 1.80450969]]) 

I kept all 3 components, but the result doesn't seem to preserve my original data.

Can anyone tell me why?

Answer


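Don't use StandardScaler. sklearn's PCA centers each feature (subtracts its mean) but does not scale it to unit variance, so standardizing first changes the covariance matrix and therefore the components. Instead, just subtract the mean of each column from x, then project the centered data onto the eigenvectors with xm.dot(evecs), since each column of evecs is one eigenvector: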

In [92]: xm = x - x.mean(axis=0) 

In [93]: cov = np.cov(xm.T) 

In [94]: evals, evecs = np.linalg.eig(cov) 

In [95]: xm.dot(evecs) 
Out[95]: 
array([[ -4.2532e+03, -8.3786e-03, -8.4129e-01], 
     [ 2.9728e+03, 1.8248e-01, -1.2598e-01], 
     [ 3.6248e+03, -1.6522e-01, -1.5684e-01], 
     [ -2.3443e+03, -8.8739e-03, 1.1241e+00]]) 
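With all three components kept, PCA is only a rotation of the centered data, so no information is lost and the original x can be recovered. A minimal check, assuming the same x as in the question; it swaps in np.linalg.eigh (numpy's eigensolver for symmetric matrices such as a covariance matrix, which returns orthonormal eigenvectors) in place of np.linalg.eig:

```python
import numpy as np

# Data from the question.
x = np.array([
    [0.387, 4878, 5.42],
    [0.723, 12104, 5.25],
    [1, 12756, 5.52],
    [1.524, 6787, 3.94],
])

# Center the data as in the answer.
mean = x.mean(axis=0)
xm = x - mean

cov = np.cov(xm.T)
evals, evecs = np.linalg.eigh(cov)

# Project onto all 3 eigenvectors (columns of evecs).
scores = xm.dot(evecs)

# evecs is orthogonal, so multiplying by evecs.T undoes the rotation;
# adding the mean back undoes the centering.
x_back = scores.dot(evecs.T) + mean

print(np.allclose(x_back, x))  # True
```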

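That last result contains the same information as the sklearn result, but the columns are in a different order. np.linalg.eig returns the eigenvalues (and the matching eigenvector columns) in no guaranteed order, whereas sklearn sorts the components by decreasing explained variance. The sign of a column can also differ between implementations, because an eigenvector is only defined up to sign.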
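To get sklearn's column order from the eigendecomposition, sort the eigenpairs by decreasing eigenvalue. A sketch assuming the same x as in the question; it uses np.linalg.eigh, which returns eigenvalues in ascending order, and the argsort step would equally reorder the unordered output of np.linalg.eig:

```python
import numpy as np

# Data from the question.
x = np.array([
    [0.387, 4878, 5.42],
    [0.723, 12104, 5.25],
    [1, 12756, 5.52],
    [1.524, 6787, 3.94],
])

xm = x - x.mean(axis=0)   # center each column
cov = np.cov(xm.T)        # 3x3 covariance matrix

# eigh is meant for symmetric matrices; eigenvalues come back ascending.
evals, evecs = np.linalg.eigh(cov)

# Indices that sort the eigenvalues from largest to smallest.
order = np.argsort(evals)[::-1]
evals = evals[order]
evecs = evecs[:, order]   # keep each eigenvector paired with its eigenvalue

# Column k holds the projection onto the k-th largest component.
scores = xm.dot(evecs)

print(evals)
print(scores)
```

Individual columns may still come out with the opposite sign from sklearn's output; both signs describe the same principal axis.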