2014-05-09
0

Difficulty applying PCA

I am trying to apply PCA in R. I have the following data:

 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 
2454 0 168 290 45 1715 61 551 245 30 91 
222 188 94 105 60 3374 615 7 294 0 169 
552 0 0 465 0 3040 0 0 771 0 0 
2872 0 0 0 0 3380 0 289 0 0 0 
2938 0 56 56 0 2039 538 311 113 0 254 
2849 0 0 332 0 2548 0 332 0 0 221 
3102 0 0 0 0 2690 0 0 0 807 807 
3134 0 0 0 0 2897 289 144 144 144 0 
558 0 0 0 0 3453 0 0 0 0 0 
2893 0 262 175 0 2452 350 1138 262 87 175 
552 0 0 351 0 3114 0 0 678 0 0 
2874 0 109 54 0 2565 272 1037 109 0 0 
1396 0 0 407 0 1730 0 0 305 0 0 
2866 0 71 179 0 2403 358 753 35 107 143 
449 0 0 0 0 2825 0 0 0 0 0 
2888 0 0 523 0 2615 104 627 209 0 0 
2537 0 57 0 0 1854 0 0 463 0 0 
2873 0 0 342 0 3196 0 114 0 0 114 
720 0 0 365 4 2704 0 4 643 4 0 
218 125 31 94 219 2479 722 0 219 0 94 

I applied the following code to it:

fit <- prcomp(data) 
ev <- fit$rotation # pc loadings 

As a test, I tried to see whether I get the data matrix back when I keep all the components that can be kept:

numberComponentsKept = 10 
featureVector = ev[,1:numberComponentsKept] 
newData <- as.matrix(data)%*%as.matrix(featureVector) 

The newData matrix should be the same as the data matrix, but I get a very different result:

   PC1  PC2  PC3   PC4  PC5  PC6  PC7   PC8  PC9  PC10 
2454 1424.447 867.5986 514.0592 -155.4783720 -574.7425 85.38724 -86.71887 90.872507 4.305168 92.08284 
222 3139.681 1020.4150 376.3165 471.8718398 -796.9549 142.14301 -119.86945 32.919950 -31.269467 32.55846 
552 2851.544 539.6075 883.3969 -93.3579153 -908.6689 68.34030 -40.97052 -13.856931 23.133566 89.00851 
2872 3111.317 1210.0187 433.0382 -144.4065362 -381.2305 -20.08927 -49.03447 9.569258 44.201571 70.13113 
2938 1788.334 945.8162 189.6526 308.7703509 -593.5577 124.88484 -109.67276 -115.127348 14.170615 99.19492 
2849 2291.839 978.1819 374.7567 -243.6739292 -496.8707 287.01065 -126.22501 -18.747873 54.080763 62.80605 
3102 2530.989 814.7548 -510.5978 -410.6295894 -1015.3228 46.85727 -21.20662 14.696831 23.687923 72.37691 
3134 2679.430 970.1323 311.8627 124.2884480 -536.4490 -26.23858 83.86768 -17.808390 -28.802387 92.09583 
558 3268.599 988.2515 353.6538 -82.9155988 -342.5729 12.96219 -60.94886 18.537087 7.291126 96.14917 
2893 1921.761 1664.0084 631.0800 -55.6321469 -864.9628 -28.11045 -104.78931 37.797727 -12.078535 104.88374 
552 2927.108 607.6489 799.9602 -79.5494412 -827.6994 14.14625 -50.12209 -14.020936 29.996639 86.72887 
2874 2084.285 1636.7999 621.6383 -49.2934502 -577.4815 -67.27198 -11.06071 -7.167577 47.395309 51.02962 
1396 1618.171 337.4320 488.2717 -100.1663625 -469.8857 212.37199 -1.19409 13.531485 -23.332701 64.58806 
2866 2007.261 1387.6890 395.1586 0.8640971 -636.1243 133.41074 12.34794 -26.969634 5.506828 74.13767 
449 2674.136 808.5174 289.3345 -67.8356695 -280.2689 10.60475 -49.86404 15.165731 5.965083 78.66244 
2888 2254.171 1162.4988 749.7230 -206.0215007 -652.2364 302.36320 40.76341 -1.079259 17.635956 57.86999 
2537 1747.098 371.8884 429.1309 9.3761544 -480.7130 -196.25019 -81.31580 2.819608 24.089379 56.91885 
2873 2973.872 974.3854 433.7282 -197.0601947 -478.3647 301.96576 -81.81105 14.516646 -1.191972 100.79057 
720 2537.535 504.4124 744.5909 -78.1162036 -771.1396 38.17725 -36.61446 -9.079443 25.488688 78.21597 
218 2292.718 800.5257 260.6641 603.3295960 -641.9296 187.38913 11.71382 70.011487 78.047216 96.10967 

What am I doing wrong?

Answer

3

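I think this is a PCA problem rather than an R problem. You multiply the original data by the rotation matrix and then wonder why newData != data. That would only be the case if the rotation matrix were the identity matrix.

What you probably intended to do is the following: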

# Run PCA:
fit <- prcomp(USArrests)
ev <- fit$rotation # pc loadings

# Reversed PCA:
head(fit$x %*% t(as.matrix(ev)))

# Centered original data:
head(t(apply(USArrests, 1, '-', colMeans(USArrests))))

In the last step you have to center the data, because the function prcomp centers it by default.
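To make the centering point concrete, here is a minimal sketch on the same USArrests data. It assumes the default prcomp settings (centering on, no scaling); the names centered and restored are only illustrative. Multiplying the scores by the transposed rotation gives the centered data, and adding the stored column means back should recover the original values.

```r
fit <- prcomp(USArrests)   # default: center = TRUE, scale. = FALSE
ev <- fit$rotation         # pc loadings

# Scores times the transposed loadings give the centered data
centered <- fit$x %*% t(ev)

# Add the column means stored by prcomp back to each column
restored <- sweep(centered, 2, fit$center, "+")

# Compare with the original data, ignoring attributes such as dimnames
all.equal(restored, as.matrix(USArrests), check.attributes = FALSE)
```

If only the first k columns of fit$x and ev are kept, the same two steps give an approximation of the data rather than an exact copy.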

+2

Just as a short comment: it is easier to use the 'scale' function from 'base' for centering: 'scale(USArrests, scale = FALSE, center = TRUE)'. – Beasterfield

+0

Thanks, I did not know 'scale' yet. It does look easier to read. –
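As a rough check of the 'scale' suggestion from the comments, the sketch below compares it with the apply-based centering from the answer, again on USArrests. It also shows what was missing in the question: projecting the centered data (not the raw data) onto the rotation should reproduce fit$x. The names centered and manual are only illustrative.

```r
fit <- prcomp(USArrests)

# Centering with scale(): subtract column means, do not rescale
centered <- scale(USArrests, center = TRUE, scale = FALSE)

# Centering as in the answer: subtract the column means from each row
manual <- t(apply(USArrests, 1, "-", colMeans(USArrests)))

# Both approaches should give the same numbers
all.equal(centered, manual, check.attributes = FALSE)

# Projecting the centered data onto the loadings should give the scores
all.equal(centered %*% fit$rotation, fit$x, check.attributes = FALSE)
```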