2015-04-21 138 views
10

我想反轉從prcomp計算的PCA以恢復到原始數據。如何在prcomp中反轉PCA以獲得原始數據

我覺得像下面將工作:

pca$x %*% t(pca$rotation) 

但事實並非如此。

下面的鏈接顯示如何找回從個人電腦的原始數據,但解釋它僅適用於使用PCA對本徵的協方差矩陣 http://www.di.fc.ul.pt/~jpn/r/pca/pca.html

prcomp不calcluate電腦的方式。 「這個計算是通過(中心的和可能縮放的)數據矩陣的奇異值分解來完成的,而不是通過在協方差矩陣上使用特徵值來完成的。」 -prcomp所以你需要添加減去

+1

@konvas是正確的,但也可以告訴prcomp未按比例和中心:'PCA < - prcomp(數據,RETX = TRUE,中心= FALSE,標度= FALSE)'在這種情況下公式上面並工作。 – Mist

回答

11

prcomp將圍繞變量指背

t(t(pca$x %*% t(pca$rotation)) + pca$center) 

如果pca$scaleTRUE你還需要重新大規模

t(t(pca$x %*% t(pca$rotation)) * pca$scale + pca$center) 
+0

如何用PCA平滑數據?有沒有有效的方法? – Qbik

0

我希望這將幫助也是如此。

rm(list = ls()) 

# ---- 
# create a dataset feature of class 1, 100 samples 
f1 <- rnorm(n = 100, mean = 5, sd = 1) 

# ---- 
# still in the same feature, create class 2, also 100 samples 
f1 <- c(f1,rnorm(n = 100, mean = 10, sd = 1)) 

# ---- 
# create another feature, of course it has 200 samples 
f2 <- (f1 * 1.25) + rnorm(n = 200, mean = 7, sd = 0.75) 

# ---- 
# put them together in one container i.e dataset 
# feature #1 could better represent the separation of the two class 
# since it spread from about 4 to 11, while feature #2 spread from about 
# 6 to 8 (without addition 1.5 of feature #1) 
mydataset <- cbind(f1,f2) 

# ---- 
# create coloring label 
class.color <- c(rep(2,100),rep(3,100)) 

# ---- 
# plot the dataset 
plot(mydataset, col = class.color, main = 'the original formation') 

# ---- 
# transform it...!!!! 
pca.result <- prcomp(mydataset,scale. = TRUE, center = TRUE, retx = TRUE) 

# ---- 
# plot the samples on their new axis 
# recall that when a line was drawn at the zero value of PC 1, it could separate the red and green class 
# but not when it was drawn at the zero value of PC 2 
# the line at the zero of PC 1 put red on its left and green on its right (or vice versa) 
# the line at the zero of PC 2 put BOTH red AND green on its upper part, and ALSO BOTH red AND green on its 
# lower part... i.e. PC 2 could not separate the red and green class 
plot(pca.result$x, col = class.color, main = 'samples on their new axis') 

# ---- 
# calculate the variance explained by the PCs in percent 
# PC 1 could explain approximately 98% while PC 2 only 2% 
variance.total <- sum(pca.result$sdev^2) 
variance.explained <- pca.result$sdev^2/variance.total * 100 
print(variance.explained) 

# ---- 
# drop PC 2 ---> samples drawn at PC 1's axis ---> this is the desired new representation of dataset 
plot(x = pca.result$x[,1], y = rep(0,200), col = class.color, 
    main = 'over PC 1', ylab = '', xlab = 'PC 1') 

# ---- 
# drop PC 1 ---> samples drawn at PC 2's axis ---> this is the UNdesired new representation of dataset 
plot(x = pca.result$x[,2], y = rep(0,200), col = class.color, 
    main = 'over PC 2', ylab = '', xlab = 'PC 2') 

# ---- 
# now choose only PC 1 and get it back to the original dataset, let's see what it's like 
# take all PC 1 value, put it on first column of the new dataset, and zero pad the second column 
new.dataset <- cbind(
    pca.result$x[,1], 
    rep(0,200) 
) 

# ---- 
# take alook at a glance the new dataset 
# remember, although the choosen one was only PC 1, doesn't mean that there would be only one column 
# the second column (and all column for a larger feature) must also exist 
# but now they are all set to zero 
(new.dataset) 

# ---- 
# transform it back 
new.dataset <- new.dataset %*% solve(pca.result$rotation) 

# ---- 
# plot the new dataset that is constructed with only one PC 
# (a little clumsy though, for we already have a new better axis system, why would we use the old one?) 
plot(new.dataset,col = class.color, 
    main = 'centered and scaled\nnew dataset with only one pc ---> PC 1', xlab = 'f1', ylab = 'f2') 
# ---- 
# remember, the dots are stil in scale and center position 
# must be stretched and dragged first 
scalling.matrix <- matrix(rep(pca.result$scale,200),ncol = 2, byrow = TRUE) 
centering.matrix <- matrix(rep(pca.result$center,200),ncol = 2, byrow = TRUE) 

# ---- 
# obtain original values 
new.dataset <- (new.dataset * scalling.matrix) + centering.matrix 

# ---- 
# compare the result before and after centering 
# all dots reside the same position, but with different values 
plot(new.dataset,col = class.color, 
    main = 'stretched and dragged\nnew dataset with only one pc ---> PC 1', xlab = 'f1', ylab = 'f2') 

# ---- 
# what if all PCs were all used in construction the data? 
# they'll be forming back (but OF COURSE that's not the principal component analysis here on earth for) 
new.dataset <- cbind(
    pca.result$x[,1], 
    pca.result$x[,2] 
) 
new.dataset <- new.dataset %*% solve(pca.result$rotation) 
new.dataset <- (new.dataset * scalling.matrix) + centering.matrix 
plot(new.dataset,col = class.color, 
    main = 'new dataset with\nboth pc included ---> PC 1 & 2 present', xlab = 'f1', ylab = 'f2') 

# ---- 
# compare the inverted dots with those from the original formation, they're all the same 
相關問題