2016-12-06

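Centroid distance calculations in PCA space and 'feature space' diverge

I am measuring the centroids of ~20 treatments and 3 groups in both PCA space and 'feature space'. If I understand my math teacher correctly, the distances between the centroids should be the same in both spaces. However, the way I calculate them, they are not, and I wonder whether the way I do the math for either of them is wrong.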

I am using the infamous wine dataset to illustrate my method (MWE):

library(ggbiplot) 
data(wine) 
treatments <- 1:2 # treatments (columns) to be considered for this calculation
wine.pca <- prcomp(wine[treatments], scale. = TRUE) 
#calculate the centroids for the feature/treatment space and the pca space 
df.wine.x <- as.data.frame(wine.pca$x) 
df.wine.x$groups <- wine.class 
wine$groups <- wine.class 
feature.centroids <- aggregate(wine[treatments], list(Type = wine$groups), mean) 
pca.centroids <- aggregate(df.wine.x[treatments], list(Type = df.wine.x$groups), mean) 
pca.centroids 
feature.centroids 
#calculate distance between the centroids of barolo and grignolino 
dist(rbind(feature.centroids[feature.centroids$Type == "barolo",][-1],feature.centroids[feature.centroids$Type == "grignolino",][-1]), method = "euclidean") 
dist(rbind(pca.centroids[pca.centroids$Type == "barolo",][-1],pca.centroids[pca.centroids$Type == "grignolino",][-1]), method = "euclidean") 

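The last two lines return 1.468087 for the distance in feature space and 1.80717 for the distance in PCA space, which indicates there is a fly in the ointment...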

Answer


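This is because of the scaling and centering: if you do neither, the distances in the original feature space and in PCA space are exactly the same. (Strictly, scaling is what changes the distances; centering only shifts every point by the same vector, so it leaves distances between centroids untouched.)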

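wine.pca <- prcomp(wine[treatments], scale. = FALSE, center = FALSE)
# recompute the PCA-space centroids from the new scores before measuring
df.wine.x <- as.data.frame(wine.pca$x)
df.wine.x$groups <- wine.class
pca.centroids <- aggregate(df.wine.x[treatments], list(Type = df.wine.x$groups), mean)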

dist(rbind(feature.centroids[feature.centroids$Type == "barolo",][-1],feature.centroids[feature.centroids$Type == "grignolino",][-1]), method = "euclidean") 
#   1 
# 2 1.468087 
dist(rbind(pca.centroids[pca.centroids$Type == "barolo",][-1],pca.centroids[pca.centroids$Type == "grignolino",][-1]), method = "euclidean") 
#   1 
# 2 1.468087 
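
The reason can be sketched in a few lines of algebra. The symbols below are introduced here for illustration only and assume that all principal components are kept: W is the orthogonal rotation matrix (wine.pca$rotation), mu is the centering vector, D is the diagonal matrix of per-column scale factors, and c_a, c_b are two feature-space centroids. Because the mean is linear, the centroid of the scores equals the score of the centroid.

```latex
\[
z = W^{\top} D^{-1} (x - \mu)
\quad\Longrightarrow\quad
z_a - z_b = W^{\top} D^{-1} (c_a - c_b)
\]
\[
\lVert z_a - z_b \rVert
  = \lVert W^{\top} D^{-1} (c_a - c_b) \rVert
  = \lVert D^{-1} (c_a - c_b) \rVert
\]
```

The centering vector cancels in the difference and the rotation preserves Euclidean length, so the PCA-space distance equals the feature-space distance exactly when D is the identity, that is, when no scaling is applied or when the feature-space centroids are themselves computed on data scaled the same way.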

Another way to obtain matching results is to scale/center the original data first and then apply PCA with scaling/centering, as follows:

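wine[treatments] <- scale(wine[treatments], center = TRUE)
wine.pca <- prcomp(wine[treatments], scale. = TRUE)
# recompute both sets of centroids from the scaled data and the new scores
feature.centroids <- aggregate(wine[treatments], list(Type = wine$groups), mean)
df.wine.x <- as.data.frame(wine.pca$x)
df.wine.x$groups <- wine.class
pca.centroids <- aggregate(df.wine.x[treatments], list(Type = df.wine.x$groups), mean)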

dist(rbind(feature.centroids[feature.centroids$Type == "barolo",][-1],feature.centroids[feature.centroids$Type == "grignolino",][-1]), method = "euclidean") 
#  1 
# 2 1.80717 
dist(rbind(pca.centroids[pca.centroids$Type == "barolo",][-1],pca.centroids[pca.centroids$Type == "grignolino",][-1]), method = "euclidean") 
#  1 
# 2 1.80717
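
Both variants can be checked side by side without ggbiplot. The following is a minimal sketch, not the code from the question: it assumes the built-in iris data as a stand-in for the wine data, and centroid_dist is a hypothetical helper written only for this illustration.

```r
# Base R only; `iris` is a stand-in for the wine data (an assumption).
x <- iris[, 1:2]   # two feature columns, like treatments <- 1:2
g <- iris$Species  # grouping factor, like wine.class

# Hypothetical helper: Euclidean distance between the centroids of groups a and b.
centroid_dist <- function(m, groups, a, b) {
  cen <- aggregate(as.data.frame(m), list(Type = groups), mean)
  pts <- rbind(cen[cen$Type == a, -1], cen[cen$Type == b, -1])
  as.numeric(dist(pts, method = "euclidean"))
}

# Unscaled features vs. unscaled PCA scores (centering left on).
d_raw     <- centroid_dist(x, g, "setosa", "versicolor")
d_pca_raw <- centroid_dist(prcomp(x, scale. = FALSE)$x, g, "setosa", "versicolor")

# Scaled features vs. scaled PCA scores.
d_scaled     <- centroid_dist(scale(x), g, "setosa", "versicolor")
d_pca_scaled <- centroid_dist(prcomp(x, scale. = TRUE)$x, g, "setosa", "versicolor")

# Each pair agrees; the raw and scaled distances differ from each other.
stopifnot(isTRUE(all.equal(d_raw, d_pca_raw)))
stopifnot(isTRUE(all.equal(d_scaled, d_pca_scaled)))
print(c(raw = d_raw, pca_raw = d_pca_raw, scaled = d_scaled, pca_scaled = d_pca_scaled))
```

The comparison only holds when both spaces get the same preprocessing and all components are kept; dropping components would make the PCA-space distance smaller or equal.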