2014-09-21 157 views
1

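I have a cluster plot made in R, and I want to optimize the clustering using the "elbow criterion" with a WSS plot, but I don't know how to draw a WSS plot for a given clustering. Can anyone help me? How do I plot the within-cluster sum of squares?

Here is my data: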

Friendly<-c(0.467,0.175,0.004,0.025,0.083,0.004,0.042,0.038,0,0.008,0.008,0.05,0.096) 
Polite<-c(0.117,0.55,0,0,0.054,0.017,0.017,0.017,0,0.017,0.008,0.104,0.1) 
Praising<-c(0.079,0.046,0.563,0.029,0.092,0.025,0.004,0.004,0.129,0,0,0,0.029) 
Joking<-c(0.125,0.017,0.054,0.383,0.108,0.054,0.013,0.008,0.092,0.013,0.05,0.017,0.067) 
Sincere<-c(0.092,0.088,0.025,0.008,0.383,0.133,0.017,0.004,0,0.063,0,0,0.188) 
Serious<-c(0.033,0.021,0.054,0.013,0.2,0.358,0.017,0.004,0.025,0.004,0.142,0.021,0.108) 
Hostile<-c(0.029,0.004,0,0,0.013,0.033,0.371,0.363,0.075,0.038,0.025,0.004,0.046) 
Rude<-c(0,0.008,0,0.008,0.017,0.075,0.325,0.313,0.004,0.092,0.063,0.008,0.088) 
Blaming<-c(0.013,0,0.088,0.038,0.046,0.046,0.029,0.038,0.646,0.029,0.004,0,0.025) 
Insincere<-c(0.075,0.063,0,0.013,0.096,0.017,0.021,0,0.008,0.604,0.004,0,0.1) 
Commanding<-c(0,0,0,0,0,0.233,0.046,0.029,0.004,0.004,0.538,0,0.146) 
Suggesting<-c(0.038,0.15,0,0,0.083,0.058,0,0,0,0.017,0.079,0.133,0.442) 
Neutral<-c(0.021,0.075,0.017,0,0.033,0.042,0.017,0,0.033,0.017,0.021,0.008,0.717) 

data <- data.frame(Friendly,Polite,Praising,Joking,Sincere,Serious,Hostile,Rude,Blaming,Insincere,Commanding,Suggesting,Neutral) 

Here is my clustering code:

cor <- cor (data) 
dist<-dist(cor) 
hclust<-hclust(dist) 
plot(hclust) 

After running the code above I get a dendrogram. How can I draw a plot like this one:

enter image description here

Answer

6

If I follow what you want, then we need a function to compute the WSS

wss <- function(d) { 
    sum(scale(d, scale = FALSE)^2) 
} 
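As a quick check of what wss() computes, here is a small sketch on a made-up matrix (the values in m are invented purely for illustration). Centring each column and summing the squares should match the sum of squared deviations from the column means worked out by hand.

```r
wss <- function(d) {
    sum(scale(d, scale = FALSE)^2)
}

## Toy data, invented for illustration only
m <- cbind(a = c(1, 2, 3), b = c(2, 4, 6))

## Column means are 2 and 4, so the deviations are (-1, 0, 1) and (-2, 0, 2)
by_hand <- sum((m[, "a"] - mean(m[, "a"]))^2) +
           sum((m[, "b"] - mean(m[, "b"]))^2)

wss(m)     # 10
by_hand    # 10
```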

and a wrapper for this wss() function

wrap <- function(i, hc, x) { 
    cl <- cutree(hc, i) 
    spl <- split(x, cl) 
    wss <- sum(sapply(spl, wss)) 
    wss 
} 

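This wrapper takes the following arguments as input:

  • i: the number of clusters to cut the data into
  • hc: the hierarchical cluster analysis object
  • x: the original data

wrap then cuts the dendrogram into i clusters, splits the original data according to the cluster membership given by cl, and computes the WSS for each cluster. These WSS values are summed to give the WSS for that clustering.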
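To make those steps concrete, here is a sketch that walks through them one at a time for a single value of i, using the built-in iris data. The choice of 3 clusters and the names memb, per_cluster and total are only for illustration.

```r
wss <- function(d) sum(scale(d, scale = FALSE)^2)

x  <- iris[, 1:4]                          # numeric columns only
hc <- hclust(dist(x), method = "ward.D")

memb <- cutree(hc, 3)             # cluster membership for a 3-group solution
spl  <- split(x, memb)            # list of 3 data frames, one per cluster
per_cluster <- sapply(spl, wss)   # WSS of each cluster
total <- sum(per_cluster)         # the value wrap(3, hc, x) would return

per_cluster
total
```

Because the clusters partition the rows, total can never exceed wss(x), the sum of squares of the whole data set taken as one cluster.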

We run all of this using sapply over the numbers of clusters 1, 2, ..., nrow(data)

## cl is the hclust() object fitted to the data
res <- sapply(seq.int(1, nrow(data)), wrap, hc = cl, x = data)

A screeplot can be drawn using

plot(seq_along(res), res, type = "b", pch = 19) 

Here is an example using the famous Edgar Anderson iris data set:

iris2 <- iris[, 1:4] # drop Species column 
cl <- hclust(dist(iris2), method = "ward.D") 

## Takes a little while as we evaluate all implied clustering up to 150 groups 
res <- sapply(seq.int(1, nrow(iris2)), wrap, hc = cl, x = iris2)
plot(seq_along(res), res, type = "b", pch = 19) 

This gives:

enter image description here

We can zoom in by showing only the first 50 clusterings (1:50)

plot(seq_along(res[1:50]), res[1:50], type = "o", pch = 19) 

which gives

enter image description here

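The main computation step can be sped up either by running the sapply() via an appropriate parallel alternative, or by doing the computation for fewer than nrow(data) clusters, for example: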

res <- sapply(seq.int(1, 50), wrap, hc = cl, x = iris2) ## 1st 50 groups
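For the parallel route, one possible sketch uses mclapply() from the base parallel package as the stand-in for sapply(). This is just one option and not necessarily what the answer had in mind; the core count of 2 and the name res_par are assumptions. It should return the same values as the serial call.

```r
library(parallel)

wss <- function(d) sum(scale(d, scale = FALSE)^2)
wrap <- function(i, hc, x) {
    cl <- cutree(hc, i)
    sum(sapply(split(x, cl), wss))
}

iris2 <- iris[, 1:4]
cl <- hclust(dist(iris2), method = "ward.D")

## Forking is not available on Windows, so fall back to one core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L

## mclapply() returns a list, so flatten it to a numeric vector
res_par <- unlist(mclapply(seq.int(1, 50), wrap, hc = cl, x = iris2,
                           mc.cores = cores))

plot(seq_along(res_par), res_par, type = "o", pch = 19)
```

For a data set as small as iris the overhead of forking may outweigh any gain, so this is likely to pay off only with larger data or many more clusterings.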

Thanks! But why are the values on the y-axis so large when my data are really very small? Also, could you answer my other question about the WSS plot?: https://stackoverflow.com/questions/25977798/why-is-the-line-of-wss-plot-for-optimize-the-cluster-analysis-looks-so-volaua – 2014-09-22 15:30:00


The values on the y-axis are determined by the scale of the variables in the data. I'll take a look at the other Q. – 2014-09-22 16:08:24
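A small sketch of that dependence on scale, using the iris data from the answer (the names raw, std and x10 are illustrative): multiplying every variable by 10 multiplies the sum of squares by 100, while standardising each variable with scale() fixes the one-cluster sum of squares at (n - 1) * p.

```r
wss <- function(d) sum(scale(d, scale = FALSE)^2)

iris2 <- iris[, 1:4]

raw <- wss(iris2)           # sum of squares in the original units
x10 <- wss(iris2 * 10)      # same data, units 10 times larger
std <- wss(scale(iris2))    # each variable standardised to unit variance

x10 / raw    # 100
std          # (150 - 1) * 4 = 596
```

Standardising before clustering changes the distances as well as the y-axis, so it is a modelling choice rather than a cosmetic fix.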