2013-10-24

Given a dendrogram Y that has k clusters below a height value z, I would like to know:

How many observations form those k clusters below height z?

Here is some reproducible code, and pictures to illustrate the problem:

#Necessary packages to reproduce the code 
library(ggplot2) 
library(cluster) 

#Example data 
x = c(6.2, 2.3, 0, 1.54, 2.17, 6.11, 0.3, 1.39, 
    5.14, 12.52, 12.57, 7.13, 13.71, 11.42, 
    8.13, 8.86, 9.97, 10, 8.23, 12.4, 9.51, 
    20.56, 17.78, 14.91, 19.17, 17.48, 17.44, 
    21.32, 
    21.24) 

y = c(7.89, 7.63, 5.29, 8.38, 8.37, 10.5, 21.5, 
    16.65, 23.76, 1.77, 1.8, 10.49, 14.01, 
    10.36, 10.85, 15.02, 14.91, 14.94, 10.76, 
    18.58, 23.12, 0, 13.59, 9.68, 17.32, 17.85, 
    17.79, 4.13, 4.05) 

df = data.frame(x, y) 
obs = nrow(df) #number of data observations 
obs 
#> [1] 29 

#Clustering 
agnes = agnes(df, metric="euclidean", stand=FALSE, method="average") 
k_number = sum(agnes$height < 1) #number of clusters under dendrogram's height value of 1 
k_number 
#> [1] 7  # k_number resulted in 7 groups/clusters 

plot(agnes,which.plots=2) 

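Note: the red marks were drawn outside R; they indicate the 7 clusters below height 1.

[Figure: dendrogram of the agnes result, with the clusters below height 1 marked in red]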
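A note on what `k_number` measures: in the cluster package, the `height` component of an `agnes` object holds one distance per merge step (n - 1 values in total), so `sum(agnes$height < 1)` counts merges below height 1 rather than the clusters left after cutting there. The sketch below, assuming the example data from the question and using illustrative variable names (`ag`, `h`, `n_merges`, `n_clusters`), checks that relationship against `cutree`. It uses `<=` to mirror how `cutree` keeps a merge that happens exactly at `h`.

```r
library(cluster)

# Example data from the question
x <- c(6.2, 2.3, 0, 1.54, 2.17, 6.11, 0.3, 1.39, 5.14, 12.52, 12.57, 7.13,
       13.71, 11.42, 8.13, 8.86, 9.97, 10, 8.23, 12.4, 9.51, 20.56, 17.78,
       14.91, 19.17, 17.48, 17.44, 21.32, 21.24)
y <- c(7.89, 7.63, 5.29, 8.38, 8.37, 10.5, 21.5, 16.65, 23.76, 1.77, 1.8,
       10.49, 14.01, 10.36, 10.85, 15.02, 14.91, 14.94, 10.76, 18.58, 23.12,
       0, 13.59, 9.68, 17.32, 17.85, 17.79, 4.13, 4.05)
df <- data.frame(x, y)

ag <- agnes(df, metric = "euclidean", stand = FALSE, method = "average")
h <- 1
n <- nrow(df)

# One height per merge step, so this counts merges, not clusters
n_merges <- sum(ag$height <= h)

# Every merge joins two clusters into one, so n - n_merges clusters remain
n_clusters <- n - n_merges

# Cross-check with stats::cutree on the converted tree
cl <- cutree(as.hclust(ag), h = h)
c(merges = n_merges, clusters = n_clusters, cutree = length(unique(cl)))
```

If the output matches the counts printed elsewhere in this thread, `n_merges` is 7 and 22 clusters remain at height 1, most of them single observations.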

ggplot(df, aes(x, y)) + xlim(0, 22) + ylim(0, 25) + 
    geom_point() + 
    geom_text(aes(label = row.names(df)), hjust = 0.5, vjust = -1.5, size = 5) 
[Figure: scatter plot of the observations labeled by row number, showing the groups]

So, there are 7 clusters formed from 13 observations.

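I want to retrieve that number, 13.

I have tried reading a lot of documentation, but since I am not very familiar with R or clustering techniques, I am struggling to find it.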

Answer

This should do the trick:

# convert to hclust object and obtain cluster assignments for the observations 
R> cl <- cutree(as.hclust(agnes), h=1) 
R> cl 
[1] 1 2 3 2 2 4 5 6 7 8 8 9 10 11 12 13 14 14 12 15 16 17 18 19 20 
[26] 21 21 22 22 
# find non-unique assignments 
R> res <- table(cl) 
R> res[res > 1] 
cl 
2 8 12 14 21 22 
3 2 2 2 2 2 
R> sum(res[res > 1]) 
[1] 13 
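The same steps can be wrapped in a small function so the count is easy to repeat for several cut heights. This is a sketch, assuming the question's example data and, as in the answer, `stats::cutree` applied to `as.hclust(agnes)`; the helper name `n_grouped_obs` is hypothetical and not part of any package.

```r
library(cluster)

# Example data from the question
x <- c(6.2, 2.3, 0, 1.54, 2.17, 6.11, 0.3, 1.39, 5.14, 12.52, 12.57, 7.13,
       13.71, 11.42, 8.13, 8.86, 9.97, 10, 8.23, 12.4, 9.51, 20.56, 17.78,
       14.91, 19.17, 17.48, 17.44, 21.32, 21.24)
y <- c(7.89, 7.63, 5.29, 8.38, 8.37, 10.5, 21.5, 16.65, 23.76, 1.77, 1.8,
       10.49, 14.01, 10.36, 10.85, 15.02, 14.91, 14.94, 10.76, 18.58, 23.12,
       0, 13.59, 9.68, 17.32, 17.85, 17.79, 4.13, 4.05)
df <- data.frame(x, y)
ag <- agnes(df, metric = "euclidean", stand = FALSE, method = "average")

# Hypothetical helper: number of observations that sit in clusters with
# more than one member when the tree is cut at height h
n_grouped_obs <- function(tree, h) {
  cl <- cutree(as.hclust(tree), h = h)  # cluster label for every observation
  sizes <- table(cl)                     # observations per cluster label
  sum(sizes[sizes > 1])                  # drop singletons, add up the rest
}

heights <- c(1, 2)
counts <- sapply(heights, function(h) n_grouped_obs(ag, h))
counts
```

Because each singleton holds exactly one observation, the same number is `nrow(df) - sum(sizes == 1)`; the `table()` form is kept here because it mirrors the answer.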

Update: with the cutoff at h = 2

R> cl <- cutree(as.hclust(agnes), h=2) 
R> cl 
[1] 1 2 3 2 2 4 5 6 7 8 8 4 9 10 4 11 11 11 4 12 13 14 15 16 17 
[26] 17 17 18 18 
R> res <- table(cl) 
R> res[res > 1] 
cl 
2 4 8 11 17 18 
3 4 2 3 3 2 
R> sum(res[res > 1]) 
[1] 17 
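Since the question's scatter plot labels points by row number, it can also help to list which rows fall into multi-member clusters, not only how many. A sketch, assuming the question's example data and the answer's `cutree(as.hclust(...), h = 2)` call; `shared_labels`, `in_group`, and `grouped_rows` are illustrative names.

```r
library(cluster)

# Example data from the question
x <- c(6.2, 2.3, 0, 1.54, 2.17, 6.11, 0.3, 1.39, 5.14, 12.52, 12.57, 7.13,
       13.71, 11.42, 8.13, 8.86, 9.97, 10, 8.23, 12.4, 9.51, 20.56, 17.78,
       14.91, 19.17, 17.48, 17.44, 21.32, 21.24)
y <- c(7.89, 7.63, 5.29, 8.38, 8.37, 10.5, 21.5, 16.65, 23.76, 1.77, 1.8,
       10.49, 14.01, 10.36, 10.85, 15.02, 14.91, 14.94, 10.76, 18.58, 23.12,
       0, 13.59, 9.68, 17.32, 17.85, 17.79, 4.13, 4.05)
df <- data.frame(x, y)
ag <- agnes(df, metric = "euclidean", stand = FALSE, method = "average")

# Cluster label for each observation, in the original row order
cl <- cutree(as.hclust(ag), h = 2)

# A label seen more than once belongs to a multi-member cluster;
# flag every observation that carries such a label
shared_labels <- unique(cl[duplicated(cl)])
in_group <- cl %in% shared_labels

grouped_rows <- which(in_group)  # row numbers, as labeled in the scatter plot
grouped_rows
length(grouped_rows)
```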

I used 'stats::cutree'. The package 'dynamicTreeCut' is not installed on my system... – rcs