How do I get centroids from SciPy's hierarchical agglomerative clustering?

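I am using SciPy's hierarchical agglomerative clustering to cluster an m x n matrix of features, but once the clustering is done I can't figure out how to get the centroids of the resulting clusters. Here is my code: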

from scipy.cluster import hierarchy
from scipy.spatial import distance

Y = distance.pdist(features)
Z = hierarchy.linkage(Y, method="average", metric="euclidean")
T = hierarchy.fcluster(Z, 100, criterion="maxclust")

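I take my feature matrix, compute the pairwise Euclidean distances between its rows, and pass them to the hierarchical clustering method. From there I create flat clusters, with a maximum of 100 clusters.

Now, based on the flat clusters T, how do I get the 1 x n centroid that represents each flat cluster?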

2012-02-20 Adrian Rosebrock

So what happened in the end? Did you solve this? How? – 2013-09-24 05:05:28

I actually ended up using scikit-learn. – 2013-09-27 12:42:33

Which function in scikit-learn, please? – 2013-09-28 02:21:39

You can do something like this (D = number of dimensions):

import numpy as np

D = features.shape[1]  # number of dimensions
# Sum the vectors in each cluster
lens = {}       # number of observations in each cluster
centroids = {}  # centroid of each cluster
for idx, clno in enumerate(T):
    centroids.setdefault(clno, np.zeros(D))
    centroids[clno] += features[idx, :]
    lens.setdefault(clno, 0)
    lens[clno] += 1
# Divide by the number of observations in each cluster to get the centroid
for clno in centroids:
    centroids[clno] /= float(lens[clno])

This gives you a dictionary with the cluster number as the key and that cluster's centroid as the value.
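The same sum-then-divide idea can also be written without a Python loop. The following is only a sketch: the function name cluster_centroids is made up for illustration, and it assumes labels is a 1-D array of flat cluster labels such as the T returned by fcluster.

```python
import numpy as np

def cluster_centroids(features, labels):
    """Return (unique_labels, centroids); row j is the centroid of unique_labels[j]."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels).ravel()
    # inverse[i] is the row of `sums` that observation i contributes to
    unique_labels, inverse = np.unique(labels, return_inverse=True)
    sums = np.zeros((unique_labels.size, features.shape[1]))
    np.add.at(sums, inverse, features)  # accumulate each observation into its cluster row
    counts = np.bincount(inverse, minlength=unique_labels.size)
    return unique_labels, sums / counts[:, None]
```

Calling cluster_centroids(features, T) returns the sorted cluster labels together with a k x n array of centroids, so the dictionary form above can be recovered with dict(zip(unique_labels, centroids)) if needed.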

2012-06-30 12:55:14 dkar

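One possible solution is a function that returns the centroids as a codebook, the way kmeans in scipy.cluster.vq does. All you need is the partition vector of flat cluster assignments, part, and the original observations X.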

import numpy as np

def to_codebook(X, part):
    """
    Calculates centroids according to flat cluster assignment

    Parameters
    ----------
    X : array, (n, d)
        The n original observations with d features

    part : array, (n)
        Partition vector. part[i] = c is the cluster assigned to observation i

    Returns
    -------
    codebook : array, (k, d)
        Returns a k x d codebook with k centroids,
        ordered by ascending cluster label
    """
    codebook = []

    # Iterate over the labels that actually occur, so an unused label
    # cannot produce a NaN row from the mean of an empty selection
    for i in np.unique(part):
        codebook.append(X[part == i].mean(0))

    return np.vstack(codebook)
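To see how such a codebook fits into the pipeline from the question, here is a small end-to-end sketch. The synthetic two-blob data, the random seed, and the choice of 2 clusters are assumptions made purely for illustration, and the codebook is computed inline so the snippet runs on its own.

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.cluster.vq import vq
from scipy.spatial import distance

rng = np.random.default_rng(0)
# Two well-separated synthetic blobs with 3 features each (illustrative data only)
features = np.vstack([rng.normal(0.0, 0.1, size=(20, 3)),
                      rng.normal(5.0, 0.1, size=(20, 3))])

Y = distance.pdist(features)                        # condensed Euclidean distances
Z = hierarchy.linkage(Y, method="average")          # average-linkage hierarchy
T = hierarchy.fcluster(Z, 2, criterion="maxclust")  # flat labels, starting at 1

# One centroid row per label, in ascending label order
codebook = np.vstack([features[T == label].mean(axis=0) for label in np.unique(T)])

# vq assigns each observation to its nearest codebook row (0-based row indices)
codes, dists = vq(features, codebook)
print(codebook.shape)
```

Because fcluster labels start at 1 while vq returns 0-based row indices, codes + 1 lines up with T whenever every observation is closest to its own cluster's centroid. That holds for well-separated data like this, but it is not guaranteed for hierarchical clusters in general.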

2013-11-11 17:46:58 embert
