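scipy linkage format

I have written my own clustering routine and would like to produce a dendrogram. The easiest way to do this would be to use the scipy dendrogram function. However, this requires the input to be in the same format that the scipy linkage function produces. I cannot find an example of how that output is formatted. I was wondering whether someone out there can enlighten me.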


2012-03-23 geo_pythoncl

Check this: http://users.soe.ucsc.edu/~eads/iris.html Maybe it can help you! – 2012-03-23 12:15:51

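This is from the scipy.cluster.hierarchy.linkage() function documentation, and I think it is a pretty clear description of the output format:

A 4 by (n-1) matrix Z is returned. At the i-th iteration, clusters with indices Z[i, 0] and Z[i, 1] are combined to form cluster n + i. A cluster with an index less than n corresponds to one of the n original observations. The distance between clusters Z[i, 0] and Z[i, 1] is given by Z[i, 2]. The fourth value Z[i, 3] represents the number of original observations in the newly formed cluster.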
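For the asker's case (a custom clustering routine feeding scipy's dendrogram), the description above can be turned into a small converter. This is only a sketch under stated assumptions: `merges_to_linkage` and the `merges` list are hypothetical names, and the routine is assumed to record each merge as (left id, right id, distance), already using the "i-th merge creates cluster n + i" numbering. The result has one row per merge and four columns.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, is_valid_linkage


def merges_to_linkage(n, merges):
    """Build a linkage matrix (one row per merge) from a custom merge history.

    `merges` is a hypothetical list of (left_id, right_id, distance) tuples in
    the order the merges happened. Ids 0..n-1 are the original observations;
    the i-th merge creates cluster id n + i.
    """
    sizes = {i: 1 for i in range(n)}  # observations contained in each cluster id
    Z = np.zeros((n - 1, 4))          # float64, which scipy expects
    for i, (left, right, dist) in enumerate(merges):
        sizes[n + i] = sizes[left] + sizes[right]
        Z[i] = [left, right, dist, sizes[n + i]]
    return Z


# Four observations: 0 and 1 merge first, then 2 and 3, then the two pairs.
merges = [(0, 1, 1.0), (2, 3, 2.0), (4, 5, 3.0)]
Z = merges_to_linkage(4, merges)

assert is_valid_linkage(Z)            # scipy's own sanity check on the format
layout = dendrogram(Z, no_plot=True)  # drop no_plot=True to draw with matplotlib
print(Z)
print(layout["leaves"])
```

If `is_valid_linkage` rejects the matrix, the usual suspects are an id that is used before the merge that creates it, or an id that is merged twice.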

Do you need something more?


2012-06-08 21:33:20 dkar

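Small correction: it is an (n-1) by 4 matrix. – HerrKaputt 2015-02-26 15:15:42

Yes, more information would be very helpful. For example, if I were to enumerate all of the indices, what kind of traversal is it? How exactly are the nodes labeled? A clear and precise example would be nice, one that walks step by step through how the format is determined, together with the tree and all of the labels that correspond to each node. – mortonjt 2016-10-03 19:26:58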

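I agree with https://stackoverflow.com/users/1167475/mortonjt that the documentation does not fully explain the indexing of intermediate clusters, while I do agree with https://stackoverflow.com/users/1354844/dkar that the format is otherwise explained precisely.

Using the example data from this question: Tutorial for scipy.cluster.hierarchy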

import numpy as np
import scipy.cluster.hierarchy as hac
from numpy.linalg import norm

a = np.array([[0.1, 2.5],
              [1.5, 0.4],
              [0.3, 1.0],
              [1.0, 0.8],
              [0.5, 0.0],
              [0.0, 0.5],
              [0.5, 0.5],
              [2.7, 2.0],
              [2.2, 3.1],
              [3.0, 2.0],
              [3.2, 1.3]])

A linkage matrix can be built using the single method (i.e., the closest matching points):

z = hac.linkage(a, method="single") 

array([[ 7.        ,  9.        ,  0.3       ,  2.        ],
       [ 4.        ,  6.        ,  0.5       ,  2.        ],
       [ 5.        , 12.        ,  0.5       ,  3.        ],
       [ 2.        , 13.        ,  0.53851648,  4.        ],
       [ 3.        , 14.        ,  0.58309519,  5.        ],
       [ 1.        , 15.        ,  0.64031242,  6.        ],
       [10.        , 11.        ,  0.72801099,  3.        ],
       [ 8.        , 17.        ,  1.2083046 ,  4.        ],
       [ 0.        , 16.        ,  1.5132746 ,  7.        ],
       [18.        , 19.        ,  1.92353841, 11.        ]])

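As the documentation explains, the clusters with an index below n (here: 11) are simply the data points in the original matrix a. The intermediate clusters are indexed successively from n onward: row i of the linkage matrix creates cluster n + i.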
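To make the numbering concrete, here is a minimal sketch that lists, for every row of the linkage matrix, which two ids are merged and which new id results. `describe_merges` and `label` are helper names made up for this illustration; they are not part of scipy.

```python
import numpy as np
import scipy.cluster.hierarchy as hac

a = np.array([[0.1, 2.5], [1.5, 0.4], [0.3, 1.0], [1.0, 0.8],
              [0.5, 0.0], [0.0, 0.5], [0.5, 0.5], [2.7, 2.0],
              [2.2, 3.1], [3.0, 2.0], [3.2, 1.3]])
z = hac.linkage(a, method="single")
n = len(a)  # number of original observations


def describe_merges(z):
    """Return (left_id, right_id, new_id) for every row of a linkage matrix."""
    n_obs = z.shape[0] + 1
    return [(int(left), int(right), n_obs + i)
            for i, (left, right, _dist, _count) in enumerate(z)]


def label(k):
    """Ids below n are original points, everything else is a merged cluster."""
    return f"point {k}" if k < n else f"cluster {k}"


for left, right, new_id in describe_merges(z):
    print(f"{label(left)} + {label(right)} -> cluster {new_id}")
```

Rows with exactly tied distances (such as the two 0.5 merges here) may come out in a different order depending on the scipy version, so the listing is best read as "row i creates id n + i" and not as a fixed sequence.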

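Thus, clusters 7 and 9 (the first merge) are merged into cluster 11, and clusters 4 and 6 into cluster 12. Now look at the third row, which merges cluster 5 (a point from a) with cluster 12 (the intermediate cluster created in the second row), resulting in a within-cluster distance (WCD) of 0.5. The single method entails that the new WCD is 0.5, which should be the distance between a[5] and the closest point in cluster 12, i.e. the closer of a[4] and a[6]. Let's check: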

In [198]: norm([a[5]-a[4]]) 
Out[198]: 0.70710678118654757 
In [199]: norm([a[5]-a[6]]) 
Out[199]: 0.5

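This cluster should now be intermediate cluster 13, which is subsequently merged with a[2]. Thus, the new distance should be the smallest of the distances between a[2] and each of a[4], a[5], and a[6].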

In [200]: norm([a[2]-a[4]]) 
Out[200]: 1.019803902718557 
In [201]: norm([a[2]-a[5]]) 
Out[201]: 0.58309518948452999 
In [202]: norm([a[2]-a[6]]) 
Out[202]: 0.53851648071345048

As can be seen, this also checks out, and it explains the intermediate format of the new clusters.
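The manual checks above can be repeated for every row at once. The following is a sketch, not part of the original answer: it tracks which original points belong to each cluster id (the `members` dictionary is a name chosen for this example) and confirms that, for the single method, the recorded distance equals the smallest point-to-point distance between the two clusters being merged.

```python
import numpy as np
from numpy.linalg import norm
import scipy.cluster.hierarchy as hac

a = np.array([[0.1, 2.5], [1.5, 0.4], [0.3, 1.0], [1.0, 0.8],
              [0.5, 0.0], [0.0, 0.5], [0.5, 0.5], [2.7, 2.0],
              [2.2, 3.1], [3.0, 2.0], [3.2, 1.3]])
z = hac.linkage(a, method="single")
n = len(a)

# Cluster id -> indices of the original points it contains.
members = {i: [i] for i in range(n)}

for i, (left, right, dist, count) in enumerate(z):
    left_pts = members[int(left)]
    right_pts = members[int(right)]
    members[n + i] = left_pts + right_pts  # row i creates cluster n + i

    # Single linkage: distance between clusters is the closest pair of points.
    closest = min(norm(a[p] - a[q]) for p in left_pts for q in right_pts)
    assert np.isclose(closest, dist)
    assert len(members[n + i]) == count    # fourth column is the cluster size

    print(f"cluster {n + i}: points {sorted(members[n + i])}, distance {dist:.4f}")
```

For other methods such as complete or average, the `min` would have to be replaced by the corresponding cluster-distance rule; the id bookkeeping stays the same.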


2016-12-05 21:23:58 user1603472

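The scipy documentation is accurate, as dkar pointed out... but it is a bit hard to turn the returned data into something that is usable for further analysis.

In my opinion they should include the ability to return the data in a tree-like data structure. The code below will iterate through the matrix and build a tree: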

from scipy.cluster.hierarchy import linkage
import numpy as np

a = np.random.multivariate_normal([10, 0], [[3, 1], [1, 4]], size=[100,])
b = np.random.multivariate_normal([0, 20], [[3, 1], [1, 4]], size=[50,])
centers = np.concatenate((a, b),)

def create_tree(centers):
    clusters = {}
    to_merge = linkage(centers, method='single')
    n = len(to_merge) + 1  # number of original observations
    for i, merge in enumerate(to_merge):
        if merge[0] < n:
            # ids below n are original points (0-indexed): read from the centers array
            a = centers[int(merge[0])]
        else:
            # otherwise read the cluster that has been created
            a = clusters[int(merge[0])]

        if merge[1] < n:
            b = centers[int(merge[1])]
        else:
            b = clusters[int(merge[1])]
        # row i of the linkage matrix creates cluster n + i
        clusters[n + i] = {
            'children': [a, b]
        }
        # ^ you could optionally store other info here (e.g. distances)
    return clusters

print(create_tree(centers))
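For comparison, scipy.cluster.hierarchy also provides to_tree, which converts a linkage matrix into ClusterNode objects, so a hand-written traversal is not strictly required. The sketch below uses random data and a made-up helper `to_dict` purely for illustration; it turns the node objects into nested dictionaries similar to the ones built above.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree

rng = np.random.default_rng(0)
points = rng.normal(size=(20, 2))  # arbitrary example data
Z = linkage(points, method="single")

# rd=True additionally returns a list of all nodes, indexed by cluster id.
root, nodes = to_tree(Z, rd=True)


def to_dict(node):
    """Recursively convert a ClusterNode into a nested dictionary."""
    if node.is_leaf():
        return {"id": node.id}
    return {
        "id": node.id,
        "dist": node.dist,  # merge distance, third column of Z
        "children": [to_dict(node.left), to_dict(node.right)],
    }


tree = to_dict(root)
print(root.id, root.count)  # the root is the last cluster formed
print(root.pre_order())     # ids of the original points under the root
```

The recursion is fine for small inputs; single linkage can produce very deep trees, so for large data sets an explicit stack would be the safer choice.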


2017-01-20 06:25:03
