Best way to test clustering algorithms

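import numpy as np
import matplotlib.pyplot as plt

# Make a block-diagonal matrix: c dense blocks, each N x N
N = 30
c = 5
A = np.zeros((N*c, N*c))
for m in range(c):  # range, not xrange: xrange does not exist in Python 3
    A[m*N:(m+1)*N, m*N:(m+1)*N] = np.random.random((N, N))

# Add some weak noise to every entry
A += np.random.random(A.shape) * 0.1

# Make symmetric: A + A^T, subtracting the diagonal once so it is not doubled
A += A.T - np.diag(A.diagonal())

# Show the original matrix
plt.subplot(131)
plt.imshow(A.copy(), interpolation='nearest')

# Permute rows and columns with the same permutation to hide the blocks
idx = np.random.permutation(N*c)
A = A[idx, :][:, idx]

# Compute eigenvalues (eigvalsh assumes a symmetric matrix; a simultaneous
# row/column permutation does not change the eigenvalues)
L = np.linalg.eigvalsh(A)

# Show the results: shuffled matrix and its spectrum in descending order
plt.subplot(132)
plt.imshow(A, interpolation='nearest')
plt.subplot(133)
plt.plot(sorted(L, reverse=True))

# Red dashed line between the c-th and (c+1)-th eigenvalue
plt.plot([c-.5, c-.5], [0, max(L)], 'r--')

plt.ylim(0, max(L))
plt.xlim(0, 20)
plt.show()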
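The script above is meant to be read by eye: with c well-separated blocks, the sorted spectrum is expected to show c large eigenvalues followed by a drop at the red dashed line. A minimal sketch of automating that reading, assuming the common eigengap heuristic (the answer itself does not use it): take the position of the largest gap between consecutive sorted eigenvalues as the estimated number of clusters. `make_block_matrix` and `estimate_num_clusters` are hypothetical names, and `max_k=20` simply mirrors the plot's `xlim(0, 20)`.

```python
import numpy as np

# Hypothetical helpers; the eigengap heuristic is an assumption, not the answer's method.

def make_block_matrix(N, c, noise=0.1, rng=None):
    """Symmetric matrix with c dense N x N diagonal blocks, weak noise
    everywhere, and rows/columns shuffled (same recipe as the script above)."""
    rng = np.random.default_rng() if rng is None else rng
    A = np.zeros((N * c, N * c))
    for m in range(c):
        A[m*N:(m+1)*N, m*N:(m+1)*N] = rng.random((N, N))
    A += rng.random(A.shape) * noise
    A += A.T - np.diag(A.diagonal())
    idx = rng.permutation(N * c)
    return A[np.ix_(idx, idx)]

def estimate_num_clusters(A, max_k=20):
    """Eigengap heuristic: sort eigenvalues in descending order, keep the top
    max_k + 1, and return the position of the largest drop between neighbours."""
    L = np.sort(np.linalg.eigvalsh(A))[::-1][:max_k + 1]
    gaps = L[:-1] - L[1:]
    return int(np.argmax(gaps)) + 1

rng = np.random.default_rng(0)
A = make_block_matrix(N=30, c=5, rng=rng)
print(estimate_num_clusters(A))
```

On clean block structure like this the heuristic should recover c; on real affinity matrices the gap is often much less pronounced, which is exactly what the plot lets you inspect.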

Source: Hooked, 2012-04-23 13:56:10

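It is sometimes useful to construct input data that has a known, and perhaps obvious, answer by construction. For clustering algorithms, you could build data with N clusters such that the maximum distance between any two points in the same cluster is smaller than the minimum distance between any two points in different clusters. Another option is to generate many different datasets as 2-D scatter plots with obvious clusters, compare the algorithm's results against that known structure, and perhaps move the clusters closer together to see at what point the algorithm stops telling them apart.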
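A minimal sketch of the first suggestion, assuming Euclidean distance and points drawn uniformly in discs of radius r. If every pair of disc centres is more than 4r apart, any two points in the same cluster are less than 2r apart while any two points in different clusters are more than 2r apart, so the correct answer is known by construction (here the centres are 5r apart). All names are hypothetical, and `threshold_linkage` (single linkage cut at a fixed distance) is only a stand-in for whatever algorithm you are actually testing.

```python
import numpy as np

# Hypothetical helpers illustrating clusters that are separated by construction.
# Assumes Euclidean distance in 2-D; threshold_linkage stands in for the real algorithm.

def make_separated_clusters(n_clusters, n_per_cluster, r=1.0, rng=None):
    """Uniform points in discs of radius r; centres on a line, 5r apart.
    Intra-cluster distances are < 2r and inter-cluster distances are > 3r."""
    rng = np.random.default_rng() if rng is None else rng
    centres = np.column_stack([5 * r * np.arange(n_clusters), np.zeros(n_clusters)])
    labels = np.repeat(np.arange(n_clusters), n_per_cluster)
    theta = rng.uniform(0, 2 * np.pi, labels.size)
    rad = r * np.sqrt(rng.uniform(0, 1, labels.size))  # sqrt gives uniform density in the disc
    X = centres[labels] + np.column_stack([rad * np.cos(theta), rad * np.sin(theta)])
    return X, labels

def pairwise_distances(X):
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def check_separation(X, labels):
    """True if max intra-cluster distance < min inter-cluster distance."""
    D = pairwise_distances(X)
    same = labels[:, None] == labels[None, :]
    return D[same].max() < D[~same].min()

def same_partition(a, b):
    """True if two labelings group the points identically (label names ignored)."""
    a, b = np.asarray(a), np.asarray(b)
    return np.array_equal(a[:, None] == a[None, :], b[:, None] == b[None, :])

def threshold_linkage(X, t):
    """Stand-in algorithm: connected components of the graph that links
    points closer than t (i.e. single linkage cut at height t)."""
    D = pairwise_distances(X)
    labels = np.full(len(X), -1)
    current = 0
    for start in range(len(X)):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first walk over unlabelled neighbours
            i = stack.pop()
            for j in np.flatnonzero((D[i] < t) & (labels < 0)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels

rng = np.random.default_rng(1)
X, truth = make_separated_clusters(4, 25, rng=rng)
print(check_separation(X, truth))
pred = threshold_linkage(X, t=2.5)  # swap in the clustering algorithm under test
print(same_partition(pred, truth))
```

Any threshold strictly between 2r and 3r recovers the clusters exactly, so the stand-in always passes; a real algorithm that fails `same_partition` on data like this is worth investigating.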

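With knowledge of your particular clustering algorithm you may be able to do better, but the above at least has some chance of flushing out obvious bugs.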
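The second suggestion, moving the clusters together until the algorithm no longer sees them, can be sketched as a sweep over the spacing between cluster centres. This is a hypothetical harness: plain Lloyd's k-means with farthest-point seeding stands in for the algorithm under test, agreement with the construction is scored with the Rand index (the fraction of point pairs on which two labelings agree about "same cluster" vs "different cluster"), and the spacing values are arbitrary.

```python
import numpy as np

# Hypothetical harness; kmeans() is only a stand-in for the algorithm under test.

def make_clusters(n_clusters, n_per_cluster, spacing, rng):
    """Uniform points in unit discs whose centres lie on a line, `spacing` apart."""
    centres = np.column_stack([spacing * np.arange(n_clusters), np.zeros(n_clusters)])
    labels = np.repeat(np.arange(n_clusters), n_per_cluster)
    theta = rng.uniform(0, 2 * np.pi, labels.size)
    rad = np.sqrt(rng.uniform(0, 1, labels.size))
    X = centres[labels] + np.column_stack([rad * np.cos(theta), rad * np.sin(theta)])
    return X, labels

def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point seeding."""
    centres = [X[0]]
    for _ in range(1, k):
        d = np.min(np.linalg.norm(X[:, None] - np.array(centres)[None], axis=-1), axis=1)
        centres.append(X[np.argmax(d)])
    centres = np.array(centres)
    for _ in range(n_iter):
        labels = np.argmin(np.linalg.norm(X[:, None] - centres[None], axis=-1), axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels

def rand_index(a, b):
    """Fraction of distinct point pairs on which two labelings agree."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)
    return np.mean(same_a[iu] == same_b[iu])

rng = np.random.default_rng(0)
results = {}
for spacing in [6.0, 4.0, 3.0, 2.0, 1.0, 0.5]:  # move the clusters closer each step
    X, truth = make_clusters(3, 50, spacing, rng)
    results[spacing] = rand_index(kmeans(X, 3), truth)
    print(f"spacing={spacing:4.1f}  Rand index={results[spacing]:.3f}")
```

With unit-radius discs, any spacing above 4 keeps the clusters separated in the strong sense described in the answer, so a score below 1.0 there points at a bug; below that, the printed scores show where the algorithm starts to break down.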

Source: mcdowella, 2012-04-23 17:27:30
