2011-12-02

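Is the result of K-means and sequential K-means the same? If we apply K-means and sequential K-means to the same dataset with the same initial settings, will we get the same result? Explain your reasoning.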

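Personally, I think the answer is no. The result of sequential K-means depends on the order in which the data points are presented, and the termination conditions of the two algorithms are not the same either.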

The pseudocode for the two clustering algorithms is attached below.

K-means

Make initial guesses for the means m1, m2, ..., mk
Until there is no change in any mean
    Assign each data point to the cluster whose mean is the nearest.
    Calculate the mean of each cluster.
    For i from 1 to k
        Replace mi with the mean of all examples for cluster i.
    end_for
end_until
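
For readers who want to run it, the batch loop above can be sketched in Python roughly as follows. This is only an illustration: the function name `k_means`, the `max_iter` safety cap, and the choice to keep the old mean for an empty cluster are assumptions that the pseudocode does not specify.

```python
import math

def k_means(points, means, max_iter=100):
    """Batch K-means: reassign every point, then recompute every mean."""
    means = [tuple(m) for m in means]
    for _ in range(max_iter):  # max_iter is an assumed safety cap
        # Assignment step: group each point under its nearest mean.
        clusters = [[] for _ in means]
        for p in points:
            nearest = min(range(len(means)), key=lambda i: math.dist(p, means[i]))
            clusters[nearest].append(p)
        # Update step: each mean becomes the centroid of its cluster.
        # Assumption: an empty cluster keeps its previous mean.
        new_means = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, means)
        ]
        if new_means == means:  # no mean changed, so we are done
            break
        means = new_means
    return means

# Two well-separated groups; the means settle on the group centroids.
print(k_means([(0, 0), (0, 2), (10, 0), (10, 2)], [(1, 1), (9, 1)]))
# [(0.0, 1.0), (10.0, 1.0)]
```

Note that the whole dataset is visited on every pass, so the order of the points does not affect the assignment step.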

Sequential K-means

Make initial guesses for the means m1, m2, ..., mk
Set the counts n1, n2, ..., nk to zero
Until interrupted
    Acquire the next example, x
    If mi is closest to x
        Increment ni
        Replace mi by mi + (1/ni)*(x - mi)
    end_if
end_until
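
A minimal Python sketch of the sequential variant is given here to make the order dependence concrete. The function name `sequential_k_means` is hypothetical, and "until interrupted" is modelled as "until the input stream ends", which is an assumption. The two 1-D streams are made-up illustration data.

```python
import math

def sequential_k_means(stream, means):
    """Sequential (online) K-means: update one mean per incoming example."""
    means = [list(m) for m in means]
    counts = [0] * len(means)
    for x in stream:  # assumption: stop when the stream is exhausted
        # Find the mean closest to the new example.
        i = min(range(len(means)), key=lambda j: math.dist(x, means[j]))
        counts[i] += 1
        # Running-average update: m_i <- m_i + (1/n_i) * (x - m_i)
        means[i] = [m + (xc - m) / counts[i] for m, xc in zip(means[i], x)]
    return [tuple(m) for m in means]

start = [(0,), (10,)]
# Same two points, presented in opposite orders.
print(sequential_k_means([(4,), (6,)], start))  # [(5.0,), (10,)]
print(sequential_k_means([(6,), (4,)], start))  # [(0,), (5.0,)]
```

In the first order both points are absorbed by the first mean; in the second order both go to the second mean, so the final means differ even though the data and the initial guesses are identical.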

Answer


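Correct, the results can differ. Take the points x1 = (0,0), x2 = (1,1), x3 = (0.75,0), x4 = (0.25,1) and the initial means m1 = (0,0.5), m2 = (1,0.5).

K-means assigns x1 and x4 to the m1 cluster and x2 and x3 to the m2 cluster. The new means are m1' = (0.125,0.5) and m2' = (0.875,0.5), and no reassignment takes place after that.

With sequential K-means, after x1 is assigned, m1 moves to (0,0), and x2 then moves m2 to (1,1). Next, m1 is the mean nearest to x3, so m1 moves to (0.375,0). Finally, m2 is nearest to x4, so m2 moves to (0.625,1). This is again a stable configuration, but a different one from the K-means result.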
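
The counterexample can be checked numerically with a short script such as the one below. It is a sketch, not the answerer's code: the helper names `nearest`, `batch`, and `sequential` are hypothetical, and the coordinates x1 = (0,0) and x2 = (1,1) are assumed, inferred from the statement that the first two updates move m1 to (0,0) and m2 to (1,1).

```python
import math

def nearest(x, means):
    """Index of the mean closest to x (ties go to the lower index)."""
    return min(range(len(means)), key=lambda i: math.dist(x, means[i]))

def batch(points, means, max_iter=100):
    """Batch K-means: iterate assignment and update until the means stop changing."""
    means = list(means)
    for _ in range(max_iter):
        clusters = [[] for _ in means]
        for p in points:
            clusters[nearest(p, means)].append(p)
        # Assumption: an empty cluster keeps its previous mean.
        new = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else m
               for cl, m in zip(clusters, means)]
        if new == means:
            break
        means = new
    return means

def sequential(points, means):
    """Sequential K-means: one running-average update per point, in order."""
    means, counts = list(means), [0] * len(means)
    for x in points:
        i = nearest(x, means)
        counts[i] += 1
        means[i] = tuple(m + (c - m) / counts[i] for m, c in zip(means[i], x))
    return means

points = [(0, 0), (1, 1), (0.75, 0), (0.25, 1)]  # x1..x4 (x1, x2 assumed)
start = [(0, 0.5), (1, 0.5)]                      # m1, m2

print(batch(points, start))       # [(0.125, 0.5), (0.875, 0.5)]
print(sequential(points, start))  # [(0.375, 0.0), (0.625, 1.0)]
```

Both runs start from the same means and see the same four points, yet they end in different configurations.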

Proof by counterexample, so case closed. +1

Understood, thank you.