2014-02-24 49 views
1

我剛剛編寫了DBSCAN算法,我想知道DBSCAN算法是否可以允許羣集中的點數小於使用的minPts參數的羣集。DBSCAN算法可以創建一個小於minPts的集羣嗎?

我一直在使用http://people.cs.nctu.edu.tw/~rsliang/dbscan/testdatagen.html來驗證我的實現,它似乎工作正常,只是遇到了這個問題。

我正在對樣本數據集進行一些模擬,並且我一直在使用3的minPts。DBSCAN算法通常會從數據集中創建2點的集羣(從未1)。這是由設計還是我搞砸了實施?

某些樣本數據,EPS = 0.1,minPts = 3

0.307951851891331 0.831249445598223 
0.0223402371734102 0.352948855307395 
0.780763753587736 0.691021379870838 
0.950537940464233 0.849805725668467 
0.66559538881555 0.603627873865714 
0.983049284658883 0.320016804300256 
0.710854941844407 0.646746252033276 
0.404260418566065 0.610378857986247 
0.740377815785062 0.899680181825385 
0.430522446721104 0.597713506593236 
0.0365937198682659 0.109160974206944 
0.378702778545536 0.115744969861463 
0.765229786171219 0.568206346858389 
0.760991609078362 0.59582572271853 
0.970256112036414 0.480310371834929 
0.110018607280226 0.541528500403058 
0.679553015939683 0.951676915377228 
0.730563320094051 0.806108465793593 
0.30542559935964 0.500680956757013 
0.740971321585109 0.670210885196091 
0.877572476806851 0.221948942738561 
0.882196086404005 0.674841667374057 
0.808923079077584 0.740714808339586 
0.935197343553974 0.438659039064617 
0.283511740287539 0.271373094185895 
0.0740317893559261 0.602333299630477 
0.30702819223843 0.0683579570932118 
0.31839294653311 0.198790877684388 
0.452546667052687 0.906595267311947 
0.587719069136176 0.212557406729347 
0.930029770792476 0.354712217745703 
0.879549613632052 0.185285016980621 
0.493609266585488 0.441520784255825 
0.640463788360573 0.759178026467179 
0.916182931939225 0.598151952772472 

輸出集羣:

(Cluster: 1 { 0.780763753587736,0.691021379870838 }, { 0.66559538881555,0.603627873865714 }, { 0.710854941844407,0.646746252033276 }, { 0.765229786171219,0.568206346858389 }, { 0.760991609078362,0.59582572271853 }, { 0.740971321585109,0.670210885196091 }, { 0.882196086404005,0.674841667374057 }, { 0.808923079077584,0.740714808339586 }, { 0.916182931939225,0.598151952772472 }) 
(Cluster: 2 { 0.983049284658883,0.320016804300256 }, { 0.970256112036414,0.480310371834929 }, { 0.935197343553974,0.438659039064617 }, { 0.930029770792476,0.354712217745703 }) 
(Cluster: 3 { 0.404260418566065,0.610378857986247 }, { 0.430522446721104,0.597713506593236 }) 
(Cluster: 4 { 0.740377815785062,0.899680181825385 }, { 0.679553015939683,0.951676915377228 }, { 0.730563320094051,0.806108465793593 }) 
(Cluster: 5 { 0.378702778545536,0.115744969861463 }, { 0.30702819223843,0.0683579570932118 }) 
(Cluster: 6 { 0.110018607280226,0.541528500403058 }, { 0.0740317893559261,0.602333299630477 }) 
(Cluster: 7 { 0.877572476806851,0.221948942738561 }, { 0.879549613632052,0.185285016980621 }) 
(Cluster: 8 { 0.283511740287539,0.271373094185895 }, { 0.31839294653311,0.198790877684388 }) 

回答

4

是。

DBSCAN中的集羣只能保證由組成至少1個核心點

由於屬於多於一個羣集的邊界點將被分配給其中一個羣集的「隨機」(通常是先來先出),核心點可能無法保留/獲取其所有鄰居。因此,它可能小於minPts。

一維例如:minPts = 4,ε= 1:

0.0 0.5 1.0  2.0  3.0 3.5 4.0 
|-------X-------|     Core point at 1.0, Cluster 1. 
       |-------X-------| Core point at 3.0, Cluster 2. 

所有其他點不是核點。由於點2.0是而不是的核心點,因此它不連接兩個羣集。其中一個簇將有4個成員,另一個簇只有3個。

最接近參考實現可能是ELKI中的DBSCAN實現。也許你應該檢查你的結果是否符合ELKI的結果。因爲在你的數據集中,我相信你也有一些編程錯誤。

相關問題