Gaussian mixture models with scikit-learn's sklearn.mixture

I want to use sklearn.mixture.GMM to fit a mixture of Gaussians to some data, with results similar to what I get from R's "Mclust" package.

The data looks like this: [image: plot of the data]

So here is how I cluster the data in R. It gives me 14 nicely separated clusters, as easy as falling down stairs:

data <- read.table('~/gmtest/foo.csv',sep=",") 
library(mclust) 
D = Mclust(data,G=1:20) 
summary(D) 
plot(D, what="classification")

And here is what I do when I try it in Python:

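from sklearn import mixture
import numpy as np
import os
from matplotlib import pyplot  # pyplot is part of matplotlib, not a top-level module

os.chdir(os.path.expanduser("~/gmtest"))
data = np.loadtxt('foo.csv', delimiter=",")
# 14 components with full covariance matrices
# (sklearn.mixture.GMM was deprecated in scikit-learn 0.18 and removed in 0.20)
gmm = mixture.GMM(n_components=14, n_iter=5000, covariance_type='full')
gmm.fit(data)

# Colour each point by its most likely component
classes = gmm.predict(data)
pyplot.scatter(data[:, 0], data[:, 1], c=classes)
pyplot.show()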

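It assigns all points to the same cluster. I also noticed that the AIC of the fit is lowest when I tell it to find just 1 cluster, and that it increases linearly as the number of clusters increases. What am I doing wrong? Are there other parameters I need to consider?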

Is there a difference between the models used by Mclust and sklearn.mixture?

But more importantly: what is the best way to cluster my data with sklearn?


2015-02-10 David DeWert

Does Mclust use full covariance by default? – 2015-02-11 00:12:34
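For context on how Mclust chooses a model: Mclust(data, G=1:20) fits a mixture for every component count from 1 to 20 and, unless modelNames is given, for several covariance parametrizations, then keeps the one with the best BIC. A rough scikit-learn analogue is a loop over n_components using GaussianMixture.bic, sketched below on synthetic stand-in data because foo.csv is not available. Note that mclust reports BIC with the opposite sign (larger is better there), whereas scikit-learn's bic() is lower-is-better. The helper select_by_bic, the range 1..8 and n_init=3 are illustrative assumptions, not part of either library.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical stand-in for foo.csv: four well-separated 2-D Gaussian clusters.
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
X = np.repeat(centers, 100, axis=0) + 0.5 * rng.standard_normal((400, 2))


def select_by_bic(X, max_components=8, covariance_type="full", seed=0):
    """Hypothetical helper: fit one mixture per component count (like Mclust's G=1:20)
    and keep the model with the lowest BIC."""
    bics, best_model, best_bic = [], None, np.inf
    for k in range(1, max_components + 1):
        gm = GaussianMixture(n_components=k, covariance_type=covariance_type,
                             n_init=3, random_state=seed).fit(X)
        bic = gm.bic(X)  # scikit-learn convention: lower BIC is better
        bics.append(bic)
        if bic < best_bic:
            best_model, best_bic = gm, bic
    return best_model, bics


model, bics = select_by_bic(X)
print("chosen n_components:", model.n_components)
print("BIC for k = 1..8:", np.round(bics, 1))
```

Also looping over covariance_type values ('full', 'tied', 'diag', 'spherical') would come closer to Mclust's search over covariance models, although the two families of parametrizations are not identical.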

The trick is to set GMM's min_covar. In this case I get good results with:

mixture.GMM(n_components=14,n_iter=5000, covariance_type='full',min_covar=0.0000001)

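The default min_covar (0.001) is large relative to this data, so with the default every point ends up assigned to a single cluster.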
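In newer scikit-learn releases GMM is replaced by GaussianMixture, where max_iter plays the role of n_iter and reg_covar (default 1e-6) plays the role of min_covar: both are added to the diagonal of each covariance matrix, in the data's own units, so the right value depends on the data's scale. The sketch below, on hypothetical small-scale data, derives the floor from the data's variance instead of hard-coding 1e-7. The helper scaled_reg_covar and its fraction=1e-4 are illustrative assumptions, not a scikit-learn feature.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture


def scaled_reg_covar(X, fraction=1e-4):
    """Hypothetical helper: covariance floor as a small fraction of the
    smallest per-feature variance of X."""
    return fraction * X.var(axis=0).min()


rng = np.random.default_rng(0)
# Hypothetical data: four tight clusters whose overall per-axis variance (~2.5e-7)
# is smaller than GaussianMixture's default reg_covar of 1e-6.
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]) * 1e-3
true_labels = np.repeat(np.arange(4), 100)
X = centers[true_labels] + 5e-5 * rng.standard_normal((400, 2))

reg = scaled_reg_covar(X)
gmm = GaussianMixture(n_components=4, covariance_type="full", max_iter=5000,
                      reg_covar=reg, n_init=5, random_state=0)
classes = gmm.fit(X).predict(X)

# Compare recovered components with the known generating clusters.
ari = adjusted_rand_score(true_labels, classes)
print(f"reg_covar={reg:.2e}, adjusted Rand index={ari:.3f}")
```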


2015-02-10 18:19:35

How is your data scaled? I'm not sure this default is scale-invariant; maybe we should change it... – 2015-02-11 00:19:49

I hadn't thought of scaling the data. If I do 'data = scale(data)' and then 'gmm.fit(data)', it works well with the default min_covar. – 2015-02-11 17:32:30
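The scaling fix from this comment can be sketched with sklearn.preprocessing.scale and GaussianMixture (the class that replaced GMM in scikit-learn 0.18 and later), keeping the default reg_covar. The data is a hypothetical synthetic stand-in with 4 clusters; for the real foo.csv the component count would be 14. Standardizing makes every feature unit-variance, so a fixed absolute covariance floor becomes negligible.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import scale

rng = np.random.default_rng(1)
# Hypothetical stand-in data: four tight clusters on a tiny coordinate scale, where the
# overall per-axis variance (~2.5e-7) is below reg_covar's default of 1e-6.
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]) * 1e-3
true_labels = np.repeat(np.arange(4), 100)
data = centers[true_labels] + 5e-5 * rng.standard_normal((400, 2))

# Standardize each column to zero mean and unit variance before fitting.
data = scale(data)

# With unit-variance features, the default reg_covar (1e-6) is tiny by comparison.
gmm = GaussianMixture(n_components=4, covariance_type="full", n_init=5, random_state=0)
classes = gmm.fit(data).predict(data)

ari = adjusted_rand_score(true_labels, classes)
print("adjusted Rand index:", ari)
```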
