Problem with sklearn.mixture.GMM (Gaussian mixture model)

I'm new to scikit-learn and to GMMs in general. I have a problem with the quality of the fit of a Gaussian mixture model in Python (scikit-learn).

I have an array of data, which you can find at DATA HERE, that I want to fit with a GMM with n = 2 components.

As a benchmark I superimpose a normal fit.

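Errors/weirdness:

Setting n = 1 component, I cannot recover the normal benchmark fit with GMM(1).
Setting n = 2 components, the normal fit is still better than the GMM(2) fit.
GMM(n) always seems to provide the same fit...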

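Here is what I get (the picture shows the fit with GMM(2)). What am I doing wrong here? Thanks in advance for your help.

Code below (to run it, save the data in the same folder):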

from numpy import * 
import pandas as pd 
import matplotlib.pyplot as plt 
from datetime import datetime 
from collections import OrderedDict 
from scipy.stats import norm 
from sklearn.mixture import GMM 

# Upload the data: "epsi" (array of floats) 
file_xlsx = './db_X.xlsx' 
data = pd.read_excel(file_xlsx) 
epsi = data["epsi"].values; 
t_ = len(epsi); 

# Normal fit (for benchmark) 
epsi_grid = arange(min(epsi),max(epsi)+0.001,0.001); 

mu  = mean(epsi); 
sigma2 = var(epsi); 

normal = norm.pdf(epsi_grid, mu, sqrt(sigma2)); 

# TENTATIVE - Gaussian mixture fit 
gmm = GMM(n_components = 2); # fit quality doesn't improve if I set: covariance_type = 'full' 
gmm.fit(reshape(epsi,(t_,1))); 

gauss_mixt = exp(gmm.score(reshape(epsi_grid,(len(epsi_grid),1)))); 

# same result if I apply the definition of pdf of a Gaussian mixture: 
# pdf_mixture = w_1 * N(mu_1, sigma_1) + w_2 * N(mu_2, sigma_2) 
# as suggested in: 
# http://stackoverflow.com/questions/24878729/how-to-construct-and-plot-uni-variate-gaussian-mixture-using-its-parameters-in-p 
# 
#gauss_mixt = array([p * norm.pdf(epsi_grid, mu, sd) for mu, sd, p in zip(gmm.means_.flatten(), sqrt(gmm.covars_.flatten()), gmm.weights_)]); 
#gauss_mixt = sum(gauss_mixt, axis = 0); 


# Create a figure showing the comparison between the estimated distributions 

# setting the figure object 
fig = plt.figure(figsize = (10,8)) 
fig.set_facecolor('white') 
ax = plt.subplot(111) 

# colors 
red = [0.9, 0.3, 0.0]; 
grey = [0.9, 0.9, 0.9]; 
green = [0.2, 0.6, 0.3]; 

# x-axis limits 
q_inf = float(pd.DataFrame(epsi).quantile(0.0025)); 
q_sup = float(pd.DataFrame(epsi).quantile(0.9975)); 
ax.set_xlim([q_inf, q_sup]) 

# empirical pdf of data 
nb  = int(10*log(t_)); 
ax.hist(epsi, bins = nb, normed = True, color = grey, edgecolor = 'k', label = "Empirical"); 

# Normal fit 
ax.plot(epsi_grid, normal, color = green, lw = 1.0, label = "Normal fit"); 

# Gaussian Mixture fit 
ax.plot(epsi_grid, gauss_mixt, color = red, lw = 1.0, label = "GMM(2)"); 

# title 
ax.set_title("Issue: Normal fit out-performs the GMM fit?", size = 14) 

# legend 
ax.legend(loc='upper left'); 

plt.tight_layout() 
plt.show()

Source

2016-04-14 Gabriele Pompa

Can anyone reproduce the problem? Thanks.

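The problem is the lower bound on the variance of each single component, min_covar, which defaults to 1e-3 and is intended to prevent overfitting. If the data vary on a scale whose variance is smaller than that bound, every component is forced to be wider than the data, whatever the number of components.

Lowering that limit solves the problem (see picture):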

gmm = GMM(n_components = 2, min_covar = 1e-12)

Source

2016-04-18 11:03:07
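
A note for readers on newer scikit-learn releases: GMM was deprecated in version 0.18 and later removed, and GaussianMixture takes its place. There the closest counterpart of min_covar appears to be reg_covar (default 1e-6), which is added to the diagonal of each covariance rather than acting as a floor, so the two are similar but not identical. The sketch below is only an illustration under those assumptions: it uses synthetic data with standard deviation 0.01 instead of the "epsi" array from the question, and the helper name fitted_variance is made up for this example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for "epsi" (NOT the asker's data): the true variance
# is 1e-4, i.e. well below a covariance term of 1e-3.
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=0.01, size=5000).reshape(-1, 1)


def fitted_variance(reg_covar):
    """Fit a one-component mixture and return the variance it estimates."""
    gm = GaussianMixture(n_components=1, reg_covar=reg_covar, random_state=0)
    gm.fit(x)
    return float(gm.covariances_.ravel()[0])


sample_var = float(x.var())
var_large_reg = fitted_variance(1e-3)   # same magnitude as the old min_covar default
var_small_reg = fitted_variance(1e-12)  # value suggested in the answer above

print("sample variance      :", sample_var)
print("fit, reg_covar=1e-3  :", var_large_reg)
print("fit, reg_covar=1e-12 :", var_small_reg)

# Density on a grid, one value per grid point. In GaussianMixture,
# score_samples returns per-sample log-densities, while score returns
# their average.
gm = GaussianMixture(n_components=1, reg_covar=1e-12, random_state=0).fit(x)
grid = np.linspace(x.min(), x.max(), 200).reshape(-1, 1)
density = np.exp(gm.score_samples(grid))
```

With the large covariance term the fitted component comes out far wider than the data, which matches the symptom described in the question; with the small one, the single component should essentially reproduce the plain normal fit.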
