2014-05-22 54 views
3

在scipy中,不支持使用數據 擬合負二項式分佈(可能是由於scipy中的負二項式只是離散的事實)。在python中擬合負二項式

對於正態分佈我只想做:

from scipy.stats import norm 
param = norm.fit(samp) 

是否有類似的東西「準備使用」在任何其他庫函數?

回答

4

不僅因爲它是離散的,還因爲最大似然擬負二項可能會相當牽扯,尤其是帶有一個附加位置參數。這將是爲什麼(在Scipy和其他離散分佈)不爲它提供.fit()方法的原因,這裏有一個例子:

In [163]: 

import scipy.stats as ss 
import scipy.optimize as so 
In [164]: 
#define a likelihood function 
def likelihood_f(P, x, neg=1): 
    n=np.round(P[0]) #by definition, it should be an integer 
    p=P[1] 
    loc=np.round(P[2]) 
    return neg*(np.log(ss.nbinom.pmf(x, n, p, loc))).sum() 
In [165]: 
#generate a random variable 
X=ss.nbinom.rvs(n=100, p=0.4, loc=0, size=1000) 
In [166]: 
#The likelihood 
likelihood_f([100,0.4,0], X) 
Out[166]: 
-4400.3696690513316 
In [167]: 
#A simple fit, the fit is not good and the parameter estimate is way off 
result=so.fmin(likelihood_f, [50, 1, 1], args=(X,-1), full_output=True, disp=False) 
P1=result[0] 
(result[1], result[0]) 
Out[167]: 
(4418.599495886474, array([ 59.61196161, 0.28650831, 1.15141838])) 
In [168]: 
#Try a different set of start paramters, the fit is still not good and the parameter estimate is still way off 
result=so.fmin(likelihood_f, [50, 0.5, 0], args=(X,-1), full_output=True, disp=False) 
P1=result[0] 
(result[1], result[0]) 
Out[168]: 
(4417.1495981801972, 
array([ 6.24809397e+01, 2.91877405e-01, 6.63343536e-04])) 
In [169]: 
#In this case we need a loop to get it right 
result=[] 
for i in range(40, 120): #in fact (80, 120) should probably be enough 
    _=so.fmin(likelihood_f, [i, 0.5, 0], args=(X,-1), full_output=True, disp=False) 
    result.append((_[1], _[0])) 
In [170]: 
#get the MLE 
P2=sorted(result, key=lambda x: x[0])[0][1] 
sorted(result, key=lambda x: x[0])[0] 
Out[170]: 
(4399.780263084549, 
array([ 9.37289361e+01, 3.84587087e-01, 3.36856705e-04])) 
In [171]: 
#Which one is visually better? 
plt.hist(X, bins=20, normed=True) 
plt.plot(range(260), ss.nbinom.pmf(range(260), np.round(P1[0]), P1[1], np.round(P1[2])), 'g-') 
plt.plot(range(260), ss.nbinom.pmf(range(260), np.round(P2[0]), P2[1], np.round(P2[2])), 'r-') 
Out[171]: 
[<matplotlib.lines.Line2D at 0x109776c10>] 

enter image description here

+0

我意識到這是一個非常臨時的解決方案。優化功能不能始終工作 – Gioelelm