2012-01-02 84 views
4

Generate random lognormal distributions using the shape of observed data

I am trying to fit some data to a lognormal distribution and then generate random lognormal samples from the fitted parameters. After some searching I found two approaches, but neither is convincing.

Solution 1, using the fit function:

import numpy as np
from scipy.stats import lognorm

mydata = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,6,7,7,7,8,8,8,8,8,9,9,9,10,10,11,12,13,14,14,15,19,19,21,23,25,27,28,30,31,36,41,45,48,52,55,60,68,75,86,118,159,207,354]

# Fit all three parameters (shape, loc, scale) to the data
shape, loc, scale = lognorm.fit(mydata)
rnd_log = lognorm.rvs(shape, loc=loc, scale=scale, size=100)

Solution 2, using mu and sigma computed from the original data:

import numpy as np
from scipy.stats import lognorm

mydata = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,6,7,7,7,8,8,8,8,8,9,9,9,10,10,11,12,13,14,14,15,19,19,21,23,25,27,28,30,31,36,41,45,48,52,55,60,68,75,86,118,159,207,354]

# Mean and standard deviation of the log-transformed data
mu = np.mean(np.log(mydata))
sigma = np.std(np.log(mydata))

distr = lognorm(mu, sigma)
rnd_log = distr.rvs(size=100)

Neither solution fits the data well:

import pylab 
pylab.plot(sorted(mydata, reverse=True), 'ro') 
pylab.plot(sorted(rnd_log, reverse=True), 'bx') 

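I am not sure whether I have misunderstood how these distributions are meant to be used, or whether I am missing something else...

I thought I had found the solution here: Does anyone have example code of using scipy.stats.distributions? But I cannot get the shape from my data... Am I missing something in how the fit function is used?

Thanks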

EDIT:

Here is an example to better illustrate my problem:

print('solution 1:')
means = []
stdes = []
distr = lognorm(mu, sigma)
for _ in range(1000):
    rnd_log = distr.rvs(size=100)
    means.append(np.mean(np.log(rnd_log)))
    stdes.append(np.std(np.log(rnd_log)))
print('observed mean:', mu, 'mean simulated mean:', np.mean(means))
print('observed std :', sigma, 'mean simulated std :', np.mean(stdes))

print('\nsolution 2:')
means = []
stdes = []
shape, loc, scale = lognorm.fit(mydata)
for _ in range(1000):
    rnd_log = lognorm.rvs(shape, loc=loc, scale=scale, size=100)
    means.append(np.mean(np.log(rnd_log)))
    stdes.append(np.std(np.log(rnd_log)))
print('observed mean:', mu, 'mean simulated mean:', np.mean(means))
print('observed std :', sigma, 'mean simulated std :', np.mean(stdes))

The results are:

solution 1: 
observed mean: 1.82562655734 mean simulated mean: 1.18929982267 
observed std : 1.39003773799 mean simulated std : 0.88985924363 

solution 2: 
observed mean: 1.82562655734 mean simulated mean: 4.50608084668 
observed std : 1.39003773799 mean simulated std : 5.44206119499 

Whereas if I do the same thing in R:

mydata <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,6,7,7,7,8,8,8,8,8,9,9,9,10,10,11,12,13,14,14,15,19,19,21,23,25,27,28,30,31,36,41,45,48,52,55,60,68,75,86,118,159,207,354) 
meanlog <- mean(log(mydata)) 
sdlog <- sd(log(mydata)) 
means <- c() 
stdes <- c() 
for (i in 1:1000){ 
    rnd.log <- rlnorm(length(mydata), meanlog, sdlog) 
    means <- c(means, mean(log(rnd.log))) 
    stdes <- c(stdes, sd(log(rnd.log))) 
} 

print (paste('observed mean:',meanlog,'mean simulated mean:',mean(means),sep=' ')) 
print (paste('observed std :',sdlog ,'mean simulated std :',mean(stdes),sep=' ')) 

I get:

[1] "observed mean: 1.82562655733507 mean simulated mean: 1.82307191072317" 
[1] "observed std : 1.39704049131865 mean simulated std : 1.39736545866904" 

which is much closer, so I guess I am doing something wrong when using SciPy...
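A side note on the comparison above: the two "observed std" values differ slightly (about 1.3900 in Python and 1.3970 in R). That gap is most likely unrelated to SciPy's lognormal parameterization: `np.std` defaults to the population formula (`ddof=0`), while R's `sd` divides by n - 1. A minimal sketch, with the names `log_data`, `sd_population` and `sd_sample` chosen here purely for illustration:

```python
import numpy as np

mydata = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,6,7,7,7,8,8,8,8,8,9,9,9,10,10,11,12,13,14,14,15,19,19,21,23,25,27,28,30,31,36,41,45,48,52,55,60,68,75,86,118,159,207,354]

log_data = np.log(mydata)

# NumPy default: divide by n (population standard deviation)
sd_population = np.std(log_data)

# ddof=1 divides by n - 1, which is what R's sd() computes
sd_sample = np.std(log_data, ddof=1)

print(sd_population, sd_sample)
```

Passing `ddof=1` on the Python side should make the two "observed std" figures directly comparable.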

+1

What is this mydata array? For a fit I would expect to see x values and y values... How should this array be interpreted? – Tanriol 2012-01-02 19:14:51

+0

Have you looked at [the many papers on lognormal parameter estimation](http://scholar.google.com/scholar?q=lognormal+parameter+estimation&hl=zh-CN&as_sdt=0&as_vis=1&oi=scholart)? – 2012-01-02 19:32:43

+0

OK, I am sorry, I guess my question was not clear enough. I have edited it. – fransua 2012-01-03 10:25:06

Answers

4

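The lognormal distribution in SciPy is parameterized a little differently from the usual convention. See the scipy.stats.lognorm documentation, in particular the "Notes" section.

Here is how to get the results you expect (note that we hold the location at 0 when fitting):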
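As a rough sketch of that parameterization: SciPy's signature is `lognorm(s, loc=0, scale=1)`, where the shape `s` plays the role of sigma and `scale` equals exp(mu). A call such as `lognorm(mu, sigma)` therefore passes mu as the shape and sigma as `loc`. The snippet below uses the mu and sigma values reported in this thread and checks the mapping against the textbook lognormal density; the names `textbook_pdf`, `max_pdf_gap` and `median_gap` are illustrative only.

```python
import numpy as np
from scipy import stats

# Values reported in this thread for the log of the data
mu = 1.8256265573350701     # mean of log(data)
sigma = 1.3900377379913127  # standard deviation of log(data)

# Mapping to SciPy's parameters:
#   s     -> sigma
#   scale -> exp(mu)
#   loc   -> 0 for the standard two-parameter lognormal
dist = stats.lognorm(s=sigma, loc=0, scale=np.exp(mu))

# Cross-check against the textbook lognormal density
x = np.array([0.5, 1.0, 5.0, 50.0])
textbook_pdf = np.exp(-(np.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (
    x * sigma * np.sqrt(2 * np.pi)
)
max_pdf_gap = np.max(np.abs(dist.pdf(x) - textbook_pdf))

# The median of a lognormal is exp(mu)
median_gap = abs(dist.median() - np.exp(mu))

print(max_pdf_gap, median_gap)
```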

In [315]: from scipy import stats 

In [316]: x = np.array([1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,6,7,7,7,8,8,8,8,8,9,9,9,10,10,11,12,13,14,14,15,19,19,21,23,25,27,28,30,31,36,41,45,48,52,55,60,68,75,86,118,159,207,354]) 

In [317]: mu, sigma = stats.norm.fit(np.log(x)) 

In [318]: mu, sigma 
Out[318]: (1.8256265573350701, 1.3900377379913127) 

In [319]: shape, loc, scale = stats.lognorm.fit(x, floc=0) 

In [320]: np.log(scale), shape 
Out[320]: (1.8256267737298788, 1.3900309739954713) 

Now you can generate samples and confirm that they match your expectation:

In [321]: dist = stats.lognorm(shape, loc, scale) 

In [322]: means, sds = [], [] 

In [323]: for i in range(1000):
    .....:     sample = dist.rvs(size=100)
    .....:     logsample = np.log(sample)
    .....:     means.append(logsample.mean())
    .....:     sds.append(logsample.std())
    .....:

In [324]: np.mean(means), np.mean(sds) 
Out[324]: (1.8231068508345041, 1.3816361818739145) 
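For convenience, the session above can be condensed into a self-contained script without IPython prompts. This is only a sketch: the seeded generator (`np.random.default_rng(0)`) and the names `mean_of_means` and `mean_of_sds` are additions made here for reproducibility and are not part of the original session.

```python
import numpy as np
from scipy import stats

x = np.array([1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,6,6,6,6,6,7,7,7,8,8,8,8,8,9,9,9,10,10,11,12,13,14,14,15,19,19,21,23,25,27,28,30,31,36,41,45,48,52,55,60,68,75,86,118,159,207,354])

# Reference values: mean and std of the log-transformed data
mu, sigma = np.log(x).mean(), np.log(x).std()

# Fit with the location held at 0 so that shape ~ sigma and log(scale) ~ mu
shape, loc, scale = stats.lognorm.fit(x, floc=0)
dist = stats.lognorm(shape, loc=loc, scale=scale)

# Draw repeated samples and summarize them on the log scale
rng = np.random.default_rng(0)  # assumption: seeded only for reproducibility
means, sds = [], []
for _ in range(1000):
    logsample = np.log(dist.rvs(size=100, random_state=rng))
    means.append(logsample.mean())
    sds.append(logsample.std())

mean_of_means = np.mean(means)
mean_of_sds = np.mean(sds)
print(mu, sigma)
print(np.log(scale), shape)
print(mean_of_means, mean_of_sds)
```

The simulated standard deviation tends to come out slightly below sigma, as in the session above, because the standard deviation of a finite sample is a slightly biased estimate.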
+0

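Great! Thank you very much! Yes, I had seen the notes in the documentation, but they were still unclear to me. I do not know whether you have any influence over it... but an example like this in the documentation would help newbies like me :). In the end I found another solution using random.lognormvariate(mu, sigma), but this one is definitely better! – fransua 2012-01-03 20:43:19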
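For reference, the `random.lognormvariate(mu, sigma)` alternative mentioned in the comment above takes mu and sigma of the underlying normal distribution directly, so no parameter remapping is needed. A minimal sketch using only the standard library, with the mu and sigma values reported in this thread; the sample size, seed, and the names `log_mean` and `log_sd` are arbitrary choices made here:

```python
import math
import random

# Values reported in this thread for the log of the data
mu = 1.8256265573350701
sigma = 1.3900377379913127

random.seed(0)  # assumption: seeded only for reproducibility
sample = [random.lognormvariate(mu, sigma) for _ in range(10000)]

# Summarize the sample on the log scale
logs = [math.log(v) for v in sample]
log_mean = sum(logs) / len(logs)
log_sd = math.sqrt(sum((v - log_mean) ** 2 for v in logs) / len(logs))

print(log_mean, log_sd)
```

This draws one value per call, so for large simulations the SciPy or NumPy routes are likely to be faster.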
