2014-03-28 147 views
2

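Generating a discrete probability distribution with numpy

I'm following a code example I found at http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html#subclassing-rv-discrete to implement a random number generator for discrete values of a normal distribution. The exact example (not surprisingly) works quite well, but if I modify it to allow only left- or right-tailed results, the count around 0 is too low (bin zero should contain more values). I must have hit a boundary condition, but I'm unable to work it out. Am I missing something?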

This is the result of counting the random numbers per bin:

np.bincount(rvs) [1082 2069 1833 1533 1199 837 644 376 218 111 55 20 12 7 2 2] 

This is the histogram:

[image: histogram of the counts above]

import numpy as np 
from scipy import stats 

np.random.seed(42) 

def draw_discrete_gaussian(rng, tail='both'): 
    # number of integer support points of the distribution minus 1 
    npoints = rng if tail == 'both' else rng * 2 
    npointsh = npoints // 2  # floor division keeps this an int on Python 3 as well 
    npointsf = float(npoints) 
    # bounds for the truncated normal 
    nbound = 4 
    # actual bounds of truncated normal 
    normbound = (1+1/npointsf) * nbound 
    # integer grid 
    grid = np.arange(-npointsh, npointsh+2, 1) 
    # bin limits for the truncnorm 
    gridlimitsnorm = (grid-0.5)/npointsh * nbound 
    # used later in the analysis 
    gridlimits = grid - 0.5 
    grid = grid[:-1] 
    probs = np.diff(stats.truncnorm.cdf(gridlimitsnorm, -normbound, normbound)) 
    gridint = grid 

    normdiscrete = stats.rv_discrete(values=(gridint, np.round(probs, decimals=7)), name='normdiscrete') 
    # print 'mean = %6.4f, variance = %6.4f, skew = %6.4f, kurtosis = %6.4f'% normdiscrete.stats(moments = 'mvsk') 
    rnd_val = normdiscrete.rvs() 
    if tail == 'both': 
        return rnd_val 
    if tail == 'left': 
        return -abs(rnd_val) 
    elif tail == 'right': 
        return abs(rnd_val) 


rng = 15 
tail = 'right' 
rvs = [draw_discrete_gaussian(rng, tail=tail) for i in range(10000)] 

if tail == 'both': 
    rng_min = rng/-2.0 
    rng_max = rng/2.0 
elif tail == 'left': 
    rng_min = -rng 
    rng_max = 0 
elif tail == 'right': 
    rng_min = 0 
    rng_max = rng 

gridlimits = np.arange(rng_min-.5, rng_max+1.5, 1) 
print(gridlimits) 
f, l = np.histogram(rvs, bins=gridlimits) 

# cheap way of creating histogram 
import matplotlib.pyplot as plt 
%matplotlib inline 

bins, edges = f, l 
left,right = edges[:-1],edges[1:] 
X = np.array([left, right]).T.flatten() 
Y = np.array([bins, bins]).T.flatten() 

# print 'rvs', rvs 
print('np.bincount(rvs)', np.bincount(rvs)) 

plt.plot(X,Y) 
plt.show() 
+2

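Looking at the graph, it seems to me that bin 0 contains everything from -0.5 to 0.5. If so, it's not surprising that it's half the height of the next bin. You're not generating any results from the left half of that bin. – user2357112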

+0

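@user2357112: I may be wrong, but I think this is just due to the visualisation (it is centred around the bin number, whereas the bin is actually bounded at +0.5). If I use 'gridlimits = np.arange(rng_min, rng_max + 2, 1)', I get the same graph. – orange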

+1

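I also think @user235711 is correct. When you take the abs, you need to combine the negative and positive bins of probs. Check that the bin starting at zero has the same length as the other bins. I would just take the correctly truncated normal on the right or left, i.e. starting or ending at zero. – user333700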
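To make the point in these comments concrete, here is a minimal sketch of what `abs()` does to a symmetric discrete distribution. The support and weights below are made up purely for illustration (they are not the question's values): folding maps -k and +k onto the same bin k, so every bin except 0 receives two contributions while bin 0 receives only one.

```python
import numpy as np

# Hypothetical symmetric pmf on the integers -3..3 (illustrative weights only).
support = np.arange(-3, 4)
pmf = np.array([1, 4, 9, 12, 9, 4, 1], dtype=float)
pmf /= pmf.sum()

# abs() sends -k and +k to the same bin k, so accumulate the
# probabilities by absolute value.
folded = np.zeros(4)
np.add.at(folded, np.abs(support), pmf)

print(folded)  # bin 0 keeps its single share; bins 1..3 are doubled
```

This matches the bin counts in the question, where bin 0 holds roughly half as many values as bin 1.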

Answer

0

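Here is my attempt at answering my own question, based on the comments from @user333700 and @user235711:

I inserted the following into the method, just before normdiscrete = ...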

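if tail == 'right': 
    # keep the support points 0..npointsh and renormalise 
    gridint = gridint[npointsh:] 
    probs = probs[npointsh:] 
    probs = probs / probs.sum() 
elif tail == 'left': 
    # keep -npointsh..0; the slice ends at npointsh + 1 so that 0 is included 
    gridint = gridint[:npointsh + 1] 
    probs = probs[:npointsh + 1] 
    probs = probs / probs.sum() 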

The resulting histograms look much better:

[image: histogram] [image: histogram]
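For comparison, here is a rough sketch of the alternative suggested in the comments: building the one-sided distribution directly from a normal truncated at zero, with every bin the same width. This is only one reading of that comment, not the code from the answer. The helper name `right_tail_discrete` and the choice of assigning integer k the interval [k, k + 1) are assumptions made for this illustration.

```python
import numpy as np
from scipy import stats

def right_tail_discrete(rng, nbound=4):
    """Hypothetical helper: discrete half-normal on the integers 0..rng."""
    # Equal-width bin edges [k, k + 1), rescaled so the last edge sits at nbound.
    edges = np.arange(rng + 2) / float(rng + 1) * nbound
    # Probability mass of a normal truncated to [0, nbound] in each bin.
    probs = np.diff(stats.truncnorm.cdf(edges, 0, nbound))
    probs /= probs.sum()  # guard against floating-point drift
    return stats.rv_discrete(values=(np.arange(rng + 1), probs),
                             name='halfnormdiscrete')

dist = right_tail_discrete(15)
sample = dist.rvs(size=10000, random_state=42)
counts = np.bincount(np.asarray(sample, dtype=int), minlength=16)
print(counts)
```

Because no `abs()` is involved, bin 0 is not under-filled here, and the probabilities decrease monotonically away from zero. A left-tailed variant could negate the support points instead.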