2015-10-22 69 views
1

我想估計下面的模型貝葉斯估計壽命/ theano

enter image description here

在這裏我提供統一的優先enter image description here和我的代碼的可能性enter image description here。後者來源於此paper和去如下:

enter image description here

在theano/pymc3實現我的計算等式右邊的第一和第二項中first_termsecond_term。最後logp總和整個樣本。

Theano,對自己是生產一些輸出然而,當我在pymc3模型它集成出現以下錯誤:

TypeError: ('Bad input argument to theano function with name "<ipython-input-90-a5304bf41c50>:27" at index 0(0-based)', 'Expected an array-like object, but found a Variable: maybe you are trying to call a function on a (possibly shared) variable instead of a numeric array?')

我認爲這個問題是如何提供pymc3變量theano。

from pymc3 import Model, Uniform, DensityDist 
import theano.tensor as T 
import theano 
import numpy as np 

p_test, theta_test = .1, .1 
X = np.asarray([[1,2,3],[1,2,3]]) 

### theano test 
theano.config.compute_test_value = 'off' 
obss = T.matrix('obss') 
p, theta = T.scalar('p'), T.scalar('theta') 
def first_term(obs, p, theta): 
    x, tx, n = obs[0], obs[1], obs[2] 
    first_comp = p ** x * (1 - p) ** (n - x) * (1 - theta) ** n 
    return(first_comp) 

def second_term(obs, p, theta): 
    x, tx, n = obs[0], obs[1], obs[2] 

    components, updates = theano.scan(
        lambda t, p, theta, x, tx: p ** x * (1 - theta) ** (tx-x+t) * theta * (1 - theta) ** (tx + t), 
        sequences=theano.tensor.arange(n), non_sequences = [p, theta, x, tx] 
    ) 
    return(components) 

def logp(X, p_hat, theta_hat): 
    contributions, updates = theano.scan(lambda obs, p, theta: first_term(obs, p, theta) + T.sum(second_term(obs, p, theta)) , 
          sequences = obss, non_sequences = [p, theta] 
    ) 
    ll = contributions.sum() 
    get_ll = theano.function(inputs = [obss, p, theta], outputs = ll) 

    return(get_ll(X, p_hat , theta_hat)) 

print(logp(X, p_test, theta_test)) # It works! 

### pymc3 implementation 
with Model() as bg_model: 

    p = Uniform('p', lower = 0, upper = 1) 
    theta = Uniform('theta', lower = 0, upper = .2) 


    def first_term(obs, p, theta): 
     x, tx, n = obs[0], obs[1], obs[2] 
     first_comp = p ** x * (1 - p) ** (n - x) * (1 - theta) ** n 
     return(first_comp) 

    def second_term(obs, p, theta): 
     x, tx, n = obs[0], obs[1], obs[2] 

     components, updates = theano.scan(
         lambda t, p, theta, x, tx: p ** x * (1 - theta) ** (tx-x+t) * theta * (1 - theta) ** (tx + t), 
         sequences=theano.tensor.arange(n), non_sequences = [p, theta, x, tx] 
     ) 
     return(components) 

    def logp(X): 
     contributions, updates = theano.scan(lambda obs, p, theta: first_term(obs, p, theta) + T.sum(second_term(obs, p, theta)) , 
           sequences = obss, non_sequences = [p, theta] 
     ) 
     ll = contributions.sum() 
     get_ll = theano.function(inputs = [obss, p, theta], outputs = ll) 
     return(get_ll(X, p, theta)) 

    y = pymc3.DensityDist('y', logp, observed = X) # Nx4 y = f(f,x,tx,n | p, theta) 

我的第一個猜測是修改logpreturn(get_ll(X, p.eval(), theta.eval()))但隨後theano抱怨一些神祕的p_interval從圖中失蹤。任何線索?

回答

3

我計算出來的條件:ⅰ)簡化東西ⅱ)避免編碼的可能性在使用theano運營商和iii)使用所述內置的包裝確定性爲變量確定性變換(我的壽命)。爲了加快計算我通過寫在RHS作爲幾何級數的溶液中的第二項矢量化的可能性。如果有人想在他自己的終生應用中測試它,那麼這裏是代碼。

from pymc3 import Model, Uniform, Deterministic 
import pymc3 
from scipy import optimize 
import theano.tensor as T 

X = array([[ 5, 64, 8, 13],[ 4, 71, 23, 41],[ 7, 70, 4, 19]) 
#f, n, x, tx where f stands for the frequency of the triple (n, x, tx) 

class CustomDist(pymc3.distributions.Discrete): 
    def __init__(self, p, theta, *args, **kwargs): 
     super(CustomDist, self).__init__(*args, **kwargs) 
     self.p = p 
     self.theta = theta 

    def logp(self, X): 
     p = self.theta 
     theta = self.theta 
     f, n, x, tx = X[0],(X[1] + 1),X[2],(X[3] + 1) #time indexed at 0, x > n 
     ll = f * T.log( p ** x * (1 - p) ** (n - x) * (1 - theta) ** n + 
       [(1 - p) ** (tx - x) * (1 - theta) ** tx - (1 - p) ** (n - x) * (1 - theta) ** n]/(1 - (1 - p) * (1 - theta))) 
     # eliminated -1 since would result in negatice ll 
     return(T.sum(ll)) 

with Model() as bg_model: 
    p = Uniform('p', lower = 0, upper = 1) 
    theta = Uniform('theta', lower = 0, upper = 1) 
    like = CustomDist('like', p = p, theta = theta, observed=X.T) #likelihood 

    lt = pymc3.Deterministic('lt', p/theta) 
    # start = {'p':.5, 'theta':.1} 
    start = pymc3.find_MAP(fmin=optimize.fmin_powell) 
    step = pymc3.Slice([p, theta, lt]) 
    trace = pymc3.sample(5000, start = start, step = [step]) 

pymc3.traceplot(trace[2000:]) 
+0

謝謝你的職位。我正在努力解決類似的問題,這對我有很大的幫助。順便說一句,我認爲你需要'p = self.p'在第一行logp方法。 – JavNoor

+0

另外一個封閉的支架是X中的定義缺失,否則你的代碼運行完全。 – JavNoor