
To simplify the problem: say a dimension (or feature) has already been updated n times; the next time I see that feature, I want the learning rate to be 1/n. How do I code AdaGrad in Theano in Python?

I came up with this code:

import numpy as np
import theano
import theano.tensor as T

def test_adagrad():
    embedding = theano.shared(value=np.random.randn(20, 10), borrow=True)
    times = theano.shared(value=np.ones((20, 1)))
    lr = T.dscalar()
    index_a = T.lvector()
    hist = times[index_a]
    cost = T.sum(theano.sparse_grad(embedding[index_a]))
    gradients = T.grad(cost, embedding)
    updates = [(embedding, embedding + lr * (1.0 / hist) * gradients)]
    ### Here should be some code to also update times, which is omitted ###
    train = theano.function(inputs=[index_a, lr], outputs=cost, updates=updates)
    for i in range(10):
        print train([1, 2, 3], 0.05)

Theano does not raise any error, but the training result is sometimes NaN. Does anyone know how to fix this?

Thanks for your help.

PS: I suspect it is the operations on the sparse space that are causing the problem, so I tried replacing * with theano.sparse.mul. That gave the same results as mentioned above.
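For illustration, here is a sketch of a per-row variant that touches only the rows in index_a and also bumps their counters, using T.inc_subtensor. This is only an illustrative sketch of that idea, not the code omitted above:

import numpy as np
import theano
import theano.tensor as T

# Illustrative sketch (not the omitted code above): update only the rows in
# index_a and increment their update counters at the same time.
embedding = theano.shared(value=np.random.randn(20, 10), borrow=True)
times = theano.shared(value=np.ones((20, 1)))
lr = T.dscalar()
index_a = T.lvector()

cost = T.sum(embedding[index_a])
grad_rows = T.grad(cost, embedding)[index_a]      # gradients of the touched rows only
step = lr * (1.0 / times[index_a]) * grad_rows    # (k,1) counter broadcasts over (k,10)

updates = [(embedding, T.inc_subtensor(embedding[index_a], -step)),  # descend on those rows
           (times, T.inc_subtensor(times[index_a], 1.0))]            # count the updates
train = theano.function(inputs=[index_a, lr], outputs=cost, updates=updates)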

Answers


Maybe you can make use of the following example for implementation of adadelta and use it to derive your own. Please update here if you succeed :-)
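For reference, the adadelta rule that such an example implements could be sketched in Theano roughly as follows. This is my own rough rendering of Zeiler's (2012) update rule, not the linked code, and adadelta_updates is a made-up name:

import numpy as np
import theano
import theano.tensor as T

def adadelta_updates(params, grads, rho=0.95, eps=1e-6):
    # Keep a running average of squared gradients and of squared parameter
    # updates for each parameter, as in Zeiler (2012).
    updates = []
    for p, g in zip(params, grads):
        value = p.get_value(borrow=True)
        accu_grad = theano.shared(np.zeros_like(value), broadcastable=p.broadcastable)
        accu_delta = theano.shared(np.zeros_like(value), broadcastable=p.broadcastable)

        accu_grad_new = rho * accu_grad + (1 - rho) * T.sqr(g)
        delta = -T.sqrt(accu_delta + eps) / T.sqrt(accu_grad_new + eps) * g
        accu_delta_new = rho * accu_delta + (1 - rho) * T.sqr(delta)

        updates += [(accu_grad, accu_grad_new),
                    (accu_delta, accu_delta_new),
                    (p, p + delta)]
    return updates

The returned list of pairs can then be passed directly to theano.function(..., updates=updates).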


Thank you very much for your answer – 2015-04-16 08:15:44


You're welcome :-) If you found it useful, please mark the answer as "accepted" and upvote it :-) Also - if you want to follow up for future users - you could attach your own implementation... – zuuz 2015-04-17 11:23:10


I was looking for the same thing and ended up implementing it myself, in the style of the resource zuuz pointed to. So this may help anyone looking for it here.

import numpy as np
import theano
import theano.tensor as T

def adagrad(lr, tparams, grads, inp, cost):
    # shared variables that store the current gradients
    gshared = [theano.shared(np.zeros_like(p.get_value(),
                                           dtype=theano.config.floatX),
                             name='%s_grad' % k)
               for k, p in tparams.iteritems()]
    grads_updates = zip(gshared, grads)
    # shared variables that store the running sum of all squared gradients
    hist_gshared = [theano.shared(np.zeros_like(p.get_value(),
                                                dtype=theano.config.floatX),
                                  name='%s_hist' % k)
                    for k, p in tparams.iteritems()]
    rgrads_updates = [(rg, rg + T.sqr(g)) for rg, g in zip(hist_gshared, grads)]

    # compute the cost and store the (squared) gradients
    f_grad_shared = theano.function(inp, cost,
                                    updates=grads_updates + rgrads_updates,
                                    on_unused_input='ignore')

    # apply the actual update with the initial learning rate lr
    n = 1e-6
    updates = [(p, p - (lr / (T.sqrt(rg) + n)) * g)
               for p, g, rg in zip(tparams.values(), gshared, hist_gshared)]

    f_update = theano.function([lr], [], updates=updates, on_unused_input='ignore')

    return f_grad_shared, f_update
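For illustration, here is one way the two compiled functions could be used. The toy model below (x, y, tparams and the linear cost) is my own setup, not part of the answer:

from collections import OrderedDict
import numpy as np
import theano
import theano.tensor as T

# Hypothetical usage of adagrad() above: a one-parameter linear model.
x = T.dvector('x')
y = T.dvector('y')
tparams = OrderedDict([('w', theano.shared(np.zeros(1, dtype=theano.config.floatX), name='w'))])
cost = T.mean((tparams['w'][0] * x - y) ** 2)
grads = T.grad(cost, wrt=list(tparams.values()))

lr = T.scalar('lr')
f_grad_shared, f_update = adagrad(lr, tparams, grads, [x, y], cost)

for i in range(100):
    f_grad_shared(np.arange(5.0), 3 * np.arange(5.0))  # compute cost, store grads
    f_update(1.0)                                       # apply one AdaGrad step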

I find this implementation from Lasagne very concise and readable. You can use it pretty much as it is:

for param, grad in zip(params, grads):
    value = param.get_value(borrow=True)
    accu = theano.shared(np.zeros(value.shape, dtype=value.dtype),
                         broadcastable=param.broadcastable)
    accu_new = accu + grad ** 2
    updates[accu] = accu_new
    updates[param] = param - (learning_rate * grad /
                              T.sqrt(accu_new + epsilon))
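Wrapped into a self-contained toy example (everything around the loop here is my own setup, not part of the Lasagne code), it could look like this:

from collections import OrderedDict
import numpy as np
import theano
import theano.tensor as T

# Assumed setup around the quoted loop: fit y = 2*x with a single parameter.
x = T.dvector('x')
y = T.dvector('y')
w = theano.shared(np.array(0.0), name='w')
params = [w]
cost = T.mean((w * x - y) ** 2)
grads = T.grad(cost, params)

learning_rate, epsilon = 0.5, 1e-6
updates = OrderedDict()
for param, grad in zip(params, grads):
    value = param.get_value(borrow=True)
    accu = theano.shared(np.zeros(value.shape, dtype=value.dtype),
                         broadcastable=param.broadcastable)
    accu_new = accu + grad ** 2
    updates[accu] = accu_new
    updates[param] = param - (learning_rate * grad /
                              T.sqrt(accu_new + epsilon))

train = theano.function([x, y], cost, updates=updates)
for i in range(100):
    train(np.arange(5.0), 2 * np.arange(5.0))
print(w.get_value())   # converges toward 2.0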