
Convolutional neural network: how do I train it? (Unsupervised)

I am trying to implement a CNN that plays a game, using Python with Theano/Lasagne. I have built the network and am now figuring out how to train it.

Right now I have a batch of 32 states, the action taken in each of those states, and the expected reward for that action.

How do I now train the network so that it learns that taking these actions in these states leads to these rewards?
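(For reference, the "expected reward" here is the usual Q-learning target: just the immediate reward for a terminal frame, otherwise the reward plus the discounted best predicted future reward. A small standalone sketch of that per-sample computation, with illustrative names, which is also what the loop in _train below does:)

import numpy as np

def q_target(reward, terminal, next_q_values, discount):
    # Q-learning target: the reward alone for a terminal frame,
    # otherwise reward + discount * max_a Q(next_state, a)
    if terminal:
        return reward
    return reward + discount * np.max(next_q_values)

print q_target(1.0, False, np.array([0.2, 0.5, 0.1]), 0.99)   # 1.495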

Edit: to clarify my question.

Here is my full code: http://pastebin.com/zY8w98Ng and the Snake module it imports: http://pastebin.com/fgGCabzR

The part I am having trouble with is this:

def _train(self):
    # Prepare Theano variables for inputs and targets
    input_var = T.tensor4('inputs')
    target_var = T.ivector('targets')
    states = T.tensor4('states')
    print "sampling mini batch..."
    # sample a mini_batch to train on
    mini_batch = random.sample(self._observations, self.MINI_BATCH_SIZE)
    # get the batch variables
    previous_states = [d[self.OBS_LAST_STATE_INDEX] for d in mini_batch]
    actions = [d[self.OBS_ACTION_INDEX] for d in mini_batch]
    rewards = [d[self.OBS_REWARD_INDEX] for d in mini_batch]
    current_states = np.array([d[self.OBS_CURRENT_STATE_INDEX] for d in mini_batch])
    agents_expected_reward = []
    # print np.rollaxis(current_states, 3, 1).shape
    print "compiling current states..."
    current_states = np.rollaxis(current_states, 3, 1)
    current_states = theano.compile.sharedvalue.shared(current_states)

    print "getting network output from current states..."
    agents_reward_per_action = lasagne.layers.get_output(self._output_layer, current_states)

    print "rewards adding..."
    for i in range(len(mini_batch)):
        if mini_batch[i][self.OBS_TERMINAL_INDEX]:
            # this was a terminal frame, so there is no future reward to discount
            agents_expected_reward.append(rewards[i])
        else:
            # Q-learning target: reward plus the discounted best predicted future reward
            agents_expected_reward.append(
                rewards[i] + self.FUTURE_REWARD_DISCOUNT * np.max(agents_reward_per_action[i].eval()))

    # figure out how to train the model (self._output_layer) with previous_states,
    # actions and agent_expected_rewards
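For what it's worth, one common way to finish that last step is the DQN-style approach: pick out the network's output for the action that was actually taken and minimise the squared error against the expected reward. The sketch below is not code from the question; compile_q_train_fn, states_var, actions_var, expected_var and taken_q are illustrative names, and it assumes each action is stored as an integer index.

import theano
import theano.tensor as T
import lasagne

def compile_q_train_fn(output_layer, learning_rate):
    # symbolic mini-batch: states, the action index taken in each state,
    # and the expected (already discounted) reward for that action
    states_var = T.tensor4('states')
    actions_var = T.ivector('actions')
    expected_var = T.vector('expected_rewards')

    # Q-values the network predicts for every action in each state
    q_values = lasagne.layers.get_output(output_layer, states_var)

    # Q-value of the action that was actually taken in each sample
    taken_q = q_values[T.arange(actions_var.shape[0]), actions_var]

    # squared-error regression of the taken action's Q-value
    # against the expected reward for that action
    loss = lasagne.objectives.squared_error(taken_q, expected_var).mean()

    params = lasagne.layers.get_all_params(output_layer, trainable=True)
    updates = lasagne.updates.sgd(loss, params, learning_rate)

    return theano.function([states_var, actions_var, expected_var],
                           loss, updates=updates,
                           allow_input_downcast=True)

You would compile this once, e.g. train_fn = compile_q_train_fn(self._output_layer, self.LEARN_RATE), and then call it once per mini-batch with np.rollaxis(np.array(previous_states), 3, 1), the actions as an int32 array, and agents_expected_reward (again, assuming the actions are integer indices; the question's code may store them differently).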

I want to update the model using previous_states, actions and agent_expected_rewards, so that it learns that these actions lead to these rewards.

I was hoping it would look something like this:

train_model = theano.function(inputs=[input_var],
    outputs=self._output_layer,
    givens={
        states: previous_states,
        rewards: agents_expected_reward,
        expected_rewards: agents_expected_reward})

I just don't understand how the givens would affect the model, since I don't specify them when building the network. I couldn't find this in the Theano or Lasagne documentation.
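About givens: it does not change the model at all; it is a compile-time substitution. When theano.function is compiled, every occurrence of the key variable in the graph is replaced by the value it is mapped to (typically a shared variable), so the compiled function no longer needs that variable as an input. A minimal standalone example, unrelated to the network:

import numpy as np
import theano
import theano.tensor as T

x = T.vector('x')               # symbolic placeholder the expression is built on
y = (x ** 2).sum()              # some expression that depends on x

data = theano.shared(np.arange(4, dtype=theano.config.floatX))

# givens swaps x for the shared variable when the function is compiled,
# so f takes no arguments and always reads the current value of `data`
f = theano.function([], y, givens={x: data})
print f()                       # 0 + 1 + 4 + 9 = 14.0

The catch is that the substitution only works if the expression was actually built on the variable you list as a key, so the graph you get from lasagne.layers.get_output has to be built on the same symbolic variable (e.g. states) that appears in givens.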

So, how do I update the model/network so that it "learns"?

If this is still unclear, please leave a comment saying what information is still needed. I have been trying to figure this out for days.

回答


After reading through the documentation I finally found the answer. I had been looking in the wrong place before.

network = self._output_layer
prediction = lasagne.layers.get_output(network)
loss = lasagne.objectives.categorical_crossentropy(prediction, target_var)
loss = loss.mean()

# gradient-descent updates for every trainable parameter of the network
params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.sgd(loss, params, self.LEARN_RATE)

# substitute the symbolic variables with the actual batch data at compile time
givens = {
    states: current_states,
    expected: agents_expected_reward,
    real_rewards: rewards
}
train_fn = theano.function([input_var, target_var], loss,
                           updates=updates, on_unused_input='warn',
                           givens=givens,
                           allow_input_downcast=True)
train_fn(current_states, agents_expected_reward)
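Two of those keyword arguments are worth a note (this is general Theano behaviour, not something specific to this answer): on_unused_input='warn' keeps theano.function from raising when a declared input never appears in the output graph, which can easily happen once givens supply the actual data, and allow_input_downcast=True lets float64 NumPy arrays be silently downcast to the network's floatX. A standalone illustration:

import numpy as np
import theano
import theano.tensor as T

a = T.vector('a')
b = T.vector('b')                  # declared as an input but never used below
out = a.sum()

f = theano.function([a, b], out,
                    on_unused_input='warn',      # warn instead of raising about b
                    allow_input_downcast=True)   # float64 arrays downcast to floatX

print f(np.array([1.0, 2.0]), np.array([3.0]))   # 3.0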