2017-05-28 67 views
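Evolutionary algorithm is not improving

This weekend I tried to build a neural network that is improved using an evolutionary algorithm. I ran it for 5000 generations in OpenAI's CartPole environment (https://www.openai.com/), but it did not improve much. The neural network has 4 inputs, 1 hidden layer with 3 units, and 1 output, and it uses tanh as the activation function. Each generation has 100 individuals, 5 of which are selected for the next generation, with a 20% chance of mutation. Here is the code for a better understanding: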

import operator 
import gym 
import math 
import random 
import numpy 
import matplotlib.pyplot as plt 

env = gym.make('CartPole-v0') 

generations = 100 
input_units = 4 
Hidden_units = 3 
output_units = 1 
individuals = 100 

fitest1 = [] 
fitest2 = [] 

def Neural_Network(x, weights1, weights2): 
    global output 
    output = list(map(operator.mul, x, weights1)) 
    output = numpy.tanh(output) 
    output = list(map(operator.mul, output, weights2)) 
    output = sum(output) 
    return(output) 

weights1 = [[random.random() for i in range(input_units*Hidden_units)] for j in range(individuals)] 
weights2 = [[random.random() for i in range(Hidden_units*output_units)] for j in range(individuals)] 

fit_plot = [] 

for g in range(generations): 
    print('generation:',g+1) 
    fitness=[0 for f in range(individuals)] 
    prev_obs = [] 
    observation = env.reset() 
    for w in weights1: 
     print('  individual ',weights1.index(w)+1, ' of ', len(weights1)) 
     env.reset() 
     for t in range(500): 
      #env.render() 
      Neural_Network(observation, weights1[weights1.index(w)], weights2[weights1.index(w)]) 
      action = output < 0.5 
      observation, reward, done, info = env.step(action) 
      fitness[weights1.index(w)]+=reward 
      if done: 
       break 
     print('  individual fitness:', fitness[weights1.index(w)]) 
    print('min fitness:', min(fitness)) 
    print('max fitness:', max(fitness)) 
    print('average fitness:', sum(fitness)/len(fitness)) 
    fit_plot.append(sum(fitness)/len(fitness)) 
    for f in range(10): 
     fitest1.append(weights1[fitness.index(max(fitness))]) 
     fitest2.append(weights2[fitness.index(max(fitness))]) 
     fitness[fitness.index(max(fitness))] = -1000000000 


    for x in range(len(weights1)): 
     for y in range(len(weights1[x])): 
      weights1[x][y]=random.choice(fitest1)[y] 
      if random.randint(1,5) == 1: 
       weights1[random.randint(0, len(weights1)-1)][random.randint(0, len(weights1[0])-1)] += random.choice([0.1, -0.1]) 

    for x in range(len(weights2)): 
     for y in range(len(weights2[x])): 
      weights2[x][y]=random.choice(fitest2)[y] 
      if random.randint(1,5) == 1: 
       weights1[random.randint(0, len(weights1)-1)][random.randint(0, len(weights1[0])-1)] += random.choice([0.1, -0.1]) 

plt.axis([0,generations,0,100]) 
plt.ylabel('fitness') 
plt.xlabel('generations') 
plt.plot(range(0,generations), fit_plot) 
plt.show() 

env.reset() 
for t in range(100): 
    env.render() 
    Neural_Network(observation, fitest1[0], fitest2[0]) 
    action = output < 0.5 
    observation, reward, done, info = env.step(action) 
    if done: 
     break 

In case anyone is wondering, here is a plot of the average fitness over the generations (I only ran 100 generations this time). As you can see, the algorithm is not improving.

If you have any further questions, just ask.

How do you select the individuals? How do you get the weights of the offspring? – blckbird

Look at lines 53 to 69. –

I'm not very good at Python, so I can't really help, but take a look at the source code of [Neataptic](https://github.com/wagenaartje/neataptic) - maybe you will spot the problem. –

Answer

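My view is that you are not selecting the right individuals at the end of each round of the evolutionary algorithm. Make sure you pick the best 2 individuals (it can work with only one, but we want to do better than that :)). This should improve the results :)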
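To make the selection point concrete, here is a minimal sketch of one way to pick the fittest individuals and breed the next generation. It is not the asker's code. It assumes each individual is stored as one flat genome of 15 weights (12 input-to-hidden weights followed by 3 hidden-to-output weights), and the helper names `forward`, `select_elite`, and `breed` are hypothetical. It also sidesteps a few things that appear to hold the posted code back: `fitest1` and `fitest2` are never cleared, so every later generation still samples from the elites of the first one; the second mutation loop modifies `weights1` rather than `weights2`; and the value returned by `env.reset()` inside the individual loop is discarded, so each episode starts from a stale observation.

```python
import math
import random

INPUT_UNITS, HIDDEN_UNITS = 4, 3  # sizes taken from the question
N_W1 = INPUT_UNITS * HIDDEN_UNITS  # number of input-to-hidden weights


def forward(x, genome):
    """Fully connected 4-3-1 network with tanh hidden units.

    Assumed layout: genome[:12] holds one row of 4 weights per hidden unit,
    genome[12:] holds the 3 hidden-to-output weights.
    """
    w1, w2 = genome[:N_W1], genome[N_W1:]
    hidden = []
    for h in range(HIDDEN_UNITS):
        row = w1[h * INPUT_UNITS:(h + 1) * INPUT_UNITS]
        hidden.append(math.tanh(sum(xi * wi for xi, wi in zip(x, row))))
    return sum(hi * wi for hi, wi in zip(hidden, w2))


def select_elite(population, fitness, k):
    """Return copies of the k fittest genomes of THIS generation, best first."""
    ranked = sorted(range(len(population)),
                    key=lambda i: fitness[i], reverse=True)
    return [list(population[i]) for i in ranked[:k]]


def breed(elite, size, mutation_rate=0.2, step=0.1, rng=random):
    """Build a new population from the elite.

    The elite genomes are carried over unchanged; every other child takes
    each gene from a randomly chosen elite parent and then mutates each of
    its own genes with probability mutation_rate.
    """
    children = [list(g) for g in elite]
    while len(children) < size:
        child = [rng.choice(elite)[j] for j in range(len(elite[0]))]
        for j in range(len(child)):
            if rng.random() < mutation_rate:
                child[j] += rng.choice([-step, step])
        children.append(child)
    return children[:size]
```

In the generation loop this would replace the two copy-and-mutate loops: compute `fitness` for every genome, call `select_elite(population, fitness, 5)`, and assign `population = breed(elite, 100)`. Because the elite list is rebuilt from scratch each generation, old winners cannot linger, and because mutation is applied to the child being built, it cannot be overwritten later in the same pass. Starting each episode from the observation that `env.reset()` returns and drawing the initial weights from a range that includes negative values (for example `random.uniform(-1, 1)`) are further changes worth trying.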