Loading a pretrained model fails when multiple GPUs were used for training

I trained a network model with the callback
checkpoint = ModelCheckpoint(filepath='weights.hdf5')
which saved its weights and architecture. During training, I used multiple GPUs by calling the function below:

def make_parallel(model, gpu_count):
    def get_slice(data, idx, parts):
        shape = tf.shape(data)
        size = tf.concat([shape[:1] // parts, shape[1:]], axis=0)
        stride = tf.concat([shape[:1] // parts, shape[1:] * 0], axis=0)
        start = stride * idx
        return tf.slice(data, start, size)

    outputs_all = []
    for i in range(len(model.outputs)):
        outputs_all.append([])

    # Place a copy of the model on each GPU, each getting a slice of the batch
    for i in range(gpu_count):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('tower_%d' % i) as scope:

                inputs = []
                # Slice each input into a piece for processing on this GPU
                for x in model.inputs:
                    input_shape = tuple(x.get_shape().as_list())[1:]
                    slice_n = Lambda(get_slice, output_shape=input_shape, arguments={'idx': i, 'parts': gpu_count})(x)
                    inputs.append(slice_n)

                outputs = model(inputs)

                if not isinstance(outputs, list):
                    outputs = [outputs]

                # Save all the outputs for merging back together later
                for l in range(len(outputs)):
                    outputs_all[l].append(outputs[l])

    # Merge outputs on CPU
    with tf.device('/cpu:0'):
        merged = []
        for outputs in outputs_all:
            merged.append(merge(outputs, mode='concat', concat_axis=0))

        return Model(input=model.inputs, output=merged)

When I run the following test code:

from keras.models import Model, load_model 
import numpy as np 
import tensorflow as tf 

model = load_model('cpm_log/deneme.hdf5') 

x_test = np.random.randint(0, 255, (1, 368, 368, 3)) 

output = model.predict(x = x_test, batch_size=1) 

print output[4].shape

I get the following error:

Traceback (most recent call last): 
    File "cpm_test.py", line 5, in <module> 
    model = load_model('cpm_log/Jun5_1000/deneme.hdf5') 
    File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 240, in load_model 
    model = model_from_config(model_config, custom_objects=custom_objects) 
    File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 301, in model_from_config 
    return layer_module.deserialize(config, custom_objects=custom_objects) 
    File "/usr/local/lib/python2.7/dist-packages/keras/layers/__init__.py", line 46, in deserialize 
    printable_module_name='layer') 
    File "/usr/local/lib/python2.7/dist-packages/keras/utils/generic_utils.py", line 140, in deserialize_keras_object 
    list(custom_objects.items()))) 
    File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2378, in from_config 
    process_layer(layer_data) 
    File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2373, in process_layer 
    layer(input_tensors[0], **kwargs) 
    File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 578, in __call__ 
    output = self.call(inputs, **kwargs) 
    File "/usr/local/lib/python2.7/dist-packages/keras/layers/core.py", line 659, in call 
    return self.function(inputs, **arguments) 
    File "/home/muhammed/DEV_LIBS/developments/mocap/pose_estimation/training/cpm/multi_gpu.py", line 12, in get_slice 
    def get_slice(data, idx, parts): 
NameError: global name 'tf' is not defined

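Judging from the error output, I concluded that the problem lies in the parallelization code. However, I have not been able to fix it.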

Source

2017-06-05 mkocabas

What happens if you add 'import tensorflow as tf' at the start of the 'get_slice' definition? – michetonu

I added it. Nothing changed. – mkocabas

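You may need to pass custom_objects so that the model can be loaded. The Lambda layer's function refers to the global name tf, which is not defined in the scope where Keras rebuilds the function at load time.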

import tensorflow as tf
from keras.models import load_model
# Make the name tf visible to the deserialized Lambda function
model = load_model('model.h5', custom_objects={'tf': tf})
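
To illustrate why passing tf through custom_objects can help: as far as I understand, Keras saves a Lambda layer's function as marshalled bytecode and, on load, rebuilds it with a fresh globals dictionary that custom_objects is merged into. The sketch below imitates that with the standard library only. It is not Keras code; the names area and rebuild are hypothetical, and math stands in for tf.

```python
import marshal
import math
import types


def area(r):
    # Refers to the module-level name `math`, like get_slice refers to `tf`
    return math.pi * r * r


# Keep only the bytecode; the function's original globals are not stored
payload = marshal.dumps(area.__code__)


def rebuild(payload, extra_globals=None):
    # Hypothetical stand-in for the loader: new globals plus user-supplied names
    code = marshal.loads(payload)
    scope = {}
    scope.update(extra_globals or {})
    return types.FunctionType(code, scope, "area")


broken = rebuild(payload)
try:
    broken(1.0)
except NameError as exc:
    # `math` is missing from the rebuilt function's globals
    print("NameError:", exc)

# Supplying the missing name plays the role of custom_objects={'tf': tf}
fixed = rebuild(payload, {"math": math})
print(fixed(1.0))
```

If this picture is right, it may also explain why editing get_slice afterwards changed nothing: the saved file would still hold the old bytecode until the model is built and saved again.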

Source

2017-11-21 03:40:18
