2017-05-15

CNTK sequence model error: Different minibatch layouts detected

I am trying to train a model with CNTK that takes two input sequences and outputs a 2-d scalar label. I have defined the model like this:

import cntk as C
from cntk import sequence, splice
from cntk.layers import Sequential, Embedding, Fold, GRU, Dense, Dropout
from cntk.ops import sigmoid, softmax

def create_seq_model(num_tokens):
    with C.default_options(init=C.glorot_uniform()):
        i1 = sequence.input(shape=num_tokens, is_sparse=True, name='i1')
        i2 = sequence.input(shape=num_tokens, is_sparse=True, name='i2')
        s1 = Sequential([Embedding(300), Fold(GRU(64))])(i1)
        s2 = Sequential([Embedding(300), Fold(GRU(64))])(i2)
        combined = splice(s1, s2)
        model = Sequential([Dense(64, activation=sigmoid),
                            Dropout(0.1, seed=42),
                            Dense(2, activation=softmax)])
        return model(combined)

I have converted my data to CTF format. When I try to train it with the snippet below (very lightly modified from the example here), I get an error:

def train(reader, model, max_epochs=16):
    criterion = create_criterion_function(model)

    criterion.replace_placeholders({criterion.placeholders[0]: C.input(2, name='labels')})

    epoch_size = 500000
    minibatch_size = 128

    lr_per_sample = [0.003]*4 + [0.0015]*24 + [0.0003]
    lr_per_minibatch = [x * minibatch_size for x in lr_per_sample]
    lr_schedule = learning_rate_schedule(lr_per_minibatch, UnitType.minibatch, epoch_size)

    momentum_as_time_constant = momentum_as_time_constant_schedule(700)

    learner = fsadagrad(criterion.parameters,
                        lr=lr_schedule, momentum=momentum_as_time_constant,
                        gradient_clipping_threshold_per_sample=15,
                        gradient_clipping_with_truncation=True)

    progress_printer = ProgressPrinter(freq=1000, first=10, tag='Training', num_epochs=max_epochs)

    trainer = Trainer(model, criterion, learner, progress_printer)

    log_number_of_parameters(model)

    t = 0
    for epoch in range(max_epochs):
        epoch_end = (epoch + 1) * epoch_size
        while t < epoch_end:
            data = reader.next_minibatch(minibatch_size, input_map={
                criterion.arguments[0]: reader.streams.i1,
                criterion.arguments[1]: reader.streams.i2,
                criterion.arguments[2]: reader.streams.labels
            })
            trainer.train_minibatch(data)
            t += data[criterion.arguments[1]].num_samples
        trainer.summarize_training_progress()

The error is:

Different minibatch layouts detected (difference in sequence lengths or count or start flags) in data specified for the Function's arguments 'Input('i2', [#, *], [132033])' vs. 'Input('i1', [#, *], [132033])', though these arguments have the same dynamic axes '[*, #]' 

I have noticed that the training function works if I pick examples in which the two input sequences have the same length. Unfortunately, that is a very small fraction of the data. What is the correct mechanism for handling sequences of different lengths? Do I need to pad the inputs (similar to Keras's pad_sequences())?
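For reference, the Keras-style padding mentioned above just forces every sequence to one fixed length by truncating or pre-padding with a filler value. A minimal pure-Python sketch of the idea (a hypothetical helper, not CNTK or Keras code, and, as the answer below shows, not actually needed here):

```python
def pad_to_fixed_length(seqs, maxlen, value=0):
    """Truncate or pre-pad each sequence so it has exactly maxlen entries."""
    padded = []
    for s in seqs:
        s = list(s)[-maxlen:]                            # keep the last maxlen tokens
        padded.append([value] * (maxlen - len(s)) + s)   # pre-pad with the filler
    return padded

print(pad_to_fixed_length([[1, 2, 3], [4]], maxlen=3))
# [[1, 2, 3], [0, 0, 4]]
```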

Answer

CNTK is incidentally treating the two sequences i1 and i2 as having the same length. That is because the sequence_axis parameter of sequence.input(...) defaults to default_dynamic_axis(). One way to solve this is to tell CNTK that the two sequences do not have the same length by giving each one its own unique sequence axis, like this:

i1_axis = C.Axis.new_unique_dynamic_axis('1') 
i2_axis = C.Axis.new_unique_dynamic_axis('2') 
i1 = sequence.input(shape=num_tokens, is_sparse=True, sequence_axis=i1_axis, name='i1') 
i2 = sequence.input(shape=num_tokens, is_sparse=True, sequence_axis=i2_axis, name='i2')
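Applied to the question's model, the fix would look roughly like this (a sketch assuming the same CNTK layer API as in the question; only the two input declarations change, and since Fold(GRU(64)) reduces each sequence over its own axis to a fixed-size vector, the subsequent splice is unaffected):

```python
def create_seq_model(num_tokens):
    # Give each input its own dynamic axis so CNTK does not assume
    # the two sequences share a single length per minibatch entry.
    i1_axis = C.Axis.new_unique_dynamic_axis('1')
    i2_axis = C.Axis.new_unique_dynamic_axis('2')
    with C.default_options(init=C.glorot_uniform()):
        i1 = sequence.input(shape=num_tokens, is_sparse=True,
                            sequence_axis=i1_axis, name='i1')
        i2 = sequence.input(shape=num_tokens, is_sparse=True,
                            sequence_axis=i2_axis, name='i2')
        s1 = Sequential([Embedding(300), Fold(GRU(64))])(i1)
        s2 = Sequential([Embedding(300), Fold(GRU(64))])(i2)
        combined = splice(s1, s2)
        model = Sequential([Dense(64, activation=sigmoid),
                            Dropout(0.1, seed=42),
                            Dense(2, activation=softmax)])
        return model(combined)
```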