Keras - Predicting with a model trained on sentiment data raises an error

My problem is getting results from the predict method in Keras with the TensorFlow backend. But first, a short introduction.

I am using:

Python 2.7.12
Keras == 1.2.1
numpy == 1.12.0
tensorflow == 0.12.1

I built the convolutional neural network following this post: https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html

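I trained the network on 11842 prepared Twitter texts. The only change is that I have 3 possible classes (0, 1, 2), which I defined in the following line of code: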

preds = Dense(3, activation='softmax')(x)
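For reference, a 3-unit softmax layer turns each tweet's three raw scores into probabilities for the classes 0, 1 and 2 that sum to 1. A small numpy sketch of that computation (the logits are made-up values, not output of the model above):

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, the activation used by Dense(3, activation='softmax')."""
    # Subtract the row maximum for numerical stability; the result is unchanged.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical pre-activation outputs of the final Dense layer for two tweets.
logits = np.array([[2.0, 0.5, -1.0],
                   [0.0, 0.0, 0.0]])
probs = softmax(logits)
print(probs.shape)        # (2, 3): one probability per class 0, 1, 2
print(probs.sum(axis=1))  # each row sums to 1 (up to floating-point rounding)
```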

The fit method works without problems, and I reach an accuracy between 88% and 92%.

model_fit = model.fit(x_train, y_train, validation_data=(x_val, y_val), nb_epoch=10, batch_size=128)
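The linked blog post trains this kind of model with loss='categorical_crossentropy' and converts integer labels with to_categorical, so y_train is presumably a one-hot array with 3 columns here. Assuming that setup, a numpy equivalent of the conversion looks like this (the helper name one_hot is hypothetical):

```python
import numpy as np

def one_hot(labels, num_classes=3):
    """Numpy equivalent of to_categorical for integer class labels."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype='float32')
    # Put a single 1 in each row, at the column given by that row's label.
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

y = one_hot([0, 2, 1, 2])
print(y)
```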

After training, I save the model in .h5 format (this also works fine).

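Now I try to load the model and predict with it. The first example (trained_model) uses the same data I used for training, because I want to compare the results. The second example (trained_model_2) uses new Twitter texts that I collected earlier.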

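from keras.models import load_model

# Predict on the same data that was used for training, for comparison.
trained_model = load_model("trained_model.h5")
prediction_result = trained_model.predict(data_train, batch_size=128)
print prediction_result.shape ### Prints: (11842, 3)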
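predict returns one row of three class probabilities per input, hence the shape (11842, 3). To compare these rows against the training labels, they can be reduced to class ids with argmax. A sketch, using small made-up arrays in place of prediction_result and the one-hot labels (the helper names are hypothetical):

```python
import numpy as np

def predicted_classes(prediction_result):
    """Turn the (n_samples, 3) softmax output of predict() into class ids 0, 1, 2."""
    return np.argmax(prediction_result, axis=1)

def accuracy(prediction_result, y_true_one_hot):
    """Fraction of rows whose most probable class matches the one-hot label."""
    hits = predicted_classes(prediction_result) == np.argmax(y_true_one_hot, axis=1)
    return float(np.mean(hits))

# Hypothetical stand-ins for prediction_result and the matching one-hot labels.
prediction_result = np.array([[0.7, 0.2, 0.1],
                              [0.1, 0.3, 0.6],
                              [0.2, 0.5, 0.3]])
labels = np.array([[1, 0, 0],
                   [0, 0, 1],
                   [1, 0, 0]])
print(predicted_classes(prediction_result))  # [0 2 1]
print(accuracy(prediction_result, labels))   # 2 of 3 rows match
```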

trained_model_2 = load_model("trained_model.h5") 
prediction_result_2 = trained_model_2.predict(data_predict, batch_size=128)

For comparison, the shapes of the training dataset and the "live"/new dataset:

print data_train.shape # (11842, 1000)

print data_predict.shape # (46962, 1000)

Both have dtype=int32.
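The 1000 columns presumably come from padding every sequence to the same length, as the blog post does with pad_sequences and MAX_SEQUENCE_LENGTH = 1000. As a simplified stand-in for pad_sequences with its default settings (pre-padding and pre-truncation with 0), the following sketch shows that the last column (index 999 for length 1000) holds the last word index of each text, which is the position named in the error below:

```python
import numpy as np

MAX_SEQUENCE_LENGTH = 1000  # value used in the linked blog post; assumed here

def pad_pre(sequences, maxlen=MAX_SEQUENCE_LENGTH):
    """Simplified stand-in for Keras pad_sequences with its defaults
    (padding='pre', truncating='pre', value=0, dtype='int32')."""
    out = np.zeros((len(sequences), maxlen), dtype='int32')
    for row, seq in enumerate(sequences):
        seq = list(seq)[-maxlen:]        # keep only the last maxlen word ids
        if seq:
            out[row, -len(seq):] = seq   # right-align, zeros fill the left
    return out

data = pad_pre([[5, 12, 7], [3]], maxlen=5)
print(data)
```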

The following line raises the first error:

prediction_result_2 = trained_model_2.predict(data_predict, batch_size=128)

tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,999] = 13608 is not in [0, 13480) [[Node: Gather_1 = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](embedding_1_W_1/read, _recv_input_1_1_0)]]
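The message means that the embedding lookup (the Gather node) received the word index 13608 at row 0, column 999 of the input, while the embedding layer only has rows for indices 0 to 13479. One way to catch this before calling predict is to check the input range yourself. A sketch, with a hypothetical helper check_indices; input_dim stands for the number of rows of the Embedding layer's weight matrix:

```python
import numpy as np

def check_indices(data, input_dim):
    """Raise before predict() if any word index in a 2D (n_samples, seq_len)
    array falls outside [0, input_dim), the range an Embedding layer with
    input_dim rows can look up."""
    data = np.asarray(data)
    bad = np.argwhere((data < 0) | (data >= input_dim))
    if bad.size:
        row, col = bad[0]
        raise ValueError("indices[%d,%d] = %d is not in [0, %d)"
                         % (row, col, data[row, col], input_dim))

# input_dim would be 13480 for the model in the error above.
check_indices(np.array([[0, 17, 13479]]), 13480)   # passes silently
try:
    check_indices(np.array([[0, 17, 13608]]), 13480)
except ValueError as err:
    print(err)  # indices[0,2] = 13608 is not in [0, 13480)
```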

The following line raises the second error:

trained_model_2 = load_model("trained_model.h5")

InvalidArgumentError (see above for traceback): indices[0,999] = 13608 is not in [0, 13480) [[Node: Gather_1 = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](embedding_1_W_1/read, _recv_input_1_1_0)]]

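EDIT: I have posted the source code of my methods. The method "trainModule" is only used to train the network and save it. The "predict_sentiment" method is used for my prediction tests. The first prediction_result works and returns a numpy array of shape (11842, 3). Code - pastebin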

Full error output: Error output - pastebin

If any additional information is needed, I will update the question.


2017-02-18 HauLuk

What kind of transformation did you do to get x_train and x_val from data_train? – rAyyy

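For training the model? I did it as in the link above: I shuffle all texts, multiply their number by "VALIDATION_SPLIT" (0.19 in my case) and then split them into training data and validation data. Or do you mean how I got them into an array? – HauLuk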
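The shuffle-and-split step described in this comment follows the blog post: the row order is shuffled, and VALIDATION_SPLIT times the number of samples gives the size of the validation set. A sketch with numpy, where the function name and the seed parameter are additions for illustration:

```python
import numpy as np

VALIDATION_SPLIT = 0.19  # value mentioned in the comment

def shuffle_and_split(data, labels, validation_split=VALIDATION_SPLIT, seed=None):
    """Shuffle rows of data and labels together, then hold out the last
    validation_split fraction of the samples for validation."""
    rng = np.random.RandomState(seed)
    indices = np.arange(data.shape[0])
    rng.shuffle(indices)
    data, labels = data[indices], labels[indices]
    # The fraction multiplies the number of samples, not the texts themselves.
    nb_validation_samples = int(validation_split * data.shape[0])
    n_train = data.shape[0] - nb_validation_samples
    x_train, y_train = data[:n_train], labels[:n_train]
    x_val, y_val = data[n_train:], labels[n_train:]
    return x_train, y_train, x_val, y_val

data = np.arange(200).reshape(100, 2)   # toy rows [2i, 2i + 1]
labels = np.arange(100)
x_train, y_train, x_val, y_val = shuffle_and_split(data, labels, seed=0)
print(x_train.shape, x_val.shape)       # (81, 2) (19, 2)
```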

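The problem was that the trained model could not find the words in its embedding matrix: I had used a different vocabulary for training and for prediction. A tokenizer fitted on the new texts assigns its own word indices, such as 13608, which lie outside the 13480 rows of the embedding matrix built from the training vocabulary. Because that vocabulary is fixed, the training data and the new data must be tokenized with the same vocabulary.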
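To see why two tokenizers give incompatible indices: a word index is built from word frequencies in the texts it was fitted on, so the same word can get a different id, and a different corpus can produce ids beyond the training vocabulary. The following is a hypothetical, simplified stand-in for Tokenizer.fit_on_texts (not the Keras implementation) that demonstrates the effect on toy texts:

```python
from collections import Counter

def build_word_index(texts):
    """Hypothetical stand-in for fitting a tokenizer: rank words by frequency
    (ties broken alphabetically in this sketch) and number them from 1;
    0 stays reserved for padding."""
    counts = Counter(word for text in texts for word in text.lower().split())
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i + 1 for i, word in enumerate(ranked)}

train_texts = ["good day", "bad day", "good food"]
new_texts = ["terrible traffic today", "terrible day", "good luck"]

train_index = build_word_index(train_texts)
new_index = build_word_index(new_texts)

# The same word gets different ids from the two fits.
print(train_index["good"], new_index["good"])          # 2 3
# An embedding sized for the training vocabulary has len + 1 rows (row 0 is
# padding), but the second fit produces ids that do not fit into it.
print(len(train_index) + 1, max(new_index.values()))   # 5 6
```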

In the end, the only thing I changed was the tokenizer, from:

tokenizer_predict = Tokenizer(nb_words=MAX_NB_WORDS) 
tokenizer_predict.fit_on_texts(texts_predict) 
sequence_predict = tokenizer_predict.texts_to_sequences(predict_data)

to:

tokenizer_predict = Tokenizer(nb_words=MAX_NB_WORDS) 
tokenizer_predict.fit_on_texts(texts_train) 
sequence_predict = tokenizer_predict.texts_to_sequences(predict_data)
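Refitting a tokenizer on texts_train works because fitting on the same texts yields the same word index. An alternative is to fit the tokenizer once at training time and store its state next to trained_model.h5 (for example with pickle or, as in this sketch, JSON), so the prediction script does not need the training texts. The sketch uses a hypothetical minimal WordIndexer class in place of the Keras Tokenizer to show the pattern; like Keras' texts_to_sequences, it drops words that are not in the fitted vocabulary:

```python
import json
from collections import Counter

class WordIndexer(object):
    """Hypothetical minimal tokenizer (not the Keras Tokenizer): fit once, reuse."""

    def __init__(self, num_words=None, word_index=None):
        self.num_words = num_words            # like nb_words: keep only ids < num_words
        self.word_index = word_index or {}

    def fit_on_texts(self, texts):
        counts = Counter(w for t in texts for w in t.lower().split())
        ranked = sorted(counts, key=lambda w: (-counts[w], w))
        self.word_index = {w: i + 1 for i, w in enumerate(ranked)}

    def texts_to_sequences(self, texts):
        seqs = []
        for t in texts:
            ids = [self.word_index.get(w) for w in t.lower().split()]
            # Drop unknown words (None) and ids at or above num_words.
            seqs.append([i for i in ids if i is not None
                         and (self.num_words is None or i < self.num_words)])
        return seqs

    def to_json(self):
        return json.dumps({"num_words": self.num_words, "word_index": self.word_index})

    @classmethod
    def from_json(cls, text):
        state = json.loads(text)
        return cls(state["num_words"], state["word_index"])

# Training script: fit once on the training texts and save the state
# (in practice, write it to a file next to trained_model.h5).
tokenizer = WordIndexer(num_words=20000)  # 20000 assumed, as MAX_NB_WORDS in the blog post
tokenizer.fit_on_texts(["good day", "bad day", "good food"])
saved = tokenizer.to_json()

# Prediction script: restore the same vocabulary instead of refitting.
tokenizer_predict = WordIndexer.from_json(saved)
print(tokenizer_predict.texts_to_sequences(["terrible day", "good luck"]))  # [[1], [2]]
```

Because every id then comes from the training vocabulary, it stays below the number of rows of an embedding matrix built from that vocabulary.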


2017-02-23 08:23:31 HauLuk

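Maybe it tries to access index 13608 (which obviously cannot work) in the range [0, 13480). Someone else had a similar problem: https://github.com/tensorflow/tensorflow/issues/2734 It seems he tried to access index 10535 in a vocabulary of [0,10000].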
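The range notation matches a plain array lookup: an embedding matrix with 13480 rows accepts row indices 0 to 13479 only. numpy's np.take, which like the Gather op selects rows by index, fails in the same situation; a sketch with a zero-filled stand-in matrix (the embedding width of 4 is arbitrary):

```python
import numpy as np

# Hypothetical embedding matrix with 13480 rows, matching the range [0, 13480)
# in the error; the width of 4 is an arbitrary choice for illustration.
embedding_matrix = np.zeros((13480, 4), dtype='float32')

ok = np.take(embedding_matrix, [0, 17, 13479], axis=0)   # all valid row ids
print(ok.shape)  # (3, 4)

try:
    np.take(embedding_matrix, [13608], axis=0)           # beyond the last row
except IndexError as err:
    print("IndexError:", err)
```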


2017-02-18 12:03:31 Cahya

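With print(len(data_train)) and print(type(data_train)) I get the following output for the prediction on the training dataset: 11842, ... I get the same output for the live Twitter dataset, only the length is higher: 46962, ... So it should be a numpy array. – HauLuk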
