2015-11-06 31 views
5

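Keras GRU NN: KeyError "not in index" when fitting

I am currently running into a problem when trying to fit a GRU model with my training data. After a quick look on StackOverflow, I found this post, which is very similar to my problem: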

Simplest Lstm training with Keras io

My own model is as follows:

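# Import paths as used by the Keras 0.x API that this code targets (nb_epoch, models.pyc)
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import GRU
from keras.callbacks import History

nn = Sequential()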
nn.add(Embedding(input_size, hidden_size)) 
nn.add(GRU(hidden_size_2, return_sequences=False)) 
nn.add(Dropout(0.2)) 
nn.add(Dense(output_size)) 
nn.add(Activation('linear')) 

nn.compile(loss='mse', optimizer="rmsprop") 

history = History() 
nn.fit(X_train, y_train, batch_size=30, nb_epoch=200, validation_split=0.1, callbacks=[history]) 

And the error is:

--------------------------------------------------------------------------- 
KeyError         Traceback (most recent call last) 
<ipython-input-14-e2f199af6e0c> in <module>() 
     1 history = History() 
----> 2 nn.fit(X_train, y_train, batch_size=30, nb_epoch=200, validation_split=0.1, callbacks=[history]) 

C:\Users\XXXX\AppData\Local\Continuum\Anaconda\lib\site-packages\keras\models.pyc in fit(self, X, y, batch_size, nb_epoch, verbose, callbacks, validation_split, validation_data, shuffle, show_accuracy, class_weight, sample_weight) 
    487       verbose=verbose, callbacks=callbacks, 
    488       val_f=val_f, val_ins=val_ins, 
--> 489       shuffle=shuffle, metrics=metrics) 
    490 
    491  def predict(self, X, batch_size=128, verbose=0): 

C:\Users\XXXX\AppData\Local\Continuum\Anaconda\lib\site-packages\keras\models.pyc in _fit(self, f, ins, out_labels, batch_size, nb_epoch, verbose, callbacks, val_f, val_ins, shuffle, metrics) 
    199     batch_ids = index_array[batch_start:batch_end] 
    200     try: 
--> 201      ins_batch = slice_X(ins, batch_ids) 
    202     except TypeError as err: 
    203      raise Exception('TypeError while preparing batch. \ 

C:\Users\XXXX\AppData\Local\Continuum\Anaconda\lib\site-packages\keras\models.pyc in slice_X(X, start, stop) 
    53  if type(X) == list: 
    54   if hasattr(start, '__len__'): 
---> 55    return [x[start] for x in X] 
    56   else: 
    57    return [x[start:stop] for x in X] 

C:\Users\XXXX\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key) 
    1789   if isinstance(key, (Series, np.ndarray, Index, list)): 
    1790    # either boolean or fancy integer index 
-> 1791    return self._getitem_array(key) 
    1792   elif isinstance(key, DataFrame): 
    1793    return self._getitem_frame(key) 

C:\Users\XXXX\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\frame.pyc in _getitem_array(self, key) 
    1833    return self.take(indexer, axis=0, convert=False) 
    1834   else: 
-> 1835    indexer = self.ix._convert_to_indexer(key, axis=1) 
    1836    return self.take(indexer, axis=1, convert=True) 
    1837 

C:\Users\XXXX\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\indexing.pyc in _convert_to_indexer(self, obj, axis, is_setter) 
    1110     mask = check == -1 
    1111     if mask.any(): 
-> 1112      raise KeyError('%s not in index' % objarr[mask]) 
    1113 
    1114     return _values_from_object(indexer) 

KeyError: '[ 61 13980 11357 5577 11500 12125 19673 10985 2480 5237 2519 14874\n 16003 2611 3851 10837 11865 14607 10682 5495 10220 5043 23145 11280\n 9547 4766 18323 730 6263] not in index' 

Any idea how to solve this? Thanks.

Edit: some facts about the data:

import numpy as np
import pandas as pd

data_X = pd.read_csv("X.csv")
data_Y = pd.read_csv("Y.csv") 

def train_test_split(X,Y, test_size=0.15): 
    # This just splits data to training and testing parts 
    ntrn = int(round(X.shape[0] * (1 - test_size))) 
    perms = np.random.permutation(X.shape[0]) 
    X_train = X.ix[perms[0:ntrn]] 
    Y_train = Y.ix[perms[0:ntrn]] 
    X_test = X.ix[perms[ntrn:]] 
    Y_test = Y.ix[perms[ntrn:]] 

    return (X_train, Y_train), (X_test, Y_test) 

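X and Y are CSV files containing time-series values. For example, each row of the X file holds 37 consecutive values of the time series plus 2 time values (treated as the past), and each row of the Y file holds 30 values (treated as the forecast to predict).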

print X_train[:1] 
print y_train[:1] 

      0 1 2 3 4 5 6 7 8 9  ...  29 30 31 32 \ 
1629 84 76 76 72 72 72 72 87 87 100  ...  165 165 169 169 

     33 34 35 36   37   38 
1629 166 166 185 185 1236778440 1236789240 

[1 rows x 39 columns] 
     0 1 2 3 4 5 6 7 8 9 ... 20 21 22 \ 
1629 195 195 195 195 196 196 194 194 192 192 ... 182 182 164 

     23 24 25 26 27 28 29 
1629 164 146 146 128 128 103 103 

[1 rows x 30 columns] 
+0

What are the types of X_train and y_train? –

+0

Both are '' – Julian

+0

I edited the post to add some information about the data, in case it helps... – Julian

Answers

19

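I could not use pandas DataFrames as input and output for Keras model.fit, at least not with pandas 0.13.1, which is the version in the standard Ubuntu package.

Instead, use np.array(X_train) and np.array(Y_train). This worked for me.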
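
To illustrate why the conversion helps, here is a minimal sketch that reproduces the failure without Keras. It assumes the batching code indexes its inputs with an array of row positions, as the `x[start]` line in the traceback suggests. The small frame, its index labels, and `batch_ids` are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the shuffled training frame:
# 6 rows, 3 columns, with a non-contiguous row index as left by a random split.
X_train = pd.DataFrame(
    np.arange(18).reshape(6, 3),
    index=[1629, 61, 730, 2480, 5237, 9547],
)

# Positional row numbers for one batch (assumed to mimic what the batching code builds).
batch_ids = np.array([4, 0, 5])

# Indexing a DataFrame with an integer array looks up column labels,
# so any value that is not a column name raises KeyError.
try:
    X_train[batch_ids]
    frame_lookup_failed = False
except KeyError:
    frame_lookup_failed = True

# After conversion to a plain NumPy array, the same indices select rows by position.
X_array = np.array(X_train)
batch = X_array[batch_ids]

print(frame_lookup_failed)
print(batch.shape)
```

The same conversion would apply to the target frame before both are passed to fit.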

+0

A very simple fix :) –

+0

A frugal but amazing solution! –

1

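I ran into a similar problem. In my case, the cause was an Embedding layer with a predefined size on the input, so the sequences passed to this layer had to be padded or truncated to input_size using keras.preprocessing.sequence.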
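
As a rough illustration of what padding or truncating to a fixed length means, here is a NumPy-only sketch. It is not the Keras implementation: `pad_or_truncate` is a hypothetical helper written for this example, and it assumes zeros are added at the front and the last `maxlen` items are kept, with `maxlen` at least 1. In real code, the `pad_sequences` function from keras.preprocessing.sequence would take its place.

```python
import numpy as np


def pad_or_truncate(sequences, maxlen, value=0):
    """Return an integer array of shape (len(sequences), maxlen).

    Hypothetical helper for illustration; assumes maxlen >= 1.
    """
    out = np.full((len(sequences), maxlen), value, dtype="int64")
    for row, seq in enumerate(sequences):
        kept = list(seq)[-maxlen:]  # truncate: keep only the last maxlen items
        if kept:
            # pad: place the kept items at the right, leaving fill values on the left
            out[row, maxlen - len(kept):] = kept
    return out


# Two made-up sequences: one shorter and one longer than the target length.
padded = pad_or_truncate([[1, 2], [3, 4, 5, 6, 7]], maxlen=4)
print(padded)
```

Every row then has the same length, which is what a layer with a fixed input size expects.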