I am running a BLSTM based on the IMDB example, but my version is not classification, rather sequence prediction of labels. For simplicity you can think of it as a POS tagging model: the inputs are sentences of word IDs, the outputs are tags. The syntax used in that example is slightly different from most other Keras examples, in that it doesn't use model.add but starts from an Input sequence. I can't figure out how to add a masking layer with this slightly different syntax.
I have run and tested the model, and it works fine, but it is predicting and evaluating the accuracy of the 0s, which are my padding. Code below:
from __future__ import print_function
import numpy as np
from keras.preprocessing import sequence
from keras.models import Model
from keras.layers.core import Masking
from keras.layers import TimeDistributed, Dense
from keras.layers import Dropout, Embedding, LSTM, Input, merge
from prep_nn import prep_scan
from keras.utils import np_utils, generic_utils
np.random.seed(1337) # for reproducibility
nb_words = 20000 # max. size of vocab
nb_classes = 10 # number of labels
hidden = 500 # 500 gives best results so far
batch_size = 10 # create and update net after 10 lines
val_split = .1
epochs = 15
# input for X is multi-dimensional numpy array with IDs,
# one line per array. input y is multi-dimensional numpy array with
# binary arrays for each value of each label.
# maxlen is length of longest line
print('Loading data...')
(X_train, y_train), (X_test, y_test) = prep_scan(
    nb_words=nb_words, test_len=75)
print(len(X_train), 'train sequences')
print(int(len(X_train)*val_split), 'validation sequences')
print(len(X_test), 'heldout sequences')
# this is the placeholder tensor for the input sequences
sequence = Input(shape=(maxlen,), dtype='int32')
# this embedding layer will transform the sequences of integers
# into vectors
embedded = Embedding(nb_words, output_dim=hidden,
                     input_length=maxlen)(sequence)
# apply forwards LSTM
forwards = LSTM(output_dim=hidden, return_sequences=True)(embedded)
# apply backwards LSTM
backwards = LSTM(output_dim=hidden, return_sequences=True,
                 go_backwards=True)(embedded)
# concatenate the outputs of the 2 LSTMs
merged = merge([forwards, backwards], mode='concat', concat_axis=-1)
after_dp = Dropout(0.15)(merged)
# TimeDistributed for sequence
# change activation to sigmoid?
output = TimeDistributed(
    Dense(output_dim=nb_classes,
          activation='softmax'))(after_dp)
model = Model(input=sequence, output=output)
# try using different optimizers and different optimizer configs
# loss=binary_crossentropy, optimizer=rmsprop
model.compile(loss='categorical_crossentropy',
              metrics=['accuracy'], optimizer='adam')
print('Train...')
model.fit(X_train, y_train,
          batch_size=batch_size,
          nb_epoch=epochs,
          shuffle=True,
          validation_split=val_split)
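For reference, in this functional syntax the masking seems to hang off the Embedding call rather than being added as a separate model.add step: with integer ID inputs the hook is the Embedding layer's mask_zero flag. A minimal sketch of how that would slot into the graph above (assuming Keras 1.x and that 0 is reserved as the padding ID, never used as a real word ID):

from keras.layers import Input, Embedding, LSTM, merge

# placeholder for the padded ID sequences (0 assumed to be the padding ID)
seq_in = Input(shape=(maxlen,), dtype='int32')
# mask_zero=True makes the mask-aware layers downstream (the LSTMs here)
# skip every timestep whose input ID is 0
emb = Embedding(nb_words, output_dim=hidden,
                input_length=maxlen, mask_zero=True)(seq_in)
fwd = LSTM(output_dim=hidden, return_sequences=True)(emb)
bwd = LSTM(output_dim=hidden, return_sequences=True,
           go_backwards=True)(emb)
merged = merge([fwd, bwd], mode='concat', concat_axis=-1)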
UPDATE:
I worked through this PR and got it working with mask_zero=True on the Embedding layer. But now that I have seen the model's poor performance, I realize I also need masking on the output; others have suggested using sample_weight in the model.fit line instead. How can I do that so that the 0s are ignored?
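From what I can tell, sample_weight for a sequence model can be a 2-D array of shape (nb_samples, maxlen) holding 1 for real timesteps and 0 for padding, combined with sample_weight_mode='temporal' in compile, so that padded positions contribute nothing to the loss. A minimal sketch, assuming 0 is the padding ID in X_train:

# per-timestep weights: 1.0 at real tokens, 0.0 at padded positions
weights = (X_train != 0).astype('float32')

model.compile(loss='categorical_crossentropy',
              metrics=['accuracy'], optimizer='adam',
              sample_weight_mode='temporal')

model.fit(X_train, y_train,
          batch_size=batch_size,
          nb_epoch=epochs,
          shuffle=True,
          validation_split=val_split,
          sample_weight=weights)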
UPDATE 2:
So I read this and worked out sample_weight as a matrix of 1s and 0s. I thought it might be working, but my accuracy hovers around 50%, and I just found that it is still trying to predict the padded parts, although it no longer predicts them as 0, which was the problem before using sample_weight.
Current code:
from __future__ import print_function
import numpy as np
from keras.preprocessing import sequence
from keras.models import Model
from keras.layers.core import Masking
from keras.layers import TimeDistributed, Dense
from keras.layers import Dropout, Embedding, LSTM, Input, merge
from prep_nn import prep_scan
from keras.utils import np_utils, generic_utils
import itertools
from itertools import chain
from sklearn.preprocessing import LabelBinarizer
import sklearn
import pandas as pd
np.random.seed(1337) # for reproducibility
nb_words = 20000 # max. size of vocab
nb_classes = 10 # number of labels
hidden = 500 # 500 gives best results so far
batch_size = 10 # create and update net after 10 lines
val_split = .1
epochs = 10
# input for X is multi-dimensional numpy array with syll IDs,
# one line per array. input y is multi-dimensional numpy array with
# binary arrays for each value of each label.
# maxlen is length of longest line
print('Loading data...')
(X_train, y_train), (X_test, y_test), maxlen, sylls_ids, tags_ids, weights = prep_scan(nb_words=nb_words, test_len=75)
print(len(X_train), 'train sequences')
print(int(len(X_train) * val_split), 'validation sequences')
print(len(X_test), 'heldout sequences')
# this is the placeholder tensor for the input sequences
sequence = Input(shape=(maxlen,), dtype='int32')
# this embedding layer will transform the sequences of integers
# into vectors of size 256
embedded = Embedding(nb_words, output_dim=hidden,
                     input_length=maxlen, mask_zero=True)(sequence)
# apply forwards LSTM
forwards = LSTM(output_dim=hidden, return_sequences=True)(embedded)
# apply backwards LSTM
backwards = LSTM(output_dim=hidden, return_sequences=True,
                 go_backwards=True)(embedded)
# concatenate the outputs of the 2 LSTMs
merged = merge([forwards, backwards], mode='concat', concat_axis=-1)
# after_dp = Dropout(0.)(merged)
# TimeDistributed for sequence
# change activation to sigmoid?
output = TimeDistributed(
    Dense(output_dim=nb_classes,
          activation='softmax'))(merged)
model = Model(input=sequence, output=output)
# try using different optimizers and different optimizer configs
# loss=binary_crossentropy, optimizer=rmsprop
model.compile(loss='categorical_crossentropy',
              metrics=['accuracy'], optimizer='adam',
              sample_weight_mode='temporal')
print('Train...')
model.fit(X_train, y_train,
          batch_size=batch_size,
          nb_epoch=epochs,
          shuffle=True,
          validation_split=val_split,
          sample_weight=weights)
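Since the built-in accuracy metric still appears to count the padded timesteps, one way to sanity-check the real tagging accuracy is to predict and then score only the non-padded positions. A minimal numpy sketch (it assumes 0 is the padding ID in X_test and that y_test is one-hot per timestep):

probs = model.predict(X_test, batch_size=batch_size)  # (samples, maxlen, nb_classes)
pred = probs.argmax(axis=-1)   # predicted tag ID at every timestep
gold = y_test.argmax(axis=-1)  # gold tag ID at every timestep
mask = (X_test != 0)           # True only where there is a real token
print('accuracy over non-padded timesteps:',
      (pred[mask] == gold[mask]).mean())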
This is an old question, but did you ever solve it? I am at the same stage... I found out that [accuracy does not take 'sample_weight' into account](https://github.com/fchollet/keras/issues/1642), and according to my tests neither does masking (in fact, using masking produces different accuracy values that I haven't been able to make sense of yet). I will probably end up using the functional API to set up a second output with the accuracy. – jdehesa
Revisiting this question and simplifying it for the current Keras code would be greatly appreciated. – Seanny123