Using word n-grams with the Keras Tokenizer

Is it possible to use word n-grams in Keras?

For example, the X_train DataFrame holds a list of sentences in its "sentences" column. I use the Keras tokenizer in the following way:

from keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer(lower=True, split=' ')
tokenizer.fit_on_texts(X_train.sentences) 
X_train_tokenized = tokenizer.texts_to_sequences(X_train.sentences)

After that I pad the sequences:

from keras.preprocessing import sequence

X_train_sequence = sequence.pad_sequences(X_train_tokenized)

I then use a simple LSTM network:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(MAX_FEATURES, 128)) 
model.add(LSTM(32, dropout=0.2, recurrent_dropout=0.2, 
       activation='tanh', return_sequences=True)) 
model.add(LSTM(64, dropout=0.2, recurrent_dropout=0.2, activation='tanh')) 
model.add(Dense(number_classes, activation='sigmoid')) 
model.compile(loss='categorical_crossentropy', optimizer = 'rmsprop', 
       metrics=['accuracy'])

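In this setup the tokenizer works on single words. In the Keras documentation (https://keras.io/preprocessing/text/) the only alternative I can see is character-level processing, but that isn't appropriate for my case.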

My main question: can I use n-grams for NLP tasks (not necessarily sentiment analysis, any abstract NLP task)?

Clarification: I want to consider not only single words but also combinations of words, and I'd like to try that for my task.

Source

2017-09-12 Simplex

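Unfortunately, the Keras Tokenizer() does not support n-grams. You will need a workaround: tokenize the documents yourself and then feed the result to the neural network.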
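One way such a workaround might look is sketched below in plain Python. It is only an illustration, not part of Keras: the helper names (word_ngrams, fit_vocab, texts_to_sequences, pad), the "_" separator, and the choice to emit each word followed by the n-grams starting at it are all assumptions made for this example. Id 0 is reserved for padding, and the padding helper pads and truncates on the left, which matches the pad_sequences defaults.

```python
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return word n-grams (n = 1..n_max) in reading order, joined with '_'."""
    words = text.lower().split()
    grams = []
    for i in range(len(words)):
        for n in range(1, n_max + 1):
            if i + n <= len(words):
                grams.append("_".join(words[i:i + n]))
    return grams


def fit_vocab(texts, n_max=2, max_features=None):
    """Map the most frequent n-grams to ids 1..; id 0 is kept for padding."""
    counts = Counter(g for t in texts for g in word_ngrams(t, n_max))
    # Keep ids strictly below max_features so Embedding(max_features, ...) fits.
    limit = None if max_features is None else max_features - 1
    return {g: i for i, (g, _) in enumerate(counts.most_common(limit), start=1)}


def texts_to_sequences(texts, vocab, n_max=2):
    """Encode each text as a list of n-gram ids, skipping unknown n-grams."""
    return [[vocab[g] for g in word_ngrams(t, n_max) if g in vocab]
            for t in texts]


def pad(seqs, maxlen=None):
    """Left-pad with zeros and keep the last maxlen ids of longer sequences."""
    if maxlen is None:
        maxlen = max((len(s) for s in seqs), default=0)
    if maxlen == 0:
        return [[] for _ in seqs]
    return [[0] * (maxlen - len(s)) + s[-maxlen:] for s in seqs]


texts = ["the cat sat", "the cat"]
vocab = fit_vocab(texts, n_max=2)
padded = pad(texts_to_sequences(texts, vocab, n_max=2))
```

The resulting lists of ids can be converted to an array and passed to the Embedding layer in place of the output of tokenizer.texts_to_sequences. Note that adding bigrams enlarges the vocabulary considerably, so MAX_FEATURES would need to be chosen with that in mind.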

Source

2017-10-02 08:03:05 Alex
