2013-10-25 103 views

I want to train my own custom model. Where can I start? How do I train a custom model in OpenNLP?

I am using this sample data to train the model:

<START:meaningless>Took connection and<END> selected the Text in the Letter Template and cleared the Formatting of Text to Normal. 

Basically, I want to pick out meaningless text from a given input.

I tried the following sample code from the OpenNLP developer documentation, but I get an error: Model is incompatible with name finder!

Charset charset = Charset.forName("UTF-8"); 
ObjectStream<String> lineStream = 
     new PlainTextByLineStream(new FileInputStream("mynewmodel.train"), charset); 
ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream); 

TokenNameFinderModel model; 

try { 
    model = NameFinderME.train("en", "meaningless", sampleStream, 
     Collections.<String, Object>emptyMap(), 100, 5); 
} 
finally { 
    sampleStream.close(); 
} 

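// modelOut and modelFile are not declared in the snippet as posted;
// modelFile is assumed to be the path of the model file to write.
OutputStream modelOut = null;
try { 
    modelOut = new BufferedOutputStream(new FileOutputStream(modelFile)); 
    model.serialize(modelOut); 
} finally { 
    if (modelOut != null) 
        modelOut.close();  
} 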

One question: what kind of file is "mynewmodel.train"?

Answers


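Possible problem: you are not giving the trainer clearly tokenized text. PlainTextByLineStream, if I understand the documentation correctly, expects whitespace-separated tokens, so the <START:meaningless> and <END> tags need a space between them and the neighboring words. So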

<START:meaningless> Took connection and <END> 

instead of

<START:meaningless>Took connection and<END>
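
If the training file already contains many lines with tags glued to words, a small pre-processing step could pad them before training. The following is only a sketch under assumptions: TagSpacing and normalize are hypothetical names, not part of OpenNLP, and the regular expression assumes tags of the form <START>, <START:type> and <END>. It uses only the Java standard library.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenNLP: pads name-finder tags with spaces.
class TagSpacing {
    // Matches an opening tag such as <START:meaningless> or <START>, or the closing <END> tag.
    private static final Pattern TAG = Pattern.compile("<START(:[^:>\\s]*)?>|<END>");

    // Returns the line with every tag separated from its neighbors by a single space.
    // Note: runs of whitespace anywhere in the line are collapsed to one space.
    static String normalize(String line) {
        String padded = TAG.matcher(line).replaceAll(" $0 ");
        return padded.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String raw = "<START:meaningless>Took connection and<END> selected the Text";
        // prints: <START:meaningless> Took connection and <END> selected the Text
        System.out.println(normalize(raw));
    }
}
```

Each line of the training file would be passed through normalize and written back out before the file is handed to the trainer. Lines that are already spaced correctly come out unchanged apart from collapsed whitespace.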