Training a part-of-speech tagger in OpenNLP

I am trying to train the OpenNLP POS tagger so that it tags the words in a sentence according to my own vocabulary. For example:

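After normal POS tagging:

Sentence: NodeManager/NNP failed/VBD to/TO start/VB the/DT server/NN

After using my POS tagging model:

Sentence: NodeManager/AGENT failed/OTHER to/OTHER start/OTHER the/OTHER server/OBJECT

Here AGENT, OTHER, and OBJECT are tags that I defined.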

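So basically I am defining my own tag dictionary, and I want the POS tagger to use my model.

I checked the Apache documentation for how to do this.

I found the following code: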

POSModel model = null;

InputStream dataIn = null;
try {
    dataIn = new FileInputStream("en-pos.train");
    ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
    ObjectStream<POSSample> sampleStream = new WordTagSampleStream(lineStream);

    model = POSTaggerME.train("en", sampleStream, TrainingParameters.defaultParams(), null, null);
}
catch (IOException e) {
    e.printStackTrace();
}
finally {
    if (dataIn != null) {
        try {
            dataIn.close();
        }
        catch (IOException e) {
            // Not an issue, training already finished.
            // The exception should be logged and investigated
            // if part of a production system.
            e.printStackTrace();
        }
    }
}

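Here they open a FileInputStream on en-pos.train. I assume en-pos.train is a .bin file, like all the files they have used before, only customized. Can someone tell me how to obtain the customized .bin file?

Or where is en-pos.train? What exactly is it, and how do I create it?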
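
For context, the OpenNLP documentation describes the POS tagger's training data as plain text: one sentence per line, tokens separated by whitespace, each token written as word_tag. Under that reading, en-pos.train would be a text file that feeds the training call, not a .bin. The following is only a minimal sketch, using the standard library alone, of how the slash notation from the example above could be turned into such a line. The class name TrainingLineFormatter and the method toTrainingLine are hypothetical and not part of OpenNLP.

```java
import java.util.StringJoiner;

// Hypothetical helper (not part of OpenNLP): converts "word/TAG" tokens
// into the "word_TAG" form, one sentence per output line.
public class TrainingLineFormatter {

    /** Converts "word/TAG word/TAG ..." into "word_TAG word_TAG ...". */
    public static String toTrainingLine(String slashTagged) {
        StringJoiner line = new StringJoiner(" ");
        for (String token : slashTagged.trim().split("\\s+")) {
            // Split on the last slash so that words containing '/' still work.
            int slash = token.lastIndexOf('/');
            if (slash <= 0 || slash == token.length() - 1) {
                throw new IllegalArgumentException("Expected word/TAG but got: " + token);
            }
            line.add(token.substring(0, slash) + "_" + token.substring(slash + 1));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Each printed line would be one sentence in the training file.
        System.out.println(toTrainingLine(
            "NodeManager/AGENT failed/OTHER to/OTHER start/OTHER the/OTHER server/OBJECT"));
    }
}
```

A training file built this way would contain many such lines, one per annotated sentence, using only the custom tags (AGENT, OTHER, OBJECT).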

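I extracted the .bin file that they normally use, en-pos-maxent.bin. It contains an XML file in which the tag dictionary is defined, a model file, and a properties file. I have changed them to suit my needs, but my problem is generating the .bin file from those contents.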
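
Since the .bin can be unpacked into separate files, it appears to be an ordinary zip archive. As a rough sketch under that assumption, its entries can be listed with java.util.zip without extracting anything. The class name ModelArchiveInspector and the default path are placeholders.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: lists the entry names inside a zip-packaged model file.
public class ModelArchiveInspector {

    public static List<String> listEntries(String path) throws IOException {
        List<String> names = new ArrayList<>();
        // try-with-resources closes the archive even if reading fails.
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder path; pass the real model file as the first argument.
        String path = args.length > 0 ? args[0] : "en-pos-maxent.bin";
        for (String name : listEntries(path)) {
            System.out.println(name);
        }
    }
}
```

Note that the model entry inside the archive is the trained model itself, so editing the tag dictionary and re-zipping would not retrain it on the new tags.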


2013-10-22 yash6