Training my own model in OpenNLP

I'm finding it hard to create my own model in OpenNLP. Can anyone tell me how to get a model and how the training is done?

What should the input be, and where is the output model file stored?

2012-06-26 user1482228

Which tool are you creating a model for? – wcolen

Maybe this article will help you. It describes how to train a TokenNameFinder using data extracted from Wikipedia...

nuxeo - blog - Mining Wikipedia with Hadoop and Pig for Natural Language Processing

2012-06-27 14:16:12

That link no longer works. – Ruthwik

@Ruthwik Thanks for the comment. The link has been updated. –

https://opennlp.apache.org/docs/1.5.3/manual/opennlp.html

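This site is very useful: it shows both the code and how to use the OpenNLP applications to train all the different kinds of models, such as entity extraction and part-of-speech tagging.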

I could give you some code examples here, but that page is very clear.

Theory-wise:

Basically, you create a file that lists the things you want to train on, for example:

Sports [space] This is a page about football, rugby and stuff

Politics [space] This is a page about Blair being prime minister, page after page.

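The format is described on the page above (each model needs a different format). Once you have created this file, you can run it through either the API or the opennlp application (via the command line), which produces a .bin file. Once you have that .bin file, you can load it into a model and start using it (following the API on the site above).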
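For the category-then-text example above, the OpenNLP manual describes document categorizer training data as one document per line, with the category as the first whitespace-separated token followed by the document text. Below is a minimal, stdlib-only sketch of producing such a file, assuming that format. The class name DoccatTrainingFile and the output file name doccat.train are made up for illustration; the resulting file is what you would then pass to the API or the command-line trainer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: writes "category<space>text" lines, one document per line,
// the layout the OpenNLP manual describes for document-categorizer training data.
final class DoccatTrainingFile {

    // Builds one training line. The category must be a single token because the
    // first whitespace-separated token of a line is read as the category.
    static String toLine(String category, String text) {
        if (category.isEmpty() || category.chars().anyMatch(Character::isWhitespace)) {
            throw new IllegalArgumentException("category must be one non-empty token: " + category);
        }
        // Collapse all whitespace (including newlines) so the document stays on one line.
        String oneLine = text.trim().replaceAll("\\s+", " ");
        return category + " " + oneLine;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Arrays.asList(
                toLine("Sports", "This is a page about football, rugby and stuff"),
                toLine("Politics", "This is a page about Blair being prime minister"));
        // "doccat.train" is an illustrative file name, not one OpenNLP requires.
        Files.write(Paths.get("doccat.train"), lines, StandardCharsets.UTF_8);
        System.out.println("wrote " + lines.size() + " training lines");
    }
}
```

The single-token check on the category exists because a two-word category such as "Foreign Policy" would silently become category "Foreign" with "Policy" prepended to the text.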

2013-10-28 15:57:38

Or you could just say RTFM and save yourself some typing. – demongolem

Here is the latest documentation: http://opennlp.apache.org/docs/1.8.1/manual/opennlp.html –

First you need training data annotated with the entities you want.

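Sentences should be separated by a newline (\n), one sentence per line, and tokens should be separated by a space character.
Say you want to create a model for medicine entities; the data should then look like this: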

<START:medicine> Augmentin-Duo <END> is a penicillin antibiotic that contains two medicines - <START:medicine> amoxicillin trihydrate <END> and <START:medicine> potassium clavulanate <END> .
They work together to kill certain types of bacteria and are used to treat certain types of bacterial infections .
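If your sentences are already tokenized and you know which token ranges are entities, the annotated lines can be generated instead of typed by hand. Below is a minimal sketch, assuming hypothetical names (NameFinderLine, Span) and half-open token spans [start, end). It joins tokens with single spaces and emits <START:type> and <END> as standalone tokens, so a tag never ends up glued to punctuation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that writes one sentence in the name-finder training format:
// tokens separated by single spaces, entities wrapped in <START:type> ... <END>,
// with each tag as its own token.
final class NameFinderLine {

    // Tokens [start, end) of the sentence form one entity of the given type.
    static final class Span {
        final int start;
        final int end;
        final String type;

        Span(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // spans must be sorted by start, non-empty, non-overlapping and inside the sentence.
    static String build(String[] tokens, List<Span> spans) {
        List<String> out = new ArrayList<>();
        int prevEnd = 0;
        for (Span sp : spans) {
            if (sp.start < prevEnd || sp.end <= sp.start || sp.end > tokens.length) {
                throw new IllegalArgumentException("bad span [" + sp.start + ", " + sp.end + ")");
            }
            for (int i = prevEnd; i < sp.start; i++) {
                out.add(tokens[i]); // plain tokens before the entity
            }
            out.add("<START:" + sp.type + ">");
            for (int i = sp.start; i < sp.end; i++) {
                out.add(tokens[i]); // tokens inside the entity
            }
            out.add("<END>"); // separate token, never glued to punctuation
            prevEnd = sp.end;
        }
        for (int i = prevEnd; i < tokens.length; i++) {
            out.add(tokens[i]); // remaining plain tokens
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        String[] tokens = {"They", "contain", "potassium", "clavulanate", "."};
        // prints: They contain <START:medicine> potassium clavulanate <END> .
        System.out.println(build(tokens, Arrays.asList(new Span(2, 4, "medicine"))));
    }
}
```

Writing one build(...) result per line, in the same order as your sentences, gives a file in the layout shown above.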

You can refer to the sample dataset for an example. The training data should contain at least 15,000 sentences to get good results.
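Before training, it can be worth checking the file against the 15,000-sentence guideline and against malformed tags. Below is a rough sketch, assuming one sentence per non-blank line (blank lines separate documents). The class TrainingFileCheck is hypothetical and only approximates OpenNLP's own parser. Because training lines are split on whitespace, a tag glued to punctuation (such as <END>.) is not a separate token, so the check flags such lines.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check for a name-finder training file. It approximates
// the format rules described above; it is not OpenNLP's own parser.
final class TrainingFileCheck {

    // Counts sentences: every non-blank line is one sentence.
    static long sentenceCount(List<String> lines) {
        return lines.stream().filter(l -> !l.trim().isEmpty()).count();
    }

    // Returns 1-based numbers of lines whose <START:...>/<END> tags are unpaired,
    // nested, or glued to another token (for example "<END>.").
    static List<Integer> badLines(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        for (int n = 0; n < lines.size(); n++) {
            boolean open = false;
            boolean ok = true;
            for (String tok : lines.get(n).trim().split("\\s+")) {
                if (tok.startsWith("<START") && tok.endsWith(">")) {
                    ok = !open;   // a START inside an open entity is an error
                    open = true;
                } else if (tok.equals("<END>")) {
                    ok = open;    // an END without a START is an error
                    open = false;
                } else if (tok.contains("<START") || tok.contains("<END>")) {
                    ok = false;   // tag not separated from its neighbour by whitespace
                }
                if (!ok) {
                    break;
                }
            }
            if (!ok || open) {
                bad.add(n + 1);   // also flags an entity left open at end of line
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TrainingFileCheck <training-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        long sentences = sentenceCount(lines);
        System.out.println(sentences + " sentences, malformed lines: " + badLines(lines));
        if (sentences < 15000) {
            // 15,000 is the rule of thumb given in this answer, not a hard OpenNLP limit.
            System.out.println("warning: fewer than 15,000 sentences");
        }
    }
}
```

Run it as java TrainingFileCheck data.txt and fix any reported lines before training.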

You can also use the OpenNLP TokenNameFinderTrainer. The output file will be in .bin format.

Here is an example: Writing a custom NameFinder model in OpenNLP

For more details, see the OpenNLP documentation.

2016-06-08 07:27:13

Copy the data into data.txt and run the code below to get your own mymodel.bin.

Data you can use as a reference: https://github.com/mccraigmccraig/opennlp/blob/master/src/test/resources/opennlp/tools/namefind/AnnotatedSentencesWithTypes.txt

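import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.util.Collections;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;

// Written against the OpenNLP 1.5.x API; later releases changed some of these
// signatures, so check the manual for the version you use.
public class Training {
    // where the trained model is written
    static String onlpModelPath = "mymodel.bin";
    // training data set: one sentence per line with <START:type> ... <END> annotations
    static String trainingDataFilePath = "data.txt";

    public static void main(String[] args) throws IOException {
        Charset charset = Charset.forName("UTF-8");
        ObjectStream<String> lineStream = new PlainTextByLineStream(
                new FileInputStream(trainingDataFilePath), charset);
        ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);

        TokenNameFinderModel model;
        try {
            // Alternative overload with explicit iterations (100) and cutoff (4):
            // model = NameFinderME.train("en", "drugs", sampleStream, Collections.<String, Object>emptyMap(), 100, 4);
            model = NameFinderME.train("en", "drugs", sampleStream,
                    Collections.<String, Object>emptyMap());
        } finally {
            sampleStream.close();
        }

        // Serialize the trained model to mymodel.bin
        try (OutputStream modelOut = new BufferedOutputStream(new FileOutputStream(onlpModelPath))) {
            model.serialize(modelOut);
        }
    }
}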

2016-09-21 13:33:28 user6858643

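Welcome to Stack Overflow! While this code may help solve the problem, it doesn't explain _why_ and/or _how_ it answers the question. Providing this additional context would significantly improve its long-term educational value. Please [edit] your answer to add an explanation, including what limitations and assumptions apply. –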
