CRFClassifier does not recognize the sentence-splitting option

I am using CoreNLP to annotate named entities (NE) in multi-line English text. When I do the following:

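import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

// Configure the pipeline so that every newline ends a sentence.
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner");
props.put("ssplit.newlineIsSentenceBreak", "always");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

String contentStr = "John speaks with Martin\n\nJeremy talks to him too.";
Annotation document = new Annotation(contentStr);
pipeline.annotate(document);

// Print each sentence found by the sentence splitter.
List<CoreMap> sents = document.get(SentencesAnnotation.class);
for (int i = 0; i < sents.size(); i++) {
    System.out.println("sentence " + i + " " + sents.get(i));
}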

Sentence splitting works fine and recognizes two sentences. However, when I use the NER classifier as follows:

CRFClassifier classifier = CRFClassifier.getClassifier("edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz", props); 
String classifiedStr = classifier.classifyWithInlineXML(contentStr);

I get the following error messages:

Unknown property: |ssplit.newlineIsSentenceBreak|
Unknown property: |annotators|

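The classifier also seems to treat the whole text as a single sentence, which produces the incorrect entity "Martin Jeremy" instead of two separate entities.

Any idea what is wrong?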

2015-10-16 Bahaa

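The properties accepted by CRFClassifier.getClassifier are different from those accepted by the StanfordCoreNLP constructor, which is why you get the error saying the option is unknown.

The option will be set, but it will not be used at runtime.

The properties you need to set are those of SeqClassifierFlags. Set tokenizerOptions to "tokenizeNLs=true", which makes the tokenizer treat newlines as tokens.

Bottom line: set the properties as follows before getting the classifier. It should no longer give you the unknown-property error, and it should work as expected.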

Properties props = new Properties(); 
props.put("tokenizerOptions", "tokenizeNLs=true"); 

CRFClassifier classifier = CRFClassifier.getClassifier("edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz", props); 
String classifiedStr = classifier.classifyWithInlineXML(contentStr);
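
To illustrate why treating newlines as tokens can separate the two names, here is a toy sketch in plain Java. It does not use CoreNLP and is not how the library is implemented: the `NewlineTokenDemo` class, the `*NL*` placeholder token, and the small name set standing in for a trained model are all assumptions made for the example. The idea is only that adjacent tokens with the same label get merged into one entity, so a non-entity newline token between "Martin" and "Jeremy" breaks the run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class NewlineTokenDemo {
    // Placeholder newline token chosen for this sketch.
    static final String NEWLINE_TOKEN = "*NL*";
    // Hypothetical name list standing in for a trained NER model.
    static final Set<String> PERSON_NAMES = Set.of("John", "Martin", "Jeremy");

    // Splits on whitespace; optionally emits a token for every newline.
    public static List<String> tokenize(String text, boolean tokenizeNLs) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isWhitespace(c)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                if (c == '\n' && tokenizeNLs) {
                    tokens.add(NEWLINE_TOKEN);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Merges runs of adjacent person tokens into single entities.
    public static List<String> personEntities(List<String> tokens) {
        List<String> entities = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (String token : tokens) {
            if (PERSON_NAMES.contains(token)) {
                if (run.length() > 0) {
                    run.append(' ');
                }
                run.append(token);
            } else if (run.length() > 0) {
                // Any non-person token (including *NL*) ends the current run.
                entities.add(run.toString());
                run.setLength(0);
            }
        }
        if (run.length() > 0) {
            entities.add(run.toString());
        }
        return entities;
    }

    public static void main(String[] args) {
        String text = "John speaks with Martin\n\nJeremy talks to him too.";
        System.out.println(personEntities(tokenize(text, false))); // [John, Martin Jeremy]
        System.out.println(personEntities(tokenize(text, true)));  // [John, Martin, Jeremy]
    }
}
```

Without newline tokens, "Martin" and "Jeremy" end up next to each other in the token stream and are merged; with them, the newline token sits in between and the two names stay separate.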

2015-10-16 20:46:33

Thanks @Mohamed Selim. That is exactly the answer! – Bahaa
