Using ssplit options with CoreNLP

According to the documentation, I can use options such as ssplit.isOneSentence to control how my document is split into sentences. Given a StanfordCoreNLP object, how exactly do I do that?

Here is my code -

Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, depparse");
// Build the pipeline from the properties before annotating anything
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
Annotation document = new Annotation(doc);
pipeline.annotate(document);
List<CoreMap> sentences = document.get(SentencesAnnotation.class);

At which point do I add this option, and where? Something like this?

pipeline.ssplit.boundaryTokenRegex = '"'

I would also like to know how to use the specific option boundaryTokenRegex.

Edit:

I think this looks more appropriate -

props.put("ssplit.boundaryTokenRegex", "\"");

But I still have to verify it.

Source

2016-07-16 Nikhil Prabhu

The way to make the tokenized sentences end at any instance of a '' token is like this -

props.setProperty("ssplit.boundaryMultiTokenRegex", "/\'\'/");

or

props.setProperty("ssplit.boundaryMultiTokenRegex", "/\"/");

depending on how the quote is stored (CoreNLP normalizes it to the former).

If you want both opening and closing quotes -

props.setProperty("ssplit.boundaryMultiTokenRegex", "/\'\'|``/");
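
To tie the snippets above back to the question of where the option goes, here is a minimal sketch that uses only the Java standard library. It sets the ssplit option on the same Properties object as the annotator list, before a pipeline would be built from it, and then checks which quote tokens the slash-delimited pattern would cover. The class name SsplitOptionsSketch and the helpers innerRegex and wouldMatchToken are hypothetical, and the check assumes that the regex between the slashes is matched against the whole text of a single token; it does not call CoreNLP itself.

```java
import java.util.Properties;
import java.util.regex.Pattern;

class SsplitOptionsSketch {

    // Build the configuration. The ssplit option is set on the same Properties
    // object as the annotator list, before any pipeline is created from it.
    static Properties buildProps() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, depparse");
        props.setProperty("ssplit.boundaryMultiTokenRegex", "/\'\'|``/");
        // With CoreNLP on the classpath, the next step would be:
        //   StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        return props;
    }

    // Hypothetical helper: strip the leading and trailing slash from the pattern.
    static String innerRegex(String tokenPattern) {
        return tokenPattern.substring(1, tokenPattern.length() - 1);
    }

    // Assumption: the regex between the slashes must match the entire token text.
    static boolean wouldMatchToken(String tokenPattern, String token) {
        return Pattern.matches(innerRegex(tokenPattern), token);
    }

    public static void main(String[] args) {
        String pattern = buildProps().getProperty("ssplit.boundaryMultiTokenRegex");
        // Try the two normalized quote tokens, a plain double quote, and a period.
        for (String token : new String[] {"``", "''", "\"", "."}) {
            System.out.println(token + " -> " + wouldMatchToken(pattern, token));
        }
    }
}
```

Under that assumption the combined pattern covers both normalized quote tokens but not a raw double quote, which is why the form to use depends on how the quote is stored after tokenization.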

Source

2016-07-16 20:18:23
