How to convince the tokenizer to treat the text as a single sentence

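I have the following sentence (just one). When I use the code below to tokenize it and get the index of each word, the tokenizer treats it as two sentences because of the period after "approx". How can I fix this?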

String sentence = "09-Aug-2003 -- On Saturday, 9th August 2003, Daniel and I start with our Enduros approx. 100 kilometers from the confluence point.";

Annotation document = new Annotation(sentence); 
pipeline.annotate(document); 
for (CoreLabel token : document.get(CoreAnnotations.TokensAnnotation.class)) { 
    String word = token.get(CoreAnnotations.TextAnnotation.class); 
    // println takes a single argument, so join the index and the word
    System.out.println(token.index() + " " + word);
}

For example, the true index of "kilometers" is 20, but according to this code it is 2.


2015-04-06 M A

If you add the following property to the Properties object that you pass to the pipeline:

Properties props = new Properties(); 
props.setProperty("annotators", "tokenize, ssplit"); 
props.setProperty("ssplit.isOneSentence", "true"); 

StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

then the text will not be split into separate sentences.

(Search for 'ssplit' on this page to see all the other options: http://nlp.stanford.edu/software/corenlp.shtml)
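As a side note, the same two settings can be written in properties-file syntax and parsed with the standard java.util.Properties class, which may be convenient if the configuration lives outside the code. The sketch below is only an illustration under that assumption: the class name SingleSentenceProps and the helper load() are hypothetical, and it uses nothing beyond the JDK. The resulting object would be passed to new StanfordCoreNLP(props) exactly as above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class SingleSentenceProps {
    // Hypothetical helper: parses the two settings from the answer
    // out of properties-file syntax instead of calling setProperty.
    static Properties load() {
        String config = String.join("\n",
                "annotators = tokenize, ssplit",
                "ssplit.isOneSentence = true");
        Properties props = new Properties();
        try {
            props.load(new StringReader(config));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load();
        // Hand props to new StanfordCoreNLP(props) as in the answer.
        System.out.println(props.getProperty("annotators"));
        System.out.println(props.getProperty("ssplit.isOneSentence"));
    }
}
```

In a real project the string would typically be replaced by a reader over a .properties file; the keys and values stay the same.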


2015-04-08 01:05:54
