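CoreNLP's tokenization alters the sentence text. Joining the tokens back together with spaces is not a true reconstruction of the original, and it gets complicated when the sentence contains parentheses and other punctuation. See the code block below. How can I get the original text of a sentence after CoreNLP has split it?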
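import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// paragraph is the input String to be split into sentences
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
Annotation document = new Annotation(paragraph);
pipeline.annotate(document);
List<CoreMap> sentences = document.get(SentencesAnnotation.class);
List<String> sentenceList = new ArrayList<>();
for (CoreMap sentence : sentences)
{
    // How to get the original text of the sentence?
}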
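
To make the problem concrete, here is a minimal sketch in plain Java with no CoreNLP dependency. The `OffsetSketch` class, its `Token` type, and the toy tokenizer are all hypothetical and exist only for illustration; they are not CoreNLP's tokenizer. The sketch shows why joining tokens with spaces fails around parentheses and punctuation, and how keeping each token's character offsets lets you cut the exact span out of the input instead. CoreNLP records comparable offsets on tokens and sentences (`CoreAnnotations.CharacterOffsetBeginAnnotation` and `CoreAnnotations.CharacterOffsetEndAnnotation`), so the same substring idea should carry over to the loop above, though that is not tested here.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetSketch {
    /** Hypothetical token type: the word plus its [begin, end) span in the input. */
    static final class Token {
        final String word;
        final int begin;
        final int end;

        Token(String word, int begin, int end) {
            this.word = word;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Toy tokenizer (an assumption, not CoreNLP's): runs of letters/digits, or one punctuation char. */
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace is dropped, which is why it cannot be recovered from tokens alone
                continue;
            }
            int start = i;
            if (Character.isLetterOrDigit(c)) {
                while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                    i++;
                }
            } else {
                i++; // a single punctuation character becomes its own token
            }
            tokens.add(new Token(text.substring(start, i), start, i));
        }
        return tokens;
    }

    /** The lossy approach from the question: glue tokens together with single spaces. */
    static String joinWithSpaces(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Token t : tokens) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(t.word);
        }
        return sb.toString();
    }

    /** The offset approach: slice the input from the first token's begin to the last token's end. */
    static String originalText(String text, List<Token> tokens) {
        if (tokens.isEmpty()) {
            return "";
        }
        return text.substring(tokens.get(0).begin, tokens.get(tokens.size() - 1).end);
    }

    public static void main(String[] args) {
        String text = "He said (quietly): stop!";
        List<Token> tokens = tokenize(text);
        System.out.println(joinWithSpaces(tokens));     // He said ( quietly ) : stop !
        System.out.println(originalText(text, tokens)); // He said (quietly): stop!
    }
}
```

The offset-based version never tries to guess where spaces belonged; it only needs the begin offset of the first token and the end offset of the last one, so parentheses and punctuation come back exactly as written.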