How to get the original text of a sentence after CoreNLP sentence splitting?

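CoreNLP's tokenization changes the sentence text. Joining the tokens with spaces does not truly reconstruct the sentence, and things get complicated when the sentence contains parentheses and other punctuation. See the code block below. How can I get the original text of a sentence after CoreNLP has split it?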

Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

Annotation document = new Annotation(paragraph);
pipeline.annotate(document);

List<CoreMap> sentences = document.get(SentencesAnnotation.class);

List<String> sentenceList = new ArrayList<>();
for (CoreMap sentence : sentences)
{
    // How to get the original text of sentence?
}


2015-09-08 Chaitanya Shivade

Answering my own question: it is easy. Insert the following line in place of the comment in the question's code block.

String sentenceString = Sentence.listToOriginalTextString(sentence.get(TokensAnnotation.class));


2015-09-08 20:23:38

for (CoreMap sentence : sentences)
{
    String sentenceStr = sentence.get(CoreAnnotations.TextAnnotation.class);
}


2016-09-19 19:36:00
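
To illustrate why joining tokens with spaces is lossy while character offsets are not, here is a minimal sketch that does not depend on CoreNLP. The `Token` class, the sample text, and the hand-written offsets are all hypothetical stand-ins; the `-LRB-`/`-RRB-` forms mimic the bracket normalization that PTB-style tokenizers can apply. CoreNLP tokens carry comparable offsets (`CoreAnnotations.CharacterOffsetBeginAnnotation` and `CoreAnnotations.CharacterOffsetEndAnnotation`), so the same slicing idea should carry over, though that is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

class SentenceOffsets {

    /** Hypothetical stand-in for a token: its normalized form plus offsets into the source text. */
    static final class Token {
        final String word;
        final int begin; // inclusive character offset
        final int end;   // exclusive character offset

        Token(String word, int begin, int end) {
            this.word = word;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Sample paragraph (an assumption of this sketch); note the double space between sentences. */
    static final String SOURCE = "He left (quietly).  Nobody noticed.";

    /** Lossy: joins the token forms with single spaces. */
    static String joinWithSpaces(List<Token> tokens) {
        List<String> words = new ArrayList<>();
        for (Token t : tokens) {
            words.add(t.word);
        }
        return String.join(" ", words);
    }

    /** Lossless: slices the source from the first token's start to the last token's end. */
    static String originalText(String source, List<Token> tokens) {
        if (tokens.isEmpty()) {
            return "";
        }
        int begin = tokens.get(0).begin;
        int end = tokens.get(tokens.size() - 1).end;
        return source.substring(begin, end);
    }

    /** Hand-written tokens for the first sentence of SOURCE. */
    static List<Token> firstSentenceTokens() {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token("He", 0, 2));
        tokens.add(new Token("left", 3, 7));
        tokens.add(new Token("-LRB-", 8, 9));   // normalized form of "("
        tokens.add(new Token("quietly", 9, 16));
        tokens.add(new Token("-RRB-", 16, 17)); // normalized form of ")"
        tokens.add(new Token(".", 17, 18));
        return tokens;
    }

    public static void main(String[] args) {
        List<Token> tokens = firstSentenceTokens();
        System.out.println(joinWithSpaces(tokens));       // He left -LRB- quietly -RRB- .
        System.out.println(originalText(SOURCE, tokens)); // He left (quietly).
    }
}
```

Slicing by offsets only needs the first and last token of each sentence, so it keeps the exact spacing and punctuation inside the sentence without inspecting the tokens in between.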
