Extracting token spans with CoreNLP

I want to extract the character spans of the tokens in a tokenized String. Using Stanford CoreNLP, I have:

Properties props; 
props = new Properties(); 
props.put("annotators", "tokenize, ssplit, pos, lemma"); 
this.pipeline = new StanfordCoreNLP(props); 

String answerText = "This is the answer"; 
ArrayList<IntPair> tokenSpans = new ArrayList<IntPair>(); 
// create an empty Annotation with just the given text 
Annotation document = new Annotation(answerText); 
// run all Annotators on this text 
this.pipeline.annotate(document); 

// Iterate over all of the sentences 
List<CoreMap> sentences = document.get(SentencesAnnotation.class); 
for(CoreMap sentence: sentences) { 
    // Iterate over all tokens in a sentence 
    for (CoreLabel fullToken: sentence.get(TokensAnnotation.class)) { 
        IntPair span = fullToken.get(SpanAnnotation.class);
        tokenSpans.add(span);
    } 
}

However, all of the IntPairs are null. Do I need to add another annotator to this line:

props.put("annotators", "tokenize, ssplit, pos, lemma");

Desired output:

(0,3), (5,6), (8,10), (12,17)

Source

2013-12-14 Adam_G

The problem is the use of SpanAnnotation, which applies to Trees, not to tokens. The correct classes for this lookup are CharacterOffsetBeginAnnotation and CharacterOffsetEndAnnotation.
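For illustration, here is a minimal sketch of how the loop from the question might look with those two classes. The class name `TokenSpans` and the helper `tokenSpans` are made up for this example, and it assumes a pipeline with only the `tokenize` annotator is enough, since the offsets are set by the tokenizer. Note that CoreNLP's end offset points one character past the token, so the sketch subtracts 1 to get the inclusive end positions shown in the desired output.

```java
import edu.stanford.nlp.ling.CoreAnnotations.CharacterOffsetBeginAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.CharacterOffsetEndAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.IntPair;

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class TokenSpans {

    // Hypothetical helper (not part of CoreNLP): one (begin, inclusiveEnd) pair per token.
    static List<IntPair> tokenSpans(StanfordCoreNLP pipeline, String text) {
        Annotation document = new Annotation(text);
        pipeline.annotate(document);

        List<IntPair> spans = new ArrayList<>();
        // The tokenizer stores the full token list on the document itself.
        for (CoreLabel token : document.get(TokensAnnotation.class)) {
            int begin = token.get(CharacterOffsetBeginAnnotation.class);
            // End offset is exclusive (one past the last character of the token).
            int end = token.get(CharacterOffsetEndAnnotation.class);
            // Convert to an inclusive end to match the format asked for in the question.
            spans.add(new IntPair(begin, end - 1));
        }
        return spans;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Assumption: character offsets only need the tokenizer, so pos and lemma are left out.
        props.setProperty("annotators", "tokenize");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        for (IntPair span : tokenSpans(pipeline, "This is the answer")) {
            System.out.println("(" + span.getSource() + "," + span.getTarget() + ")");
        }
    }
}
```

If you would rather keep CoreNLP's own convention, drop the `- 1`; the pairs then work directly with `String.substring(begin, end)` on the original text.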

Source

2013-12-15 00:14:56
