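Stanford numeric named entity recognition

I am trying to identify numeric named entities in text using Stanford NER. For input such as "20 million", the result comes back as "Number": ["20-5", "million-6"]. How can I improve the output so that "20 million" comes back together as one entity? And how can I drop the index numbers (the 5 and 6 in the example above)? I am using Java.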

// Assumes `pipeline`, `number` and `n` are fields of the enclosing class,
// and that `pipeline` was built with the ner annotator enabled.
public void extractNumbers(String text) throws IOException {
    number = new HashMap<String, ArrayList<String>>();
    n = new ArrayList<String>();
    edu.stanford.nlp.pipeline.Annotation document = new edu.stanford.nlp.pipeline.Annotation(text);
    pipeline.annotate(document);
    List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {
        for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
            // Keep only tokens tagged NUMBER; the constant-first equals is null-safe.
            if ("NUMBER".equals(token.get(CoreAnnotations.NamedEntityTagAnnotation.class))) {
                // toString() is what yields "20-5": the word plus its token index.
                n.add(token.toString());
            }
        }
    }
    // Store the list once, after all sentences have been scanned.
    if (!n.isEmpty()) {
        number.put("Number", n);
    }
}

Source

2017-05-06 Rtrut Kuhfd

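You may want to expand on this a bit. Which model did you use? What language are you working in? A code snippet would also help us see exactly what you did. – entrophy

@entrophy I edited the question :) –

Which class is the `pipeline` object an instance of here? I assume you are using the Stanford pipeline. – entrophy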

To get the exact text from any CoreLabel object, simply use token.originalText() instead of token.toString().

If you need anything else from these tokens, take a look at the CoreLabel javadoc.
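For the first part of the question (getting "20 million" back as one entity), one possible approach is to join adjacent tokens that all carry the NUMBER tag. The following is only a minimal sketch under stated assumptions: `Tok` is a hypothetical stand-in for a CoreLabel, holding the value of token.originalText() and the value of NamedEntityTagAnnotation, and `mergeNumbers` is an illustrative helper name, not part of CoreNLP. It is meant to be called once per sentence so that a run does not cross a sentence boundary.

```java
import java.util.ArrayList;
import java.util.List;

class NumberSpans {

    // Hypothetical stand-in for a CoreNLP token: surface text plus NER tag.
    static final class Tok {
        final String text;
        final String ner;
        Tok(String text, String ner) { this.text = text; this.ner = ner; }
    }

    // Joins each run of adjacent NUMBER tokens into one space-separated string.
    static List<String> mergeNumbers(List<Tok> tokens) {
        List<String> spans = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (Tok t : tokens) {
            if ("NUMBER".equals(t.ner)) {
                // Extend the current run of number tokens.
                if (current.length() > 0) current.append(' ');
                current.append(t.text);
            } else if (current.length() > 0) {
                // A non-number token ends the run, so emit it.
                spans.add(current.toString());
                current.setLength(0);
            }
        }
        // Emit a run that reaches the end of the sentence.
        if (current.length() > 0) spans.add(current.toString());
        return spans;
    }

    public static void main(String[] args) {
        List<Tok> sentence = List.of(
            new Tok("About", "O"),
            new Tok("20", "NUMBER"),
            new Tok("million", "NUMBER"),
            new Tok("people", "O"),
            new Tok("in", "O"),
            new Tok("3", "NUMBER"),
            new Tok("cities", "O"));
        System.out.println(mergeNumbers(sentence)); // prints [20 million, 3]
    }
}
```

In the question's loop, the same idea would apply by building each entry from token.originalText() and the token's NER tag. CoreNLP also defines CoreAnnotations.NormalizedNamedEntityTagAnnotation, which may hold a normalized value for such a span; whether it is populated depends on the annotators in your pipeline, so it is worth checking against your version.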

Source

2017-05-06 07:03:24 entrophy

That worked for my second question, thank you very much –
