Check input text for words from a collection of words

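I have a collection of about 10,000 noun phrases (NPs). I want to check each new piece of input text against this collection and extract the sentences that contain any of these NPs. I don't want to run a loop for every phrase, because that would make my code very slow. I am using Java and Stanford CoreNLP.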

Source

2017-08-20 Rizstien

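Have you written any code yet, even the slow version you currently have? If you edit it into the question and show us what you've got, someone may be able to improve on it and help. – Assafs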

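A quick and easy way to do this is to use RegexNER to tag every occurrence of anything in your dictionary, and then check each sentence for tokens whose NER tag is not "O".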

package edu.stanford.nlp.examples;

import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.util.*;
import java.util.*;
import java.util.stream.Collectors;

public class FindSentencesWithPhrase {

    // Returns true if any token in the sentence has an NER tag other than "O",
    // i.e. RegexNER matched one of the dictionary phrases.
    public static boolean checkForNamedEntity(CoreMap sentence) {
        for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
            if (token.ner() != null && !token.ner().equals("O")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,regexner");
        // Mapping file that lists the phrases to look for.
        props.setProperty("regexner.mapping", "phrases.rules");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // Note the space after the first sentence so the sentences are separated.
        String exampleText = "This sentence contains the phrase \"ice cream\". " +
            "This sentence is not of interest. This sentence contains pizza.";
        Annotation ann = new Annotation(exampleText);
        pipeline.annotate(ann);
        // Print only the sentences that contain a matched phrase.
        for (CoreMap sentence : ann.get(CoreAnnotations.SentencesAnnotation.class)) {
            if (checkForNamedEntity(sentence)) {
                System.out.println("---");
                System.out.println(sentence.get(CoreAnnotations.TokensAnnotation.class)
                    .stream().map(token -> token.word()).collect(Collectors.joining(" ")));
            }
        }
    }
}

The file "phrases.rules" should look like this (the columns are separated by tabs):

ice cream	PHRASE_OF_INTEREST	MISC	1
pizza	PHRASE_OF_INTEREST	MISC	1
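
With around 10,000 noun phrases, the mapping file is probably easier to generate than to write by hand. The following is a minimal sketch using only the Java standard library. The class name `PhraseRulesWriter`, the two-item input list, and the reuse of the `PHRASE_OF_INTEREST` label for every entry are assumptions made for illustration. It also assumes the phrases contain no tabs or regex metacharacters, since RegexNER treats each token in the first column as a pattern.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PhraseRulesWriter {

    // Builds one tab-separated mapping line:
    // phrase, NER label, overwritable types, priority.
    static String toRuleLine(String phrase) {
        return String.join("\t", phrase.trim(), "PHRASE_OF_INTEREST", "MISC", "1");
    }

    // Converts phrases to mapping lines, skipping blank entries and duplicates.
    static List<String> toRuleLines(List<String> phrases) {
        return phrases.stream()
                .map(String::trim)
                .filter(p -> !p.isEmpty())
                .distinct()
                .map(PhraseRulesWriter::toRuleLine)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input; in practice this would be the full NP collection.
        List<String> phrases = Arrays.asList("ice cream", "pizza");
        Path out = Paths.get("phrases.rules");
        Files.write(out, toRuleLines(phrases), StandardCharsets.UTF_8);
    }
}
```

Running `main` writes a `phrases.rules` file in the working directory, which is where the `regexner.mapping` property in the answer's code expects to find it.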

Source

2017-08-21 03:18:27 StanfordNLPHelp

You can set "regexner.ignorecase" to true or false, depending on whether you want matching to be case-sensitive. – StanfordNLPHelp
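
As a rough sketch of where that setting goes, the property can be set alongside the other `regexner` properties before the pipeline is constructed. Only `java.util.Properties` is used here so that the fragment stands alone. The class and method names are hypothetical, and the resulting object would be passed to `new StanfordCoreNLP(props)` as in the answer's code.

```java
import java.util.Properties;

class RegexNerConfig {

    // Builds pipeline properties; ignoreCase = true asks for case-insensitive matching.
    static Properties build(String mappingFile, boolean ignoreCase) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,regexner");
        props.setProperty("regexner.mapping", mappingFile);
        props.setProperty("regexner.ignorecase", Boolean.toString(ignoreCase));
        return props;
    }

    public static void main(String[] args) {
        // Print the assembled configuration for inspection.
        Properties props = build("phrases.rules", true);
        props.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```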
