How do I get NN and NNS from text?

I want to extract the NN and NNS tokens from the sample text given in the script below. When I run the code below, the output is:

types 
synchronization 
phase 
synchronization 
-RSB- 
synchronization 
-LSB- 
-RSB- 
projection 
synchronization

Why am I getting -RSB- and -LSB- here? Should I use a different pattern to get both NN and NNS?

String atic = "So far, many different types of synchronization have been investigated, such as complete synchronization [8], generalized synchronization [9], phase synchronization [10], lag synchronization [11], projection synchronization [12, 13], and so forth.";

Reader reader = new StringReader(atic);
DocumentPreprocessor dp = new DocumentPreprocessor(reader);
docs_terms_unq.put(rs.getString("u"), new ArrayList<String>());
docs_terms.put(rs.getString("u"), new ArrayList<String>());

for (List<HasWord> sentence : dp) {
    List<TaggedWord> tagged = tagger.tagSentence(sentence);
    GrammaticalStructure gs = parser.predict(tagged);

    // Parse the sentence and print the full tree
    Tree x = parserr.parse(sentence);
    System.out.println(x);

    // Match every node whose category is NN or NNS
    TregexPattern NPpattern = TregexPattern.compile("@NN|NNS");
    TregexMatcher matcher = NPpattern.matcher(x);

    while (matcher.findNextMatchingNode()) {
        Tree match = matcher.getMatch();
        ArrayList hh = match.yield();
        Boolean b = false;

        System.out.println(hh.toString());
    }
}

Source

2016-04-27 mlee_jordan

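I'm not sure why those are showing up. But you will get more accurate part-of-speech tags if you use the part-of-speech tagger rather than the parser. I'd suggest looking at the token annotations directly. Here is some sample code.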

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

import java.util.Properties;

public class NNExample {

    public static void main(String[] args) {
        // Only tokenization, sentence splitting and POS tagging are needed
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        String text = "So far, many different types of synchronization have been investigated, such as complete " +
                "synchronization [8], generalized synchronization [9], phase synchronization [10], " +
                "lag synchronization [11], projection synchronization [12, 13], and so forth.";
        Annotation annotation = new Annotation(text);
        pipeline.annotate(annotation);
        for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                // Print every token tagged as a singular (NN) or plural (NNS) common noun
                String partOfSpeechTag = token.get(CoreAnnotations.PartOfSpeechAnnotation.class);
                if (partOfSpeechTag.equals("NN") || partOfSpeechTag.equals("NNS")) {
                    System.out.println(token.word());
                }
            }
        }
    }
}

And this is the output I get:

types 
synchronization 
synchronization 
synchronization 
phase 
synchronization 
lag 
synchronization 
projection 
synchronization
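
As for the -LSB- and -RSB- tokens in the question: these are the Penn Treebank escapes that the Stanford tokenizer of that era substitutes for [ and ], and the question's output suggests the parser labelled some of them as nouns. If bracket tokens ever slip through with a noun tag, a defensive filter is one option. The sketch below is plain Java with no CoreNLP dependency; the class NounFilter, the method keepNouns, and the {word, tag} String arrays are hypothetical stand-ins for the CoreLabel tokens used above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NounFilter {

    // Penn Treebank escapes for ( ) [ ] { }
    private static final Set<String> BRACKET_ESCAPES = new HashSet<>(Arrays.asList(
            "-LRB-", "-RRB-", "-LSB-", "-RSB-", "-LCB-", "-RCB-"));

    // Each element is a {word, tag} pair (a hypothetical representation for this sketch).
    public static List<String> keepNouns(List<String[]> taggedTokens) {
        List<String> nouns = new ArrayList<>();
        for (String[] pair : taggedTokens) {
            String word = pair[0];
            String tag = pair[1];
            boolean isNoun = "NN".equals(tag) || "NNS".equals(tag);
            // Keep nouns, but drop escaped brackets even when they carry a noun tag
            if (isNoun && !BRACKET_ESCAPES.contains(word)) {
                nouns.add(word);
            }
        }
        return nouns;
    }

    public static void main(String[] args) {
        List<String[]> tokens = Arrays.asList(
                new String[] {"phase", "NN"},
                new String[] {"synchronization", "NN"},
                new String[] {"-LSB-", "NN"},   // mis-tagged bracket, as in the question's output
                new String[] {"10", "CD"},
                new String[] {"-RSB-", "NN"},
                new String[] {"types", "NNS"});
        System.out.println(keepNouns(tokens)); // prints [phase, synchronization, types]
    }
}
```

In a real pipeline the same check could sit next to the partOfSpeechTag test, comparing token.word() against the escape set.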

Source

2016-04-28 01:37:25 StanfordNLPHelp

Thank you very much. I have indeed realized that my previous approach was returning fewer nouns!

One small question. Should I use the same approach when I need NPs? In the code I posted, I used TregexPattern.compile("@NP !<< @NP"). So can I use partOfSpeechTag.equals("@NP !<< @NP")?

Here is an example of getting the NPs from a sentence:

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.Word;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.*;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Properties;

public class TreeExample {

    // Prints the words of the first NP found on each branch, without descending into it
    public static void printNounPhrases(Tree inputTree) {
        if (inputTree.label().value().equals("NP")) {
            ArrayList<Word> words = new ArrayList<Word>();
            for (Tree leaf : inputTree.getLeaves()) {
                words.addAll(leaf.yieldWords());
            }
            System.out.println(words);
        } else {
            for (Tree subTree : inputTree.children()) {
                printNounPhrases(subTree);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // The parse annotator is required to obtain the constituency tree
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        String text = "Susan Thompson is from Florida.";
        Annotation annotation = new Annotation(text);
        pipeline.annotate(annotation);
        Tree sentenceTree = annotation.get(CoreAnnotations.SentencesAnnotation.class).get(0).get(
                TreeCoreAnnotations.TreeAnnotation.class);
        //System.out.println(sentenceTree);
        printNounPhrases(sentenceTree);
    }

}

Source

2016-04-29 01:54:22 StanfordNLPHelp

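When I try this example with the sample text from my post, I get: '[complete, synchronization, -LSB-, 8, -RSB-, ,, generalized, synchronization, -LSB-, 9, -RSB-, phase, synchronization, -LSB-, 10, -RSB-, ,, lag, synchronization, -LSB-, 11, -RSB-, projection, synchronization, -LSB-, 12, ..., 13, -RSB-]'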
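
That long list is what printNounPhrases produces as written: it prints the first NP it reaches on each branch and does not descend further, so it returns outermost noun phrases. The Tregex pattern from the earlier comment, @NP !<< @NP, asks for the opposite: NPs that dominate no other NP. A minimal sketch of that traversal follows. To stay self-contained it uses a hypothetical Node class rather than CoreNLP's Tree, and the toy tree in main is hand-built for illustration, not actual parser output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LowestNP {

    // Hypothetical stand-in for a parse tree node: a label plus children (leaves have none).
    static class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    // Collects the words under node, left to right.
    static void collectLeaves(Node node, List<String> out) {
        if (node.isLeaf()) {
            out.add(node.label);
            return;
        }
        for (Node child : node.children) {
            collectLeaves(child, out);
        }
    }

    // True if any proper descendant of node is a phrase labelled NP.
    static boolean dominatesNP(Node node) {
        for (Node child : node.children) {
            if (!child.isLeaf() && child.label.equals("NP")) {
                return true;
            }
            if (dominatesNP(child)) {
                return true;
            }
        }
        return false;
    }

    // Gathers the words of every NP that dominates no other NP.
    static List<List<String>> lowestNPs(Node root) {
        List<List<String>> result = new ArrayList<>();
        visit(root, result);
        return result;
    }

    private static void visit(Node node, List<List<String>> result) {
        if (node.isLeaf()) {
            return;
        }
        if (node.label.equals("NP") && !dominatesNP(node)) {
            List<String> words = new ArrayList<>();
            collectLeaves(node, words);
            result.add(words);
            return; // no NP can occur below this node
        }
        for (Node child : node.children) {
            visit(child, result);
        }
    }

    public static void main(String[] args) {
        // (NP (NP (JJ complete) (NN synchronization)) (CC and) (NP (NN phase) (NN synchronization)))
        Node tree = new Node("NP",
                new Node("NP",
                        new Node("JJ", new Node("complete")),
                        new Node("NN", new Node("synchronization"))),
                new Node("CC", new Node("and")),
                new Node("NP",
                        new Node("NN", new Node("phase")),
                        new Node("NN", new Node("synchronization"))));
        // prints [[complete, synchronization], [phase, synchronization]]
        System.out.println(lowestNPs(tree));
    }
}
```

With real CoreNLP trees, the TregexPattern and TregexMatcher calls from the question should express the same thing directly. Note that the pattern is matched against a tree; it is not a POS tag, so comparing it with partOfSpeechTag.equals(...) would never match.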
