2016-04-27
0

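How do I get NN and NNS from a text?

I want to extract the NN or NNS tokens from the sample text given in the script below. When I run the code below, the output is: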

types 
synchronization 
phase 
synchronization 
-RSB- 
synchronization 
-LSB- 
-RSB- 
projection 
synchronization 

Why am I getting [-RSB-] and [-LSB-] here? Should I use a different pattern to get both NN and NNS?

String atic = "So far, many different types of synchronization have been investigated, such as complete synchronization [8], generalized synchronization [9], phase synchronization [10], lag synchronization [11], projection synchronization [12, 13], and so forth.";

Reader reader = new StringReader(atic);
DocumentPreprocessor dp = new DocumentPreprocessor(reader);
docs_terms_unq.put(rs.getString("u"), new ArrayList<String>());
docs_terms.put(rs.getString("u"), new ArrayList<String>());

for (List<HasWord> sentence : dp) {
    List<TaggedWord> tagged = tagger.tagSentence(sentence);
    GrammaticalStructure gs = parser.predict(tagged);

    // Constituency parse of the same sentence
    Tree x = parserr.parse(sentence);
    System.out.println(x);

    // Match every node whose basic category is NN or NNS
    TregexPattern NPpattern = TregexPattern.compile("@NN|NNS");
    TregexMatcher matcher = NPpattern.matcher(x);

    while (matcher.findNextMatchingNode()) {
        Tree match = matcher.getMatch();
        ArrayList<Label> hh = match.yield();
        System.out.println(hh.toString());
    }
}

Answers

1

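I don't know why those tokens show up. However, you will get more accurate POS tags if you use the part-of-speech tagger. I suggest reading the tags directly from the annotations. Here is some sample code.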

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

import java.util.Properties;

public class NNExample {

    public static void main(String[] args) {
        // Tokenize, split into sentences, and run the POS tagger
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        String text = "So far, many different types of synchronization have been investigated, such as complete " +
                "synchronization [8], generalized synchronization [9], phase synchronization [10], " +
                "lag synchronization [11], projection synchronization [12, 13], and so forth.";
        Annotation annotation = new Annotation(text);
        pipeline.annotate(annotation);
        for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                // Print only the tokens tagged as singular or plural common nouns
                String partOfSpeechTag = token.get(CoreAnnotations.PartOfSpeechAnnotation.class);
                if (partOfSpeechTag.equals("NN") || partOfSpeechTag.equals("NNS")) {
                    System.out.println(token.word());
                }
            }
        }
    }
}

And this is the output I get:

types 
synchronization 
synchronization 
synchronization 
phase 
synchronization 
lag 
synchronization 
projection 
synchronization 
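
For context on the stray tokens in the question: -LSB- and -RSB- are the Penn Treebank escape forms that the tokenizer substitutes for "[" and "]", and the parser in the question evidently labeled some of them as nouns. If you stay with the parser-based approach, one option is to drop those escape tokens after matching. The following is only a minimal sketch in plain Java, not CoreNLP code: the class `NounFilter`, the method `keepNouns`, and the parallel word/tag lists are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of CoreNLP.
class NounFilter {

    // Penn Treebank escape forms for ( ) [ ] { }
    private static final Set<String> BRACKET_ESCAPES = new HashSet<>(Arrays.asList(
            "-LRB-", "-RRB-", "-LSB-", "-RSB-", "-LCB-", "-RCB-"));

    // Assumes words and tags are parallel lists: tags.get(i) is the tag of words.get(i).
    // Keeps words tagged NN or NNS, except escaped brackets that were mis-tagged as nouns.
    static List<String> keepNouns(List<String> words, List<String> tags) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        List<String> nouns = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            String tag = tags.get(i);
            boolean isNoun = "NN".equals(tag) || "NNS".equals(tag);
            if (isNoun && !BRACKET_ESCAPES.contains(words.get(i))) {
                nouns.add(words.get(i));
            }
        }
        return nouns;
    }

    public static void main(String[] args) {
        // Tags here imitate the mis-tagging seen in the question's output.
        List<String> words = Arrays.asList("phase", "synchronization", "-LSB-", "10", "-RSB-");
        List<String> tags = Arrays.asList("NN", "NN", "NN", "CD", "NN");
        System.out.println(keepNouns(words, tags)); // prints [phase, synchronization]
    }
}
```

With the tagger-based code above this filter should not be needed, since its output contains no bracket tokens.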
+0

Thank you very much. I had indeed noticed that my previous approach returned fewer nouns!

+0

One small question. Should I use the same approach when I need NPs? In the code I posted, I used TregexPattern.compile("@NP !<< @NP"). So can I use partOfSpeechTag.equals("@NP !<< @NP")?

1

Here is an example of getting the NPs from a sentence:

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.Word;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.*;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Properties;

public class TreeExample {

    // Prints the words of each NP found, without descending into an NP once it is printed
    public static void printNounPhrases(Tree inputTree) {
        if (inputTree.label().value().equals("NP")) {
            ArrayList<Word> words = new ArrayList<Word>();
            for (Tree leaf : inputTree.getLeaves()) {
                words.addAll(leaf.yieldWords());
            }
            System.out.println(words);
        } else {
            for (Tree subTree : inputTree.children()) {
                printNounPhrases(subTree);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // The parse annotator produces the constituency tree for each sentence
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        String text = "Susan Thompson is from Florida.";
        Annotation annotation = new Annotation(text);
        pipeline.annotate(annotation);
        // Take the parse tree of the first sentence
        Tree sentenceTree = annotation.get(CoreAnnotations.SentencesAnnotation.class).get(0).get(
                TreeCoreAnnotations.TreeAnnotation.class);
        //System.out.println(sentenceTree);
        printNounPhrases(sentenceTree);
    }

}
+0

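When I try this example with the sample text from my post, I get: '[complete, synchronization, -LSB-, 8, -RSB-, ,, generalized, synchronization, -LSB-, 9, -RSB-, phase, synchronization, -LSB-, 10, -RSB-, ,, lag, synchronization, -LSB-, 11, -RSB-, projection, synchronization, -LSB-, 12, ..., 13, -RSB-]'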
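
A likely reason for the long list above: printNounPhrases stops at the first NP it meets on the way down, so it prints the outermost NP together with everything nested inside it. The pattern "@NP !<< @NP" from the earlier comment asks for the opposite, namely NPs that do not dominate another NP. It is also a tree pattern rather than a POS tag, so comparing it with partOfSpeechTag.equals(...) cannot match anything. The sketch below illustrates the "innermost NP" idea in plain Java. The `Node` class is a hypothetical stand-in, not CoreNLP's Tree, and the small tree is hand-built for illustration rather than real parser output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; Node is a minimal stand-in for a parse tree node.
class InnermostNP {

    static final class Node {
        final String label;        // phrase label, POS tag, or (for leaves) the word itself
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    // True if some proper descendant of node is a (non-leaf) NP.
    static boolean dominatesNP(Node node) {
        for (Node child : node.children) {
            if ((!child.isLeaf() && child.label.equals("NP")) || dominatesNP(child)) {
                return true;
            }
        }
        return false;
    }

    // Appends the words under node to out, left to right.
    static void collectWords(Node node, List<String> out) {
        if (node.isLeaf()) {
            out.add(node.label);
            return;
        }
        for (Node child : node.children) {
            collectWords(child, out);
        }
    }

    // Returns the words of every NP that contains no other NP.
    static List<List<String>> innermostNPs(Node root) {
        List<List<String>> result = new ArrayList<>();
        visit(root, result);
        return result;
    }

    private static void visit(Node node, List<List<String>> result) {
        if (!node.isLeaf() && node.label.equals("NP") && !dominatesNP(node)) {
            List<String> words = new ArrayList<>();
            collectWords(node, words);
            result.add(words);
            return; // nothing below an innermost NP can be another NP
        }
        for (Node child : node.children) {
            visit(child, result);
        }
    }

    public static void main(String[] args) {
        // Hand-built tree: an outer NP wrapping two inner NPs and a bracketed citation.
        Node tree = new Node("NP",
                new Node("NP",
                        new Node("JJ", new Node("complete")),
                        new Node("NN", new Node("synchronization"))),
                new Node("PRN",
                        new Node("-LSB-", new Node("-LSB-")),
                        new Node("CD", new Node("8")),
                        new Node("-RSB-", new Node("-RSB-"))),
                new Node(",", new Node(",")),
                new Node("NP",
                        new Node("NN", new Node("phase")),
                        new Node("NN", new Node("synchronization"))));
        // prints [[complete, synchronization], [phase, synchronization]]
        System.out.println(innermostNPs(tree));
    }
}
```

On a real CoreNLP tree, the Tregex matcher loop from the question with "@NP !<< @NP" should give the same kind of result; the sketch only shows what that pattern selects.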