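Extracting a linguistic structure based on POS tags from a sentence using Stanford NLP in Java

I am new to natural language processing (NLP). I want to do part-of-speech (POS) tagging and then find a specific structure within the text. I can already do the POS tagging with Stanford NLP, but I don't know how to extract this structure: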
NN/NNS + IN + DT + NN/NNS/NNP/NNPS
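Read slot by slot, the pattern asks for a common noun (NN or NNS), then IN (preposition or subordinating conjunction), then a determiner (DT), then any noun (NN, NNS, NNP or NNPS). A minimal sketch, assuming the tags of one sentence have been joined into a single space-separated string, writes that structure as a plain `java.util.regex` pattern. The class name, the `tokenIndex` helper and the sample tag string are illustrative and do not come from the post:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagPatternRegex {
    // NN/NNS + IN + DT + NN/NNS/NNP/NNPS as a regex over space-separated Penn Treebank tags:
    // "NNS?" = NN or NNS, "NNP?S?" = NN, NNS, NNP or NNPS.
    // (?<!\S) and (?!\S) anchor the match to whole tags rather than parts of a tag.
    static final Pattern STRUCTURE = Pattern.compile("(?<!\\S)NNS? IN DT NNP?S?(?!\\S)");

    /** Hypothetical helper: token index at which a match starting at character offset charStart begins. */
    static int tokenIndex(String tags, int charStart) {
        // Each space before the match separates one earlier tag from the next
        return (int) tags.substring(0, charStart).chars().filter(c -> c == ' ').count();
    }

    public static void main(String[] args) {
        // Tags of a sentence such as "The capital of the countries fell quickly"
        String tags = "DT NN IN DT NNS VBD RB";
        Matcher m = STRUCTURE.matcher(tags);
        while (m.find()) {
            int first = tokenIndex(tags, m.start());
            System.out.println(m.group() + " -> tokens " + first + ".." + (first + 3));
        }
    }
}
```

Counting the spaces before a match maps it back to token positions, which can then be used to look up the corresponding words.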
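```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

public class PosTagging { // class name is illustrative; the post showed only main()
    public static void main(String[] args) throws Exception {
        // Input file: set to your text file (left empty in the post; must end in a 4-char extension like ".txt")
        String contentFilePath = "";
        // Output file: input name with its extension replaced (not written to in this snippet)
        String triplesFilePath = contentFilePath.substring(0, contentFilePath.length() - 4) + "_postagg.txt";
        // Document to POS-tag (getFileContent was not shown in the post, so the file is read directly)
        String content = new String(Files.readAllBytes(Paths.get(contentFilePath)), StandardCharsets.UTF_8);
        // Pipeline: tokenization, sentence splitting, POS tagging
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // Annotate the document.
        Annotation doc = new Annotation(content);
        pipeline.annotate(doc);
        // Walk the sentences, then their tokens, printing each token as word/TAG
        List<CoreMap> sentences = doc.get(CoreAnnotations.SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                String word = token.get(CoreAnnotations.TextAnnotation.class);
                // This is the POS tag of the token
                String pos = token.get(CoreAnnotations.PartOfSpeechAnnotation.class);
                System.out.println(word + "/" + pos);
            }
        }
    }
}
```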
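Since the loop above prints every token as `word/TAG`, one way to extract the structure is to slide a four-token window over those tokens and keep each window whose tags fit the four slots in order. This is a minimal sketch, assuming the tokens of one sentence have been collected into a `List<String>` of `word/TAG` strings. The class and method names are hypothetical, and `List.of`/`Set.of` require Java 9 or later:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class StructureExtractor {
    // Allowed tags for each slot of NN/NNS + IN + DT + NN/NNS/NNP/NNPS (Penn Treebank tags)
    static final List<Set<String>> SLOTS = List.of(
            Set.of("NN", "NNS"),
            Set.of("IN"),
            Set.of("DT"),
            Set.of("NN", "NNS", "NNP", "NNPS"));

    /** Returns the words of every window of word/TAG tokens whose tags fill SLOTS in order. */
    static List<String> extract(List<String> taggedTokens) {
        List<String> matches = new ArrayList<>();
        for (int start = 0; start + SLOTS.size() <= taggedTokens.size(); start++) {
            StringBuilder phrase = new StringBuilder();
            boolean ok = true;
            for (int k = 0; k < SLOTS.size() && ok; k++) {
                String tok = taggedTokens.get(start + k);
                // Split on the LAST '/', so a word that itself contains '/' keeps it
                int slash = tok.lastIndexOf('/');
                String tag = slash < 0 ? "" : tok.substring(slash + 1);
                if (!SLOTS.get(k).contains(tag)) {
                    ok = false; // this window does not match; try the next start position
                } else {
                    if (k > 0) phrase.append(' ');
                    phrase.append(tok, 0, slash); // the word part before the slash
                }
            }
            if (ok) matches.add(phrase.toString());
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("The/DT", "capital/NN", "of/IN", "the/DT",
                "country/NN", "is/VBZ", "Paris/NNP", "./.");
        System.out.println(extract(tokens)); // prints [capital of the country]
    }
}
```

Inside the CoreNLP loop you could instead collect `word` and `pos` into two per-sentence lists and test the slots directly, which avoids re-splitting on `/`. Calling `extract` once per sentence keeps matches from spanning sentence boundaries, and overlapping matches are all reported.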
I just realized that the POS tag for determiners is "DT", not "DET". I have corrected my answer accordingly, and it works now.