2015-05-03 20 views

Stanford CoreNLP 3.5.2 has been released and includes Chinese coreference. How do I use the Chinese coreference? The release only seems to contain one file, "zh-attributes.txt.gz". How can I run Chinese coreference in Stanford CoreNLP 3.5.2?

Thanks,

dev


I ran the standard examples to completion, but did not get correct results. Has anyone managed to get correct results? If so, could you share the configuration process and a working example? Thank you very much! – itdev

Answers


For running on plain text, make sure the relevant Chinese models are on your classpath and try the following:

 

    // Run the standard pipeline, then apply the hybrid coreference annotator.
    String text = "Your text here";
    String[] args = new String[]{
        "-props", "edu/stanford/nlp/hcoref/properties/zh-dcoref-default.properties"
    };

    Annotation document = new Annotation(text);
    Properties props = StringUtils.argsToProperties(args);
    StanfordCoreNLP corenlp = new StanfordCoreNLP(props);
    corenlp.annotate(document);
    HybridCorefAnnotator hcoref = new HybridCorefAnnotator(props);
    hcoref.annotate(document);

    // The annotation holds one coref chain per entity, keyed by chain id.
    Map<Integer, CorefChain> corefChains = document.get(CorefChainAnnotation.class);
    System.out.println(corefChains);

In addition, we have documentation on how to run on the CoNLL 2012 data here.


There is no "hcoref" directory in the 3.5.2 models :( – itdev


Thank you! I found it in the new models! But when I run the example, it says "demonyms.txt" is missing, so I copied "stanford-corenlp-3.5.2-models/edu/stanford/nlp/models/dcoref/" to "stanford-chinese-corenlp-2015-04-20-models/edu/stanford/nlp/models/dcoref/". – itdev


1. Following the suggestion above, I have checked several times, but correct results never appear; the output is always "{}". 2. How do I set up the properties to integrate word segmentation, NER, parsing, and coref? I tried, but no correct result is displayed. Judging from the directory layout in the models, is hcoref a separate system? – itdev
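For reference, a pipeline configuration wiring segmentation, NER, and parsing together would look roughly like this. This is a sketch only: the property names and model paths follow the standard CoreNLP Chinese defaults of that era, so verify them against the contents of your models jar before relying on them.

```properties
# Sketch of a Chinese pipeline configuration (model paths are the
# conventional ones shipped in the Chinese models jar; check your
# jar's contents, as they may differ in 3.5.2).
annotators = segment, ssplit, pos, lemma, ner, parse
customAnnotatorClass.segment = edu.stanford.nlp.pipeline.ChineseSegmenterAnnotator
segment.model = edu/stanford/nlp/models/segmenter/chinese/ctb.gz
segment.sighanCorporaDict = edu/stanford/nlp/models/segmenter/chinese
segment.serDictionary = edu/stanford/nlp/models/segmenter/chinese/dict-chris6.ser.gz
segment.sighanPostProcessing = true
ssplit.boundaryTokenRegex = [.。]|[!?!?]+
pos.model = edu/stanford/nlp/models/pos-tagger/chinese-distsim/chinese-distsim.tagger
ner.model = edu/stanford/nlp/models/ner/chinese.misc.distsim.crf.ser.gz
parse.model = edu/stanford/nlp/models/lexparser/chineseFactored.ser.gz
```

Passing such a file via -props should give the coref annotator the segmented, tagged, and parsed input it expects.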


Thank you for the documentation! I found it in the new models! When I run the example, it says "demonyms.txt" is missing, so I copied "stanford-corenlp-3.5.2-models/edu/stanford/nlp/models/dcoref/" to "stanford-chinese-corenlp-2015-04-20-models/edu/stanford/nlp/models/dcoref/", and then got:

    [凱特, 發, 了, 一封, 郵件, 給, 斯坦福, ,, 他, 沒有, 收到, 回覆。]
    Loading sieve: ChineseHeadMatch ...
    Loading sieve: ExactStringMatch ...
    Loading sieve: PreciseConstructs ...
    Loading sieve: StrictHeadMatch1 ...
    Loading sieve: StrictHeadMatch2 ...
    Loading sieve: StrictHeadMatch3 ...
    Loading sieve: StrictHeadMatch4 ...
    Loading sieve: PronounMatch ...
    SEMANTICS NOT LOADED
    MentionExtractor ignores specified annotators, using annotators=lemma, ner
    Registering annotator segment with class edu.stanford.nlp.pipeline.ChineseSegmenterAnnotator
    Adding annotator lemma
    Adding annotator ner
    null


Hi, we have just released new versions of the 3.5.2 jars with some patches, and I wanted to leave this answer here for people.

If you go to http://nlp.stanford.edu/software/corenlp.shtml

and download Stanford CoreNLP 3.5.2 and the corresponding "Chinese models" (the file named stanford-chinese-corenlp-2015-04-20-models.jar), you will have the jars you need to run Chinese coreference.

You need stanford-corenlp-3.5.2.jar, stanford-corenlp-3.5.2-models.jar, and stanford-chinese-corenlp-2015-04-20-models.jar on your classpath.
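Assembled as a classpath, that looks like the following sketch; the distribution directory name is hypothetical and should point at wherever you unpacked the downloads.

```shell
# Hypothetical location of the unpacked CoreNLP distribution.
CORENLP_HOME=./stanford-corenlp-full-2015-04-20

# All three jars, plus the current directory for your own classes.
CLASSPATH="$CORENLP_HOME/stanford-corenlp-3.5.2.jar"
CLASSPATH="$CLASSPATH:$CORENLP_HOME/stanford-corenlp-3.5.2-models.jar"
CLASSPATH="$CLASSPATH:$CORENLP_HOME/stanford-chinese-corenlp-2015-04-20-models.jar"
CLASSPATH="$CLASSPATH:."
echo "$CLASSPATH"
```

On Windows, replace the ":" separators with ";".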

In addition, you can get exactly the same jars from the stanford-corenlp project on Maven; we have just released the Chinese models jar there. Here is how to add it to your pom.xml:

<dependency> 
    <groupId>edu.stanford.nlp</groupId> 
    <artifactId>stanford-corenlp</artifactId> 
    <version>3.5.2</version> 
    <classifier>models-chinese</classifier> 
</dependency> 

Just to give you something to cut and paste, the test class below provides the imports needed for the code posted above. I have omitted the sample Chinese text, so set the String text to whatever sample Chinese text you want to run on.

You should run it with a command like: java -mx5g -cp "location-of-jars/*:." ChineseCorefTester -props edu/stanford/nlp/hcoref/properties/zh-dcoref-default.properties

import java.util.Properties; 

import edu.stanford.nlp.hcoref.CorefCoreAnnotations.CorefChainAnnotation; 
import edu.stanford.nlp.pipeline.*; 
import edu.stanford.nlp.util.StringUtils; 

public class ChineseCorefTester { 

    public static void main(String[] args) { 

        String text = "<insert sample Chinese text here!>"; 
        Annotation document = new Annotation(text); 
        Properties props = StringUtils.argsToProperties(args); 

        // Run the standard pipeline specified by the properties file. 
        StanfordCoreNLP corenlp = new StanfordCoreNLP(props); 
        corenlp.annotate(document); 

        // Apply the hybrid coreference annotator on top of the pipeline output. 
        HybridCorefAnnotator hcoref = new HybridCorefAnnotator(props); 
        hcoref.annotate(document); 

        System.out.println(document.get(CorefChainAnnotation.class)); 
    } 
} 

Thank you, I can run the example with no errors, but there is still no result. txten="Kosgi Santosh sent an email to Stanford University. He didn't get a reply."; txtcn="凱特發了一封郵件給斯坦福大學,他沒有收到回覆."; and the log is: Loading sieve: StrictHeadMatch2 ... Loading sieve: StrictHeadMatch3 ... Loading sieve: StrictHeadMatch4 ... Loading sieve: PronounMatch ... SEMANTICS NOT LOADED MentionExtractor ignores specified annotators, using annotators=lemma, ner Registering annotator segment with class edu.stanford.nlp.pipeline.ChineseSegmenterAnnotator Adding annotator lemma Adding annotator ner {} – itdev


I am getting the empty output as well. I don't have deep knowledge of the Chinese coref system, so it is possible the system simply failed to see the connection between "Kosgi" and "He." That doesn't mean it is functioning improperly; keep in mind the system is not guaranteed to find all coreferences with 100% accuracy. – StanfordNLPHelp


That being said, I will see if I can get some insight into why that sentence has no result. – StanfordNLPHelp