2016-08-23

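French coreference annotation using CoreNLP

Can someone help me correct my setup for running coreference annotation on French text with CoreNLP? I tried the basic suggestion of editing the properties file: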

annotators = tokenize, ssplit, pos, parse, lemma, ner, parse, depparse, mention, coref 
tokenize.language = fr 
pos.model = edu/stanford/nlp/models/pos-tagger/french/french.tagger  
parse.model = edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz 

Command:

java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -props frenchProps.properties -file frenchFile.txt 

It produces the following output log:

[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize 
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit 
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos 
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/french/french.tagger ... done [0.3 sec]. 
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse 
[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz ... 
done [2.2 sec]. 
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma 
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner 
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [2.0 sec]. 
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.7 sec]. 
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.9 sec]. 
[main] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1. 
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt 
ago 23, 2016 5:37:34 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules 
INFORMACIÓN: Read 83 rules 
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt 
ago 23, 2016 5:37:34 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules 
INFORMACIÓN: Read 267 rules 
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt 
ago 23, 2016 5:37:34 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules 
INFORMACIÓN: Read 25 rules 
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse 
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse 
Loading depparse model file: edu/stanford/nlp/models/parser/nndep/english_UD.gz ... 
PreComputed 100000, Elapsed Time: 1.639 (s) 
Initializing dependency parser done [6.4 sec]. 
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator mention 
Using mention detector type: rule 
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref 
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space 
    at java.util.Arrays.copyOfRange(Arrays.java:3664) 
    at java.lang.String.<init>(String.java:207) 
    at java.lang.StringBuilder.toString(StringBuilder.java:407) 
    at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3097) 
    at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2892) 
    at java.io.ObjectInputStream.readString(ObjectInputStream.java:1646) 
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344) 
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373) 
    at java.util.HashMap.readObject(HashMap.java:1402) 
    at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:498) 
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058) 
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1909) 
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) 
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) 
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018) 
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942) 
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) 
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) 
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373) 
    at edu.stanford.nlp.io.IOUtils.readObjectFromURLOrClasspathOrFileSystem(IOUtils.java:324) 
    at edu.stanford.nlp.scoref.SimpleLinearClassifier.<init>(SimpleLinearClassifier.java:30) 
    at edu.stanford.nlp.scoref.PairwiseModel.<init>(PairwiseModel.java:75) 
    at edu.stanford.nlp.scoref.PairwiseModel$Builder.build(PairwiseModel.java:57) 
    at edu.stanford.nlp.scoref.ClusteringCorefSystem.<init>(ClusteringCorefSystem.java:31) 
    at edu.stanford.nlp.scoref.StatisticalCorefSystem.fromProps(StatisticalCorefSystem.java:48) 
    at edu.stanford.nlp.pipeline.CorefAnnotator.<init>(CorefAnnotator.java:66) 
    at edu.stanford.nlp.pipeline.AnnotatorImplementations.coref(AnnotatorImplementations.java:220) 
    at edu.stanford.nlp.pipeline.AnnotatorFactories$13.create(AnnotatorFactories.java:515) 
    at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:85) 
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:375) 

This makes me think that some additional configuration is missing.
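The trace ends in an `OutOfMemoryError` raised while the coref annotator deserializes its model (`SimpleLinearClassifier`) under `-Xmx2g`, so heap size is one thing worth ruling out, separately from the question of French model availability. The following is a minimal sketch that only reports how much heap the JVM actually received. The class name `HeapCheck` is hypothetical and is not part of CoreNLP.

```java
public class HeapCheck {
    /** Returns the maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        // Runtime.maxMemory() reflects the -Xmx setting (or the JVM default).
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

Running it with the same `-Xmx` flag as the CoreNLP command shows whether the limit was applied as intended.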

Answers


AFAIK CoreNLP does not provide coreference resolution for French. (See also http://stanfordnlp.github.io/CoreNLP/coref.html)
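If coreference is unavailable for French, one option is to restrict the pipeline to the annotators for which the question already names French models. The log above shows `ner` and `depparse` falling back to English models (`english.all.3class.distsim.crf.ser.gz`, `english_UD.gz`), so this sketch leaves them out along with `lemma`, `mention` and `coref`. That trimmed annotator list is an assumption, not a tested configuration. The class name `FrenchPropsSketch` is hypothetical, and the code uses only `java.util.Properties`, so it runs without the CoreNLP jars.

```java
import java.util.Properties;

public class FrenchPropsSketch {
    /** Builds properties limited to the French resources named in the question. */
    static Properties frenchProps() {
        Properties props = new Properties();
        // Assumption: keep only annotators backed by a French model or setting.
        props.setProperty("annotators", "tokenize, ssplit, pos, parse");
        props.setProperty("tokenize.language", "fr");
        props.setProperty("pos.model",
                "edu/stanford/nlp/models/pos-tagger/french/french.tagger");
        props.setProperty("parse.model",
                "edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz");
        return props;
    }

    public static void main(String[] args) {
        // Print each key=value pair, e.g. to paste into frenchProps.properties.
        Properties props = frenchProps();
        for (String key : props.stringPropertyNames()) {
            System.out.println(key + " = " + props.getProperty(key));
        }
    }
}
```

The resulting `Properties` object could then be handed to the `StanfordCoreNLP` constructor, or the printed pairs saved as the file passed to `-props`.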


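Thank you for your answer, @Igor. I have read the link you pointed to. As an approximation, I tried translating my text into English. This partial workaround may introduce a lot of bias for the other annotators, but for coreference it worked for me. Anyway, if you have another suggestion, it is welcome. – Nacho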