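I have trained a custom NER model with Stanford-NER. I created a properties file and used the -serverProperties argument with the java command to start my server (I got the directions from another question of mine, see here) and load my custom NER model. However, when the server tries to load my custom model, it fails with this error: java.io.EOFException: Unexpected end of ZLIB input stream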
Error loading NER classifier - Unexpected end of ZLIB input stream
The stderr.log output is as follows:
[main] INFO CoreNLP - --- StanfordCoreNLPServer#main() called ---
[main] INFO CoreNLP - setting default constituency parser
[main] INFO CoreNLP - warning: cannot find edu/stanford/nlp/models/srparser/englishSR.ser.gz
[main] INFO CoreNLP - using: edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz instead
[main] INFO CoreNLP - to use shift reduce parser download English models jar from:
[main] INFO CoreNLP - http://stanfordnlp.github.io/CoreNLP/download.html
[main] INFO CoreNLP - Threads: 4
[main] INFO CoreNLP - Liveness server started at /0.0.0.0:9000
[main] INFO CoreNLP - Starting server...
[main] INFO CoreNLP - StanfordCoreNLPServer listening at /0.0.0.0:80
[pool-1-thread-3] INFO CoreNLP - [/127.0.0.1:35546] API call w/annotators tokenize,ssplit,pos,lemma,depparse,natlog,ner,openie
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - No tokenizer type provided. Defaulting to PTBTokenizer.
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[pool-1-thread-3] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.7 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model file: edu/stanford/nlp/models/parser/nndep/english_UD.gz ... [pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 99996, Elapsed Time: 12.297 (s)
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [13.6 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator natlog
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
java.io.EOFException: Unexpected end of ZLIB input stream
at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2620)
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2636)
at java.io.ObjectInputStream$BlockDataInputStream.readDoubles(ObjectInputStream.java:3333)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1920)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1933)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
at edu.stanford.nlp.ie.crf.CRFClassifier.loadClassifier(CRFClassifier.java:2650)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1462)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1494)
at edu.stanford.nlp.ie.crf.CRFClassifier.getClassifier(CRFClassifier.java:2963)
at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifierFromPath(ClassifierCombiner.java:282)
at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifiers(ClassifierCombiner.java:266)
at edu.stanford.nlp.ie.ClassifierCombiner.<init>(ClassifierCombiner.java:141)
at edu.stanford.nlp.ie.NERClassifierCombiner.<init>(NERClassifierCombiner.java:128)
at edu.stanford.nlp.pipeline.AnnotatorImplementations.ner(AnnotatorImplementations.java:121)
at edu.stanford.nlp.pipeline.AnnotatorFactories$6.create(AnnotatorFactories.java:273)
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:152)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:451)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:154)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:145)
at edu.stanford.nlp.pipeline.StanfordCoreNLPServer.mkStanfordCoreNLP(StanfordCoreNLPServer.java:273)
at edu.stanford.nlp.pipeline.StanfordCoreNLPServer.access$500(StanfordCoreNLPServer.java:50)
at edu.stanford.nlp.pipeline.StanfordCoreNLPServer$CoreNLPHandler.handle(StanfordCoreNLPServer.java:583)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
at sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:83)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:82)
at sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:675)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
at sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:647)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
I googled this error, and most of what I read describes a Java issue from 2007 to 2010 in which an EOFException is thrown "arbitrarily". That information came from here:
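"When using gzip (via new Deflater(Deflater.BEST_COMPRESSION, true)), for some files an EOFException is thrown at the end of inflating. Although the file is correct, the error is thrown: the EOFException is thrown inconsistently. For some files it is thrown, for others it is not."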
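Answers to other people's questions about this error suggest that you have to close the gzip output stream...? I'm not entirely sure what that means, and I don't know how I would act on that advice, since Stanford-NER is the software that creates the gzip file for me.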
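To make the "close the gzip output stream" advice concrete: a gzip file is only complete once the writer finishes the stream, because that is when the final deflate block and the 8-byte trailer (CRC-32 and uncompressed size) are written. In Java, GZIPOutputStream does this in finish(), which close() calls. A file whose writer never got that far ends early, and GZIPInputStream then fails with the same EOFException as in the trace above. The minimal sketch below uses only java.util.zip to reproduce that failure mode; the class name GzipCloseDemo and the random 100 KB payload are hypothetical, and it does not show how Stanford-NER itself writes model files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Random;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipCloseDemo {

    // Compress data into memory. With close == false the stream is only
    // flushed: the final deflate block and the gzip trailer are never written,
    // like a file whose writer was killed or never closed its stream.
    static byte[] gzip(byte[] data, boolean close) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        GZIPOutputStream out = new GZIPOutputStream(bytes);
        out.write(data);
        if (close) {
            out.close(); // calls finish(), which writes the last block and trailer
        } else {
            out.flush(); // does NOT finish the gzip stream (left unclosed on purpose)
        }
        return bytes.toByteArray();
    }

    // Decompress the whole stream and return the number of bytes read.
    static long gunzip(byte[] gz) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            byte[] buf = new byte[8192];
            long total = 0;
            for (int n; (n = in.read(buf)) != -1; ) {
                total += n;
            }
            return total;
        }
    }

    // True if 100 KB of data survives the round trip; false if reading
    // fails with EOFException because the stream was cut short.
    static boolean roundTrips(boolean close) {
        byte[] data = new byte[100_000];
        new Random(42).nextBytes(data); // fixed seed for a reproducible payload
        try {
            return gunzip(gzip(data, close)) == data.length;
        } catch (EOFException e) {
            System.out.println("EOFException: " + e.getMessage());
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("closed stream round-trips:   " + roundTrips(true));
        System.out.println("unclosed stream round-trips: " + roundTrips(false));
    }
}
```

Running main should report true for the closed stream and false, preceded by an EOFException message, for the unclosed one. This only reproduces the failure mode; it does not say why a particular model file ended up truncated.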
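Question: What can I do to get rid of this error? I'm hoping this has happened to others in the past. I'm also looking for feedback from @StanfordNLPHelp on whether similar issues have come up before, and whether anything is being done, or has already been done, in the CoreNLP software to eliminate this issue. If there is a fix on the CoreNLP side, which files do I need to change, where are those files located within the CoreNLP framework, and what changes do I need to make?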
ADDED INFO (PER @StanfordNLPHelp COMMENT):
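My model was trained using the directions found here. To train the model, I used a TSV, as outlined in the directions, containing text from about 90 documents. I know this isn't a lot of training data, but we are only in the testing phase and will improve the model as we get more data.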
With this TSV file and the Stanford-NER software, I ran the following command:
java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop austen.prop
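That built my model, and I was even able to load it successfully and tag a larger corpus of text with it using the NER GUI that comes with the Stanford-NER software.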
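While troubleshooting why I couldn't get the model to work, I also tried pointing my server.properties file at the standard "3 class model" that ships with CoreNLP. It failed with the same error.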
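The fact that both my custom model and the 3 class model work in the Stanford-NER software but fail to load in the server makes me believe my custom model is not the problem, and that something is wrong with how the CoreNLP software loads these models through the -serverProperties argument. Or it could be something I'm completely unaware of.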
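The properties file I used to train my NER model is the one from the directions, with only the training file and the output file name changed. It looks like this: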
# location of the training file
trainFile = custom-model-trainingfile.tsv
# location where you would like to save (serialize) your
# classifier; adding .gz at the end automatically gzips the file,
# making it smaller, and faster to load
serializeTo = custome-ner-model.ser.gz
# structure of your training file; this tells the classifier that
# the word is in column 0 and the correct answer is in column 1
map = word=0,answer=1
# This specifies the order of the CRF: order 1 means that features
# apply at most to a class pair of previous class and current class
# or current class and next class.
maxLeft=1
# these are the features we'd like to train with
# some are discussed below, the rest can be
# understood by looking at NERFeatureFactory
useClassFeature=true
useWord=true
# word character ngrams will be included up to length 6 as prefixes
# and suffixes only
useNGrams=true
noMidNGrams=true
maxNGramLeng=6
usePrev=true
useNext=true
useDisjunctive=true
useSequences=true
usePrevSequences=true
# the last 4 properties deal with word shape features
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
My server.properties file contains only one line: ner.model = /path/to/custom_model.ser.gz
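In my startup script I also added /path/to/custom_model to the $CLASSPATH variable, changing the line CLASSPATH="$CLASSPATH:$JAR to CLASSPATH="$CLASSPATH:$JAR:/path/to/custom_model.ser.gz. I'm not sure whether this step is necessary, because the ZLIB error is the first thing I run into. I just wanted to include it for completeness.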
I tried to "gunzip" my custom model with the command gunzip custom_model.ser.gz and got an error similar to the one I get when loading the model: gzip: custom_model.ser.gz: unexpected end of file
@ChristopherManning you clearly know a lot about CoreNLP, and I see you tend to answer error-related questions. Have you seen this before? –
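Have you actually run your trained model successfully? Could you provide some details on how you trained your new model, e.g. the command and properties file you used? Getting an error like this makes me think something is wrong with the trained model file itself. – StanfordNLPHelp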
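Also, have you tried gunzipping the file on the command line? I don't think the file has to be gzipped to work, so you could try loading an uncompressed version. – StanfordNLPHelp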