Error loading NER classifier - Unexpected end of ZLIB input stream

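I have trained a custom NER model with Stanford-NER. I created a properties file and used the -serverProperties argument with the java command to start my server (following directions from another question of mine, seen here) and load my custom NER model. However, when the server attempts to load my custom model, it fails with this error: java.io.EOFException: Unexpected end of ZLIB input stream

The stderr.log output with the error is as follows: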

[main] INFO CoreNLP - --- StanfordCoreNLPServer#main() called --- 
[main] INFO CoreNLP - setting default constituency parser 
[main] INFO CoreNLP - warning: cannot find edu/stanford/nlp/models/srparser/englishSR.ser.gz 
[main] INFO CoreNLP - using: edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz instead 
[main] INFO CoreNLP - to use shift reduce parser download English models jar from: 
[main] INFO CoreNLP - http://stanfordnlp.github.io/CoreNLP/download.html 
[main] INFO CoreNLP -  Threads: 4 
[main] INFO CoreNLP - Liveness server started at /0.0.0.0:9000 
[main] INFO CoreNLP - Starting server... 
[main] INFO CoreNLP - StanfordCoreNLPServer listening at /0.0.0.0:80 
[pool-1-thread-3] INFO CoreNLP - [/127.0.0.1:35546] API call w/annotators tokenize,ssplit,pos,lemma,depparse,natlog,ner,openie 
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize 
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - No tokenizer type provided. Defaulting to PTBTokenizer. 
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit 
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos 
[pool-1-thread-3] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.7 sec]. 
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma 
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse 
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model file: edu/stanford/nlp/models/parser/nndep/english_UD.gz ... [pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 99996, Elapsed Time: 12.297 (s) 
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [13.6 sec]. 
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator natlog 
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner 
java.io.EOFException: Unexpected end of ZLIB input stream 
    at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)  
    at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117)  
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) 
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) 
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345) 
    at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2620) 
    at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2636)  
    at java.io.ObjectInputStream$BlockDataInputStream.readDoubles(ObjectInputStream.java:3333) 
    at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1920) 
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529) 
    at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1933) 
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529) 
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422) 
    at edu.stanford.nlp.ie.crf.CRFClassifier.loadClassifier(CRFClassifier.java:2650) 
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1462) 
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1494) 
    at edu.stanford.nlp.ie.crf.CRFClassifier.getClassifier(CRFClassifier.java:2963)  
    at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifierFromPath(ClassifierCombiner.java:282) 
    at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifiers(ClassifierCombiner.java:266) 
    at edu.stanford.nlp.ie.ClassifierCombiner.<init>(ClassifierCombiner.java:141) 
    at edu.stanford.nlp.ie.NERClassifierCombiner.<init>(NERClassifierCombiner.java:128)  
    at edu.stanford.nlp.pipeline.AnnotatorImplementations.ner(AnnotatorImplementations.java:121)  
    at edu.stanford.nlp.pipeline.AnnotatorFactories$6.create(AnnotatorFactories.java:273) 
    at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:152) 
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:451)  
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:154) 
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:145) 
    at edu.stanford.nlp.pipeline.StanfordCoreNLPServer.mkStanfordCoreNLP(StanfordCoreNLPServer.java:273)  
    at edu.stanford.nlp.pipeline.StanfordCoreNLPServer.access$500(StanfordCoreNLPServer.java:50)  
    at edu.stanford.nlp.pipeline.StanfordCoreNLPServer$CoreNLPHandler.handle(StanfordCoreNLPServer.java:583)  
    at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)  
    at sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:83) 
    at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:82)  
    at sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:675) 
    at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)  
    at sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:647) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:748)

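I have googled this error, and most of what I found concerns a Java issue from 2007-2010 in which an EOFException is "arbitrarily" thrown. This information is from here.

"When using gzip (via new Deflater(Deflater.BEST_COMPRESSION, true)), for some files an EOFException is thrown at the end of inflating. Although the file is correct, the EOFException is thrown inconsistently: for some files it is thrown, for others it is not."

Answers to other people's questions about this error state that you have to close the gzip output streams...? I am not entirely sure what that means, and I don't know how I would act on that advice, since Stanford-NER is the software creating the gzip file for me.

Question: What can I do to eliminate this error? I am hoping this has happened to others in the past. I am also looking for feedback from @StanfordNLPHelp on whether similar issues have come up before, and whether something is being done or has been done in the CoreNLP software to eliminate this issue. If there is a solution from CoreNLP, which files do I need to change, where are they located within the CoreNLP framework, and what changes do I need to make?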

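ADDED INFO (PER @StanfordNLPHelp comments):

My model was trained using the directions found here. To train the model, I used a TSV file as outlined in the directions, containing text from around 90 documents. I know this is not a large amount of training data, but we are only in the testing phase and will improve the model as we acquire more data.

With this TSV file and the Stanford-NER software, I ran the command below.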

java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop austen.prop

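My model was then built, and I was even able to load it and successfully tag a larger corpus of text with the NER GUI that comes with the Stanford-NER software.

While troubleshooting why I could not get the model to work, I also tried updating my server.properties file with the file path to the "3 class model" that comes standard with CoreNLP. It failed again with the same error.

The fact that both my custom model and the 3 class model work in the Stanford-NER software but fail to load here makes me believe my custom model is not the issue, and that there is some problem with how the CoreNLP software loads these models through the -serverProperties argument. Or it could be something I am completely unaware of.

The properties file I used to train my NER model is similar to the one in the directions, with the training file and the output file name changed. It looks like this: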

# location of the training file 
trainFile = custom-model-trainingfile.tsv 
# location where you would like to save (serialize) your 
# classifier; adding .gz at the end automatically gzips the file, 
# making it smaller, and faster to load 
serializeTo = custome-ner-model.ser.gz 

# structure of your training file; this tells the classifier that 
# the word is in column 0 and the correct answer is in column 1 
map = word=0,answer=1 

# This specifies the order of the CRF: order 1 means that features 
# apply at most to a class pair of previous class and current class 
# or current class and next class. 
maxLeft=1 

# these are the features we'd like to train with 
# some are discussed below, the rest can be 
# understood by looking at NERFeatureFactory 
useClassFeature=true 
useWord=true 
# word character ngrams will be included up to length 6 as prefixes 
# and suffixes only 
useNGrams=true 
noMidNGrams=true 
maxNGramLeng=6 
usePrev=true 
useNext=true 
useDisjunctive=true 
useSequences=true 
usePrevSequences=true 
# the last 4 properties deal with word shape features 
useTypeSeqs=true 
useTypeSeqs2=true 
useTypeySequences=true 
wordShape=chris2useLC

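My server.properties file contains only one line: ner.model = /path/to/custom_model.ser.gz

I also added /path/to/custom_model to the $CLASSPATH variable in the startup script, changing the line CLASSPATH="$CLASSPATH:$JAR to CLASSPATH="$CLASSPATH:$JAR:/path/to/custom_model.ser.gz. I am not sure whether this step is necessary, because I hit the ZLIB error first. I just wanted to include it for completeness.

I tried to "gunzip" my custom model with the command gunzip custom_model.ser.gz and got an error similar to the one I get when trying to load the model: gzip: custom_model.ser.gz: unexpected end of file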
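The gunzip failure reported above suggests that the file on disk is an incomplete gzip archive, independent of how CoreNLP reads it. A minimal sketch of a non-destructive check, assuming a POSIX shell with gzip installed; the function name check_gzip is hypothetical and the path in the usage comment stands in for the real model location:

```shell
# check_gzip FILE
# Tests the archive with `gzip -t`, which reads the whole compressed stream
# without writing any output. Returns 0 if the archive is intact and
# non-zero if it is truncated or corrupt.
check_gzip() {
  if gzip -t "$1" 2>/dev/null; then
    echo "OK: $1 is a complete gzip archive"
  else
    echo "DAMAGED: $1 is truncated or corrupt" >&2
    return 1
  fi
}

# Example (hypothetical path):
#   check_gzip /path/to/custom_model.ser.gz
```

Unlike plain gunzip, this leaves the original file in place, so it can be run on both the machine that trained the model and the machine that loads it.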


2017-05-16 Fraizier Reiland

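@ChristopherManning you obviously know quite a lot about CoreNLP, and I see you tend to answer error-related questions. Have you seen this before?

Have you actually run your trained model successfully? Can you provide some details about how you trained your new NER model, such as the command used and the properties file? An error like this makes me think there is a problem with the trained model file itself. – StanfordNLPHelp

Also, have you tried to gunzip the file on the command line? I don't think the file has to be gzipped to work, so you could try loading the uncompressed version. – StanfordNLPHelp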
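The suggestion to try the uncompressed model can be followed without modifying the original file. A minimal sketch, assuming a POSIX shell with gzip; decompress_model is a hypothetical helper name, and whether CoreNLP accepts the resulting .ser path in ner.model is the commenter's expectation, not something confirmed here:

```shell
# decompress_model FILE.gz
# Writes the decompressed contents to FILE (same name without .gz) and keeps
# the original archive. Prints the output path on success. If the archive is
# damaged, the partial output is removed and a non-zero status is returned.
decompress_model() {
  src=$1
  case "$src" in
    *.gz) ;;
    *) echo "expected a .gz file: $src" >&2; return 2 ;;
  esac
  dest=${src%.gz}
  if gzip -dc "$src" > "$dest"; then
    printf '%s\n' "$dest"
  else
    rm -f "$dest"
    return 1
  fi
}

# Example (hypothetical path):
#   decompress_model /path/to/custom_model.ser.gz
```

If decompression fails here, the archive itself is incomplete and the uncompressed route will not help.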

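I am assuming you downloaded Stanford CoreNLP 3.7.0 and have a folder somewhere called stanford-corenlp-full-2016-10-31. For the sake of this example, let's assume it is in /Users/stanfordnlphelp/stanford-corenlp-full-2016-10-31 (change this to match your setup).

Also, just to clarify: when you run a Java program, it looks in the CLASSPATH for compiled code and resources. A common way to set the CLASSPATH is simply to set the CLASSPATH environment variable with the export command.

Typically, Java compiled code and resources are stored in jar files.

If you look in stanford-corenlp-full-2016-10-31, you will see a bunch of .jar files. One of them is called stanford-corenlp-3.7.0-models.jar. You can see what is inside a jar file with this command: jar tf stanford-corenlp-3.7.0-models.jar.

When you look inside that file, you will notice that it contains (among other things) a variety of models. For example, you should see this file: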

edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz

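in the models jar.

So a reasonable way to get things working is to run the server and tell it to load only 1 model (since by default it will load 3).

Run these commands in one window (in the same directory as the file ner-server.properties):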

export CLASSPATH=/Users/stanfordnlphelp/stanford-corenlp-full-2016-10-31/*: 

java -Xmx12g edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000 -serverProperties ner-server.properties

where ner-server.properties is a two-line file containing these two lines:

annotators = tokenize,ssplit,pos,lemma,ner 
ner.model = edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz

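The export command above puts EVERY jar in that directory on the CLASSPATH. That is what the * means. So stanford-corenlp-3.7.0-models.jar should be on the CLASSPATH, and when the Java code runs it will be able to find edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz.

In a different terminal window, issue this command: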

wget --post-data 'Joe Smith lives in Hawaii.' 'localhost:9000/?properties={"outputFormat":"json"}' -O -

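When this runs, you should see in the first window (where the server is running) that only this model is being loaded: edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz.

Note that if you removed the ner.model line from your file and redid all of this, 3 models would be loaded instead of 1.

Please let me know whether all of that works.

Let's assume I made an NER model called custom_model.ser.gz, and that this file is what StanfordCoreNLP output after the training process. Let's say I put it in the folder /Users/stanfordnlphelp/.

If steps 1 and 2 worked, you should be able to change ner-server.properties to this: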

annotators = tokenize,ssplit,pos,lemma,ner 
ner.model = /Users/stanfordnlphelp/custom_model.ser.gz

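When you do the same thing again, it should show your custom model loading. There should not be any kind of gzip issue. If you still have a gzip issue, please let me know what kind of system you are running this on: Mac OS X, Unix, Windows, etc.?

And to confirm, you said that you have run your custom NER model with the standalone Stanford NER software, right? If so, it sounds like the model file is fine.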


2017-05-18 00:13:39 StanfordNLPHelp

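I was able to load my custom model successfully. Through another question on a different Stack site, I found that there is an issue when you create a gzip file on one OS (Windows in my case) and try to use that gzip file on another OS (Linux in my case). When I loaded the model on my Windows system, I did not get the error. **The big takeaway is to create the model on the OS you intend to load it on.** It seems like common sense, but now we know. Thanks again.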
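One way to narrow down a cross-OS problem like this is to check whether the file that reached the Linux machine is byte-identical to the one written on Windows. A copy that was truncated or altered in transit would produce the same "unexpected end" errors. The comment does not say how the file was moved, so this is only a diagnostic sketch, assuming GNU coreutils' sha256sum on the Linux side; same_checksum is a hypothetical helper name:

```shell
# same_checksum FILE EXPECTED_SHA256
# Compares FILE's SHA-256 digest with a digest computed on the other machine.
# Returns 0 on a match and non-zero otherwise.
same_checksum() {
  actual=$(sha256sum "$1" | cut -d ' ' -f 1)
  if [ "$actual" = "$2" ]; then
    echo "MATCH: $1"
  else
    echo "MISMATCH: $1 has $actual, expected $2" >&2
    return 1
  fi
}

# Example (hypothetical path; paste the digest computed on the source machine):
#   same_checksum /path/to/custom_model.ser.gz <digest-from-source-machine>
```

On Windows, certutil -hashfile custom_model.ser.gz SHA256 prints a digest to compare against; older versions may insert spaces between bytes, which would need removing first.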
