我正在使用Solr 6.4.1版本,我最近在索引文件中發佈了大約1000個文件。我在Windows 10中使用Windows Powershell來使用該命令發佈文件。Solr索引在發佈文件時產生錯誤
PS C:\ solr的-6.4.1>的java -DC = Solr_sample -Dauto =是-ddata =文件 -Drecursive =是-jar示例/ exampledocs/post.jar E:\測試\
但其中我發現一個文件沒有索引,我試圖使用下面的命令再次索引該特定文件,但沒有運氣。該文件大小爲212MB。我附上了以下錯誤。你能幫我把這個文件發佈到Solr索引。
PS C:\solr-6.4.1> java -Dc=Solr_sample -Dauto=yes -Ddata=files -Drecursive=yes -jar example/exampledocs/post.jar E:\Test\C0000000045\
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/Solr_sample/update...
Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
Entering recursive mode, max depth=999, delay=0s
Indexing directory E:\Test\C0000000045 (1 files, depth=0)
POSTing file 20162436739-Spheres Volume 3 Foams Plural Spherology. Peter Sloterdijk. MIT.pdf (application/pdf) to [base]/extract
SimplePostTool: WARNING: Solr returned an error #500 (Server Error) for url: http://localhost:8983/solr/Solr_sample/update/extract?resource.name=E%3A%5CTest%5CC0000000045%5C20162436739-Spheres+Volume+3+Foams+Plural+Spherology.+Peter+Sloterdijk.+MIT.pdf&literal.id=E%3A%5C
Test%5CC0000000045%5C20162436739-Spheres+Volume+3+Foams+Plural+Spherology.+Peter+Sloterdijk.+MIT.pdf
SimplePostTool: WARNING: Response: <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 500 Server Error</title>
</head>
<body><h2>HTTP ERROR 500</h2>
<p>Problem accessing /solr/Solr_sample/update/extract. Reason:
<pre> Server Error</pre></p><h3>Caused by:</h3><pre>java.lang.OutOfMemoryError: Java heap space
at java.io.PushbackInputStream.<init>(Unknown Source)
at org.apache.pdfbox.pdfparser.InputStreamSource.<init>(InputStreamSource.java:39)
at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.<init>(PDFObjectStreamParser.java:55)
at org.apache.pdfbox.pdfparser.COSParser.parseObjectStream(COSParser.java:821)
at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:727)
at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:652)
at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:612)
at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:215)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:249)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:972)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:908)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:131)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2306)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
</pre>
</body>
</html>
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 500 for URL: http://localhost:8983/solr/Solr_sample/update/extract?resource.name=E%3A%5CTest%5CC0000000045%5C20162436739-Spheres+Volume+3+Foams+Plural+Sp
herology.+Peter+Sloterdijk.+MIT.pdf&literal.id=E%3A%5CTest%5CC0000000045%5C20162436739-Spheres+Volume+3+Foams+Plural+Spherology.+Peter+Sloterdijk.+MIT.pdf
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/Solr_sample/update...
Time spent: 0:00:13.795
我用命令PS C:\ Solr的-6.4.1> java的-Xmx8g - Dc = Solr_sample -Dauto = yes -Ddata = files -Drecursive = yes -jar example/exampledocs/post.jar E:\ Test \ C0000000045 \但它仍然給出相同的錯誤。 – Simant
你有多少公羊免費?儘可能多地給它。但是你不能排除PDF文本提取中的一個錯誤,你可能永遠不會從文本中提取文本(或者你可以通過其他方式,然後創建一個DOC文件與人工手動/腳本,如果它是重要的) – Persimmonium
錯誤消息在服務器上生成,給索引工具更多的內存將無濟於事。您可以更改/使用您正在使用的Solr啓動腳本的已分配內存的大小。 – MatsLindh