我目前正面臨一個SOLR問題(更確切地說是與奴隸複製),並花了很多時間在網上閱讀後,我發現自己不得不尋求一些啓發。Solr索引 - 主/從複製,如何處理巨大的索引和高流量?
- Solr的索引大小是否有限制?
在處理單個主設備時,何時決定使用多核還是多索引? 有什麼跡象表明在達到一定大小的索引時,建議進行分區?
- 從主控制器到從屬控制器複製段時是否有最大尺寸?
複製時,從站是否無法下載內容並對其進行索引時是否存在段大小限制?當大量流量檢索信息和許多新文檔複製時,從屬將無法複製的閾值是多少。
事實上,下面是導致我遇到這些問題的上下文: 我們想索引大量的文檔,但是當數量達到十幾百萬時,奴隸無法處理它,用SnapPull錯誤開始失敗複製。 這些文檔由幾個文本字段組成(名稱,類型,描述,...約10個其他字段,比如說最多20個字符)。
我們有一個master和2個從master複製數據的slave。
這是我第一次使用Solr(我通常在使用spring,hibernate的webapps上工作,但沒有使用Solr),所以我不知道如何解決這個問題。
我們的想法是,現在向主控添加多個內核,並從每個內核複製一個從控制器。 這是正確的路嗎?
如果是這樣,我們如何確定需要的內核數量?現在我們只是試圖瞭解它的行爲和必要性,但我想知道是否有關於這個特定主題的最佳實踐或某些基準。
對於這一數額與該平均大小的文件,X需要內核或索引...
感謝您對我怎麼能應付龐大的平均大小的文件量的任何幫助!
以下是錯誤的副本被拋出時,從試圖複製:
ERROR [org.apache.solr.handler.ReplicationHandler] - <SnapPull failed >
org.apache.solr.common.SolrException: Index fetch failed :
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:329)
at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:264)
at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:280)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:135)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:65)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:142)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:166)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)
Caused by: java.lang.RuntimeException: java.io.IOException: read past EOF
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:418)
at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:467)
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:319)
... 11 more
Caused by: java.io.IOException: read past EOF
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:151)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:70)
at org.apache.lucene.index.SegmentInfos$2.doBody(SegmentInfos.java:410)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:704)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:538)
at org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.java:402)
at org.apache.lucene.index.DirectoryReader.isCurrent(DirectoryReader.java:791)
at org.apache.lucene.index.DirectoryReader.doReopen(DirectoryReader.java:404)
at org.apache.lucene.index.DirectoryReader.reopen(DirectoryReader.java:352)
at org.apache.solr.search.SolrIndexReader.reopen(SolrIndexReader.java:413)
at org.apache.solr.search.SolrIndexReader.reopen(SolrIndexReader.java:424)
at org.apache.solr.search.SolrIndexReader.reopen(SolrIndexReader.java:35)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1049)
... 14 more
編輯: 毛的回答後,Solr的庫已更新到1.4。 1但這個錯誤仍然被提出。 我增加了commitReserveDuration而且即使「SnapPull失敗」的錯誤似乎已經消失,一個又一個開始被提出,不知道爲什麼看起來,我不能在網上找到很多答案:
ERROR [org.apache.solr.servlet.SolrDispatchFilter] - <ClientAbortException: java.io.IOException
at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:370)
at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:323)
at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:396)
at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:385)
at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:89)
at org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:183)
at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:89)
at org.apache.solr.request.BinaryResponseWriter.write(BinaryResponseWriter.java:48)
at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:322)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:837)
at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:640)
at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1286)
at java.lang.Thread.run(Thread.java:595)
Caused by: java.io.IOException
at org.apache.coyote.http11.InternalAprOutputBuffer.flushBuffer(InternalAprOutputBuffer.java:703)
at org.apache.coyote.http11.InternalAprOutputBuffer$SocketOutputBuffer.doWrite(InternalAprOutputBuffer.java:733)
at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:124)
at org.apache.coyote.http11.InternalAprOutputBuffer.doWrite(InternalAprOutputBuffer.java:539)
at org.apache.coyote.Response.doWrite(Response.java:560)
at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:365)
... 22 more
>
ERROR [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/].[SolrServer]] - <Servlet.service() for servlet SolrServer threw exception>
java.lang.IllegalStateException
at org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:405)
at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:362)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:837)
at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:640)
at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1286)
at java.lang.Thread.run(Thread.java:595)
我仍然想知道如何處理包含大量solr文檔的大型索引(超過20G)的最佳實踐。我在某處丟失了一些明顯的鏈接嗎?教程,文件?
您使用了哪種複製機制?基於java還是腳本? – Karussell 2010-11-17 08:41:29
一切都在Java中完成(solr http複製)。主設備是數據加載器並索引所有內容,而從設備則複製主設備。當我們添加一堆文檔時,發生了第一個錯誤「SnapPuller失敗」。大師處理它很好,但奴隸開始失敗,據說由於突然更大的細分市場複製。這就是爲什麼我要問什麼是最好的方法來擴大solr,這個文件是有幫助的:http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene - 和 - Solr#d0e410 – 2010-11-19 21:51:43