Apache Solr 4 - index does not grow after the first commit

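I have written my own plugin for Apache Nutch 2.2.1 to crawl images, videos and podcasts from selected sites (my seed contains 180 URLs). I store this metadata in HBase, and now I want to save it to the index (Solr). There is a lot of metadata to save (webpages + images + videos + podcasts). I use the Nutch script bin/crawl for the whole process (inject, generate, fetch, parse... and finally solrindex and dedup), but I have one problem. When I run the script for the first time, about 6000 documents are stored in the index (roughly 3700 documents for images, 1700 for webpages, and the rest for videos and podcasts). That is fine...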

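But...

When I run the script a second time, a third time, and so on, the number of documents in the index does not increase (there are still 6000 documents), while the number of rows stored in the HBase table keeps growing (there are now 97383 rows)...

Do you know where the problem is? I have been fighting with this for a long time and I have no idea... In case it helps, this is my solrconfig.xml: http://pastebin.com/uxMW2nuq and this is my nutch-site.xml: http://pastebin.com/4bj1wdmT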

When I look at the log, I see:

SEVERE: auto commit error...:java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit 
     at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2668) 
     at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2834) 
     at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2814) 
     at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:529) 
     at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216) 
     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) 
     at java.util.concurrent.FutureTask.run(FutureTask.java:166) 
     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) 
     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
     at java.lang.Thread.run(Thread.java:722)

Source

2013-07-14 Jan Bouchner

Have you tried a lower value for autocommit? Try committing every 100 documents to avoid holding too much information in memory. – MatsLindh

Thank you. That was the problem. –

I have added the comment as an answer so you can accept it. Thanks for following up later. :-) – MatsLindh

Have you tried a lower value for autocommit? Try committing every 100 documents to avoid holding too much information in memory.

Source

2013-11-16 23:52:21 MatsLindh
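
For illustration, autocommit thresholds are set in the updateHandler section of solrconfig.xml. The fragment below is only a sketch and is not taken from the asker's pastebin configuration: the limit of 100 documents comes from the answer above, while setting openSearcher to false is an assumption added here so that each hard commit flushes to disk without opening a new searcher.

```xml
<!-- Sketch of an autocommit setting for Solr 4; goes inside <config> in solrconfig.xml -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- Hard commit after every 100 added documents (value suggested in the answer) -->
    <maxDocs>100</maxDocs>
    <!-- Assumption: do not open a new searcher on each automatic commit -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

With openSearcher disabled, newly indexed documents become searchable only after a commit that does open a searcher (for example an explicit commit sent by the indexing client), so that element can be dropped if the default behaviour is preferred.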
