Nutch 1.3 and Solr 4.4.0 integration job fails

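I want to crawl web pages with Nutch. I followed the steps in the documentation on the official Nutch website (the crawl ran successfully, and I copied schema-solr4.xml into the Solr directory). But when I run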

bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*

I get the following error:

Indexer: starting at 2013-08-25 09:17:35 
Indexer: deleting gone documents: false 
Indexer: URL filtering: false 
Indexer: URL normalizing: false 
Active IndexWriters : 
SOLRIndexWriter 
    solr.server.url : URL of the SOLR instance (mandatory) 
    solr.commit.size : buffer size when sending to SOLR (default 1000) 
    solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml) 
    solr.auth : use authentication (default false) 
    solr.auth.username : use authentication (default false) 
    solr.auth : username for authentication 
    solr.auth.password : password for authentication 


Indexer: java.io.IOException: Job failed! 
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357) 
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:123) 
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:185) 
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) 
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:195)

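I should mention that Solr is running, but I cannot browse to http://localhost:8983/solr/admin (it redirects me to http://localhost:8983/solr/#).

On the other hand, when I stop Solr, I get the same error! Does anyone know what is wrong with my setup?

P.S. The URL I am crawling is: http://localhost/NORC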

2013-08-25 orezvani

Were you able to solve this problem? – Monodeep

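Check your configuration for both Solr and Nutch.

The Nutch and Solr schema files should be the same, otherwise you may run into problems, so make sure they match.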
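
One way to check whether the two schemas match is to compare the field names they declare. The following is a minimal sketch, not an official Nutch or Solr tool: the helper names schema_fields and missing_fields are made up for illustration, the file paths in the usage comment are assumptions about a typical install, and the parsing is a simple text match on <field name="..."> elements rather than real XML parsing.

```shell
# Hypothetical helper: list the field names declared in a schema file,
# one per line. Matches <field ...> only, not <fieldType> or <dynamicField>.
schema_fields() {
  grep -o '<field [^>]*name="[^"]*"' "$1" \
    | sed 's/.*name="\([^"]*\)".*/\1/' \
    | sort -u
}

# Hypothetical helper: print fields declared in the first schema (Nutch's)
# that are absent from the second schema (the one Solr is using).
missing_fields() {
  solr_fields=$(schema_fields "$2")
  schema_fields "$1" | while IFS= read -r name; do
    printf '%s\n' "$solr_fields" | grep -qxF -- "$name" || printf '%s\n' "$name"
  done
}

# Usage (paths are assumptions, adjust to your installation):
#   missing_fields conf/schema-solr4.xml /path/to/solr/collection1/conf/schema.xml
```

Any name printed is a field Nutch may send that Solr does not know about.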

2013-08-25 22:04:24

When I ran into the same problem with Nutch, Solr's log contained the error message "unknown field 'host'". After I modified schema.xml in Solr, the Nutch error went away.
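
To find out which fields Solr is rejecting, the log can be searched for these messages. This is a rough sketch under assumptions: the helper name unknown_fields is hypothetical, the log location in the usage comment depends on how Solr was started, and the match relies on the message containing the text unknown field '<name>' as quoted above.

```shell
# Hypothetical helper: print each distinct field name that appears in an
# "unknown field '<name>'" message in the given log file.
unknown_fields() {
  grep -o "unknown field '[^']*'" "$1" \
    | sed "s/unknown field '\([^']*\)'/\1/" \
    | sort -u
}

# Usage (the log path is an assumption, adjust to your installation):
#   unknown_fields /path/to/solr/logs/solr.log
```

Each name printed is a candidate for a missing <field> declaration in Solr's schema.xml.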

2014-01-09 01:39:27 billni

You are missing the core name in your command.

For example:

./bin/crawl -i -D solr.server.url=http://localhost:8983/solr/#/your_corname urls/ crawl 1
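
For the solrindex command from the question, the same idea might look like the sketch below. It assumes the indexer wants the core's plain HTTP address; the # in the admin page address is a browser-side fragment, so the helper drops it. The helper name solr_core_url is hypothetical, and collection1 in the usage comment is only an assumed core name; use the name of your own core.

```shell
# Hypothetical helper: build a core URL from a Solr base URL and a core name.
solr_core_url() {
  base=${1%%'#'*}                     # drop any '#...' fragment
  while [ "${base%/}" != "$base" ]; do
    base=${base%/}                    # drop trailing slashes
  done
  printf '%s/%s\n' "$base" "$2"
}

# Usage (core name is an assumption):
#   bin/nutch solrindex "$(solr_core_url http://localhost:8983/solr/ collection1)" \
#     crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
```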

2016-02-25 11:17:04 merlin
