2014-07-01 71 views
1

Nutch 1.8 and Apache Solr 4.8 integration: job failed

I am trying to crawl the web using Nutch 1.8 and Solr 4.8 on Windows 7, with the following command:

bin/crawl urls newsolr http://localhost:8983/solr/ 1 -depth 1 

I keep getting the following error:

Indexer: java.io.IOException: Job failed! 
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252) 
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114) 
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176) 
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) 
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186) 

Here is part of the log file:

2014-07-01 16:58:33,613 INFO solr.SolrMappingReader - source: content dest: content 
2014-07-01 16:58:33,613 INFO solr.SolrMappingReader - source: title dest: title 
2014-07-01 16:58:33,613 INFO solr.SolrMappingReader - source: host dest: host 
2014-07-01 16:58:33,613 INFO solr.SolrMappingReader - source: segment dest: segment 
2014-07-01 16:58:33,613 INFO solr.SolrMappingReader - source: boost dest: boost 
2014-07-01 16:58:33,613 INFO solr.SolrMappingReader - source: digest dest: digest 
2014-07-01 16:58:33,613 INFO solr.SolrMappingReader - source: tstamp dest: tstamp 
2014-07-01 16:58:33,613 INFO solr.SolrMappingReader - source: url dest: id 
2014-07-01 16:58:33,613 INFO solr.SolrMappingReader - source: url dest: url 
2014-07-01 16:58:33,643 INFO solr.SolrIndexWriter - Indexing 1 documents 
2014-07-01 16:58:33,773 WARN mapred.LocalJobRunner - job_local_0001 
org.apache.solr.common.SolrException: Method Not Allowed 

Method Not Allowed 

request: http://localhost:8983/solr/ 
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430) 
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244) 
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105) 
    at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:155) 
    at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118) 
    at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44) 
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474) 
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216) 
2014-07-01 16:58:34,628 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed! 
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252) 
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114) 
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176) 
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) 
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)

Finally, the Solr error log:

`org.apache.solr.common.SolrException: ERROR: [doc=http://.com/] unknown field 'tstamp' ` 

This is my first Solr/Nutch setup. Any help is greatly appreciated. Thanks in advance!

+0

Please say exactly what you are doing, what you have set up so far, and what you have already tried in order to solve it.

+0

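What I am trying to do is crawl a website with Nutch and load it into Solr. I have installed Nutch 1.8 correctly using Cygwin, along with Solr 4.8, on Windows 7. Nutch works with the Heliosearch distribution of Solr, but when I run the crawl against the Solr instance at localhost:8983 I get the error mentioned above. Copying Nutch's schema-solr4.xml file into Solr did not work. I also looked into the unknown field 'tstamp' error and tried changing the field in schema.xml to – akorkosz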

+0

instead of the original, but nothing seems to work. – akorkosz

Answer

0

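Just stop the Solr instance and start it again. That should solve your problem. You changed the schema file and saved the changes, but Solr cannot "see" the newly added fields until it is restarted, which is why the error occurs.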
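
Before restarting, it may help to confirm that the schema file Solr actually loads declares every field Nutch sends. The following is a minimal sketch in Python, not part of Nutch or Solr: the helper names `missing_fields` and `check_schema_file` are made up for illustration, and the field list is copied from the `SolrMappingReader` lines in the log above. It only looks at explicit `<field>` declarations, so a field that is covered by a `<dynamicField>` pattern would still be reported as missing.

```python
import xml.etree.ElementTree as ET

# Destination field names copied from the SolrMappingReader lines in the log.
NUTCH_FIELDS = ["content", "title", "host", "segment", "boost",
                "digest", "tstamp", "id", "url"]


def missing_fields(schema_root, required=NUTCH_FIELDS):
    """Return the required field names with no <field> declaration in the schema."""
    # iter("field") finds <field> elements whether or not they sit inside <fields>.
    declared = {field.get("name") for field in schema_root.iter("field")}
    return [name for name in required if name not in declared]


def check_schema_file(path):
    """Parse a schema.xml file from disk and list the missing Nutch fields."""
    return missing_fields(ET.parse(path).getroot())
```

Assuming the default Solr 4.x example layout, the file to check would be something like `example/solr/collection1/conf/schema.xml`; adjust the path if your core lives elsewhere. If `check_schema_file(...)` returns an empty list and the `unknown field 'tstamp'` error persists after a restart, the edited file is probably not the one the running core reads.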