2012-09-11

Please help. I'm trying to crawl a website using Nutch, but it gives me the error "java.io.IOException: Job failed! org.apache.solr.common.SolrException: Bad Request Bad Request request: http://localhost:8080/solr/update?wt=javabin&version=2".

I'm running the command "bin/nutch solrindex http://<host name>:8080/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*", with Nutch 1.5.1, Solr 3.6.1, JDK java-7-openjdk-i386 and Ubuntu 12.04.

The hadoop.log file in the Nutch logs folder shows the following:

2012-09-13 12:56:10,524 INFO solr.SolrIndexer - SolrIndexer: starting at 2012-09-13 12:56:10 

2012-09-13 12:56:10,604 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb 

2012-09-13 12:56:10,604 INFO indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb 

2012-09-13 12:56:10,604 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20120910160403 

2012-09-13 12:56:10,711 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20120910160448 

2012-09-13 12:56:10,715 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20120910160631 

2012-09-13 12:56:10,760 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 

2012-09-13 12:56:11,212 INFO plugin.PluginRepository - Plugins: looking in: /home/zapbuild/Nutch/plugins 

2012-09-13 12:56:11,310 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true] 

2012-09-13 12:56:11,310 INFO plugin.PluginRepository - Registered Plugins: 

2012-09-13 12:56:11,310 INFO plugin.PluginRepository -  the nutch core extension points (nutch-extensionpoints) 

2012-09-13 12:56:11,310 INFO plugin.PluginRepository -  Regex URL Normalizer (urlnormalizer-regex) 

2012-09-13 12:56:11,310 INFO plugin.PluginRepository -  CyberNeko HTML Parser (lib-nekohtml) 

2012-09-13 12:56:11,310 INFO plugin.PluginRepository -  OPIC Scoring Plug-in (scoring-opic) 

2012-09-13 12:56:11,310 INFO plugin.PluginRepository -  Basic URL Normalizer (urlnormalizer-basic) 

2012-09-13 12:56:11,310 INFO plugin.PluginRepository -  Tika Parser Plug-in (parse-tika) 

2012-09-13 12:56:11,311 INFO plugin.PluginRepository -  Basic Indexing Filter (index-basic) 

2012-09-13 12:56:11,311 INFO plugin.PluginRepository -  Html Parse Plug-in (parse-html) 

2012-09-13 12:56:11,311 INFO plugin.PluginRepository -  Anchor Indexing Filter (index-anchor) 

2012-09-13 12:56:11,311 INFO plugin.PluginRepository -  HTTP Framework (lib-http) 

2012-09-13 12:56:11,311 INFO plugin.PluginRepository -  Regex URL Filter (urlfilter-regex) 

2012-09-13 12:56:11,311 INFO plugin.PluginRepository -  Regex URL Filter Framework (lib-regex-filter) 

2012-09-13 12:56:11,311 INFO plugin.PluginRepository -  Pass-through URL Normalizer (urlnormalizer-pass) 

2012-09-13 12:56:11,311 INFO plugin.PluginRepository -  Http Protocol Plug-in (protocol-http) 

2012-09-13 12:56:11,311 INFO plugin.PluginRepository - Registered Extension-Points: 

2012-09-13 12:56:11,311 INFO plugin.PluginRepository -  Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer) 

2012-09-13 12:56:11,311 INFO plugin.PluginRepository -  Nutch Protocol (org.apache.nutch.protocol.Protocol) 

2012-09-13 12:56:11,311 INFO plugin.PluginRepository -  Nutch Segment Merge Filter (org.apache.nutch.segment.SegmentMergeFilter) 

2012-09-13 12:56:11,311 INFO plugin.PluginRepository -  Nutch URL Filter (org.apache.nutch.net.URLFilter) 

2012-09-13 12:56:11,311 INFO plugin.PluginRepository -  Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter) 

2012-09-13 12:56:11,311 INFO plugin.PluginRepository -  HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter) 

2012-09-13 12:56:11,311 INFO plugin.PluginRepository -  Nutch Content Parser (org.apache.nutch.parse.Parser) 

2012-09-13 12:56:11,311 INFO plugin.PluginRepository -  Nutch Scoring (org.apache.nutch.scoring.ScoringFilter) 

2012-09-13 12:56:11,313 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter 


2012-09-13 12:56:11,314 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off 

2012-09-13 12:56:11,314 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter 

2012-09-13 12:56:14,104 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter 

2012-09-13 12:56:14,104 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off 

2012-09-13 12:56:14,104 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter 

2012-09-13 12:56:17,135 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter 

2012-09-13 12:56:17,136 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off 

2012-09-13 12:56:17,136 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter 

2012-09-13 12:56:20,204 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter 

2012-09-13 12:56:20,205 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off 

2012-09-13 12:56:20,205 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter 

2012-09-13 12:56:23,297 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter 

2012-09-13 12:56:23,297 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off 

2012-09-13 12:56:23,297 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter 

2012-09-13 12:56:26,232 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter 

2012-09-13 12:56:26,232 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off 

2012-09-13 12:56:26,233 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter 

2012-09-13 12:56:29,252 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter 

2012-09-13 12:56:29,252 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off 

2012-09-13 12:56:29,252 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter 

2012-09-13 12:56:32,284 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter 

2012-09-13 12:56:32,284 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off 

2012-09-13 12:56:32,284 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter 

2012-09-13 12:56:35,258 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter 

2012-09-13 12:56:35,258 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off 

2012-09-13 12:56:35,258 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter 

2012-09-13 12:56:38,283 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter 

2012-09-13 12:56:38,284 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off 

2012-09-13 12:56:38,284 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter 

2012-09-13 12:56:41,278 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter 

2012-09-13 12:56:41,278 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off 

2012-09-13 12:56:41,278 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter 

2012-09-13 12:56:44,334 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter 

2012-09-13 12:56:44,334 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off 

2012-09-13 12:56:44,334 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter 

2012-09-13 12:56:47,338 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter 

2012-09-13 12:56:47,338 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off 

2012-09-13 12:56:47,338 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter 

2012-09-13 12:56:50,360 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter 

2012-09-13 12:56:50,360 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off 

2012-09-13 12:56:50,360 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter 

2012-09-13 12:56:53,309 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter 

2012-09-13 12:56:53,310 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off 

2012-09-13 12:56:53,310 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter 

2012-09-13 12:56:53,357 INFO solr.SolrMappingReader - source: content dest: content 

2012-09-13 12:56:53,357 INFO solr.SolrMappingReader - source: title dest: title 

2012-09-13 12:56:53,357 INFO solr.SolrMappingReader - source: host dest: host 

2012-09-13 12:56:53,357 INFO solr.SolrMappingReader - source: segment dest: segment 

2012-09-13 12:56:53,357 INFO solr.SolrMappingReader - source: boost dest: boost 

2012-09-13 12:56:53,357 INFO solr.SolrMappingReader - source: digest dest: digest 

2012-09-13 12:56:53,357 INFO solr.SolrMappingReader - source: tstamp dest: tstamp 

2012-09-13 12:56:53,357 INFO solr.SolrMappingReader - source: url dest: id 

2012-09-13 12:56:53,357 INFO solr.SolrMappingReader - source: url dest: url 

2012-09-13 12:56:53,409 INFO solr.SolrWriter - Indexing 18 documents 

2012-09-13 12:56:53,604 WARN mapred.LocalJobRunner - job_local_0001 

org.apache.solr.common.SolrException: Missing solr core name in path 

Missing solr core name in path 

request: http://<host name>:8983/solr/update?wt=javabin&version=2 
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430) 
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244) 
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105) 
    at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:142) 
    at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48) 
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:466) 
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:530) 
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260) 
2012-09-13 12:56:53,981 ERROR solr.SolrIndexer - java.io.IOException: Job failed! 

I also could not find any log files for Solr.

Please help me solve this problem; I'm really stuck on it.


You need to look at the Solr log file, which will contain the error. Perhaps some required field is missing. – javanna
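For illustration, a minimal sketch of the kind of check this comment hints at: list any schema-required fields a document lacks before it is sent to Solr. Which fields are required is an assumption here (`id` and `url`, two of the destinations in the SolrMappingReader lines of the log); check the fields marked `required="true"` in your own schema.xml. The class and method names are hypothetical and not part of Nutch or SolrJ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RequiredFieldCheck {

    // Assumption: these are the fields your schema.xml marks required="true".
    static final List<String> REQUIRED = Arrays.asList("id", "url");

    // Returns the required fields that are absent or empty in a document.
    static List<String> missingRequiredFields(Map<String, String> doc) {
        List<String> missing = new ArrayList<>();
        for (String field : REQUIRED) {
            String value = doc.get(field);
            if (value == null || value.isEmpty()) {
                missing.add(field);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("url", "http://example.com/");
        doc.put("title", "Example");
        // "id" was never set, so it is reported as missing.
        System.out.println(missingRequiredFields(doc)); // prints [id]
    }
}
```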


Could you add the Solr output so that we can help you? – javanna

Answer


Your log says what the problem is: Missing solr core name in path

Your request should have the Solr core name between /solr/ and /update?wt=..., something like this: http://<host name>:8983/solr/<core_name>/update?wt=javabin&version=2

Maybe you should add the core name to the URL in your nutch command.
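A minimal sketch of the answer's point, assuming a multi-core Solr setup deployed under /solr. `updateUrl` roughly mirrors how the request URL in the log is formed (the Solr URL given to solrindex, trailing slash dropped, plus /update), and `hasCoreName` is a rough client-side check that some path segment follows /solr/. Both helpers and the core name `collection1` are hypothetical placeholders; use the core name defined in your own solr.xml.

```java
import java.net.URI;

public class SolrCoreUrl {

    // Roughly mirrors the request URL seen in the log:
    // the Solr URL given to solrindex, minus any trailing slash, plus "/update".
    static String updateUrl(String solrUrl) {
        String base = solrUrl.endsWith("/")
                ? solrUrl.substring(0, solrUrl.length() - 1)
                : solrUrl;
        return base + "/update?wt=javabin&version=2";
    }

    // Rough check, assuming the webapp is deployed under /solr:
    // true only if a non-empty path segment (a core name) follows /solr/.
    static boolean hasCoreName(String solrUrl) {
        String path = URI.create(solrUrl).getPath();
        int i = path.indexOf("/solr/");
        if (i < 0) {
            return false;
        }
        String rest = path.substring(i + "/solr/".length());
        return !rest.replace("/", "").isEmpty();
    }

    public static void main(String[] args) {
        String original = "http://localhost:8983/solr/";             // no core name, as in the question
        String withCore = "http://localhost:8983/solr/collection1/"; // "collection1" is a placeholder core
        // prints: false http://localhost:8983/solr/update?wt=javabin&version=2
        System.out.println(hasCoreName(original) + " " + updateUrl(original));
        // prints: true http://localhost:8983/solr/collection1/update?wt=javabin&version=2
        System.out.println(hasCoreName(withCore) + " " + updateUrl(withCore));
    }
}
```

Under that assumption, the indexing command would become something like `bin/nutch solrindex http://<host name>:<port>/solr/<core_name>/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*`.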