
I followed https://wiki.apache.org/nutch/NutchTutorial and tried to install Nutch 1.12 and integrate it with Solr 5.5.2. I installed Nutch following the steps in the tutorial, but when I ran the command below to integrate it with Solr, it threw the exception shown further down: java.io.IOException: No FileSystem for scheme: http

bin/nutch index http://10.209.18.213:8983/solr crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/* -filter -normalize

Exception 

2016-08-11 09:18:40,076 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
2016-08-11 09:18:40,383 WARN segment.SegmentChecker - The input path at crawldb is not a segment... skipping 
2016-08-11 09:18:40,397 INFO segment.SegmentChecker - Segment dir is complete: crawl/segments/20160810110110. 
2016-08-11 09:18:40,403 INFO segment.SegmentChecker - Segment dir is complete: crawl/segments/20160810112551. 
2016-08-11 09:18:40,408 INFO segment.SegmentChecker - Segment dir is complete: crawl/segments/20160810112952. 
2016-08-11 09:18:40,409 INFO indexer.IndexingJob - Indexer: starting at 2016-08-11 09:18:40 
2016-08-11 09:18:40,415 INFO indexer.IndexingJob - Indexer: deleting gone documents: false 
2016-08-11 09:18:40,415 INFO indexer.IndexingJob - Indexer: URL filtering: true 
2016-08-11 09:18:40,415 INFO indexer.IndexingJob - Indexer: URL normalizing: true 
2016-08-11 09:18:40,672 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter 
2016-08-11 09:18:40,672 INFO indexer.IndexingJob - Active IndexWriters : 
SOLRIndexWriter 
     solr.server.url : URL of the SOLR instance 
     solr.zookeeper.hosts : URL of the Zookeeper quorum 
     solr.commit.size : buffer size when sending to SOLR (default 1000) 
     solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml) 
     solr.auth : use authentication (default false) 
     solr.auth.username : username for authentication 
     solr.auth.password : password for authentication 


2016-08-11 09:18:40,677 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: http://10.209.18.213:8983/solr 
2016-08-11 09:18:40,677 INFO indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb 
2016-08-11 09:18:40,677 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20160810110110 
2016-08-11 09:18:40,683 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20160810112551 
2016-08-11 09:18:40,684 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20160810112952 
2016-08-11 09:18:41,362 ERROR indexer.IndexingJob - Indexer: java.io.IOException: No FileSystem for scheme: http 
     at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2385) 
     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392) 
     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89) 
     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431) 
     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413) 
     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368) 
     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) 
     at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256) 
     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228) 
     at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45) 
     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304) 
     at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520) 
     at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512) 
     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394) 
     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) 
     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) 
     at java.security.AccessController.doPrivileged(Native Method) 
     at javax.security.auth.Subject.doAs(Subject.java:415) 
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) 
     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) 
     at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562) 
     at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557) 
     at java.security.AccessController.doPrivileged(Native Method) 
     at javax.security.auth.Subject.doAs(Subject.java:415) 
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) 
     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557) 
     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548) 
     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:833) 
     at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145) 
     at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228) 
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) 
     at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237) 

I have the same problem. Did you find a solution? – LucaoA

Answer


The tutorial still mentions the deprecated solrindex command. In your call the Solr URL is passed as the first positional argument, so the indexer takes it as the <crawldb> path and Hadoop then fails with "No FileSystem for scheme: http" — your log even shows "IndexerMapReduce: crawldb: http://10.209.18.213:8983/solr". The URL must be passed as a property instead; the index command should be

bin/nutch index -Dsolr.server.url=http://.../solr crawldb/ -linkdb linkdb/ segments/* 
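Applied to the URL and the crawl directory layout visible in your log output, the call would look like this:

bin/nutch index -Dsolr.server.url=http://10.209.18.213:8983/solr \
    crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/* -filter -normalize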

Running the Nutch index command without arguments shows the command-line help:

bin/nutch index 
Usage: Indexer <crawldb> [-linkdb <linkdb>] [-params k1=v1&k2=v2...] (<segment> ... | -dir <segments>) [-noCommit] [-deleteGone] [-filter] [-normalize] [-addBinaryContent] [-base64] 
Active IndexWriters : 
SOLRIndexWriter 
     solr.server.url : URL of the SOLR instance 
     solr.zookeeper.hosts : URL of the Zookeeper quorum 
     solr.commit.size : buffer size when sending to SOLR (default 1000) 
     solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml) 
     solr.auth : use authentication (default false) 
     solr.auth.username : username for authentication 
     solr.auth.password : password for authentication 
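If the job still fails after that, check that solr.server.url points at an actual Solr core rather than the bare server root. With Solr 5.x the tutorial has you create a dedicated core and include its name in the URL; a minimal sketch, assuming a core named "nutch" (the core name is an assumption — use whatever core you created):

# on the Solr host, create the core once (Solr 5.x bin/solr tool)
bin/solr create -c nutch
# then index against that core
bin/nutch index -Dsolr.server.url=http://10.209.18.213:8983/solr/nutch \
    crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/* -filter -normalize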