Nutch: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist
When I run the Nutch command to create the crawldb folder and its contents:
[email protected] /usr/local/apache-nutch-2.2.1/runtime/local
$ bin/nutch crawl urls -dir crawl -depth 3 -topN 5
I get this error:
InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/C:/cygwin/usr/local/apache-nutch-2.2.1/runtime/local/crawl
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:50)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:233)
at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
at org.apache.nutch.crawl.Crawler.run(Crawler.java:136)
at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
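I am using apache-nutch-2.2.1, hadoop-0.20.2-core.jar, hbase-0.90.4.jar and Cygwin setup 2.774.
I have not installed Hadoop; only the Hadoop library that ships inside Nutch is present, so this is a local Nutch setup, not a distributed one.
Any ideas? Thanks in advance!
Edit:
When I create the directory manually, I get a different error: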
[email protected] /usr/local/apache-nutch-2.2.1/runtime/local
$ mkdir crawl
[email protected] /usr/local/apache-nutch-2.2.1/runtime/local
$ chmod 777 crawl
[email protected] /usr/local/apache-nutch-2.2.1/runtime/local
$ bin/nutch crawl urls -dir crawl -depth 3 -topN 5
cygpath: can't convert empty path
InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
Exception in thread "main" java.lang.RuntimeException: job failed: name=inject crawl, jobid=null
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:233)
at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
at org.apache.nutch.crawl.Crawler.run(Crawler.java:136)
at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
Did you ever figure this out? – gsingh2011
The "nutch crawl" command appears to be deprecated in this version. Use the "crawl" script instead, for example: "bin/crawl url crawl 1". [See here](http://wiki.apache.org/nutch/NutchTutorial#A3.3._Using_the_crawl_script) – Osy
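Since the first stack trace reports a missing input path, it may be worth confirming that the directory passed as the seed directory really exists and is not empty before starting a crawl. The following is only a minimal sketch under that assumption: `check_seed_dir` is a hypothetical helper, not part of Nutch, and it does not address the later "cygpath: can't convert empty path" message.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): succeed only if the given
# directory exists and holds at least one non-empty regular file.
check_seed_dir() {
    seed_dir=$1
    if [ ! -d "$seed_dir" ]; then
        echo "seed directory not found: $seed_dir" >&2
        return 1
    fi
    for f in "$seed_dir"/*; do
        # -s is true for a file that exists and has a size greater than zero
        if [ -f "$f" ] && [ -s "$f" ]; then
            return 0
        fi
    done
    echo "no non-empty seed file in: $seed_dir" >&2
    return 1
}
```

Run from runtime/local, `check_seed_dir urls` would then fail early with a readable message rather than a Hadoop stack trace if the seed directory is missing or empty.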