2013-07-31

Nutch: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist

When I run the Nutch command to create the crawldb folder and its contents:

/usr/local/apache-nutch-2.2.1/runtime/local
$ bin/nutch crawl urls -dir crawl -depth 3 -topN 5 

I get this error:

InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class. 
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/C:/cygwin/usr/local/apache-nutch-2.2.1/runtime/local/crawl 
     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224) 
     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241) 
     at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885) 
     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779) 
     at org.apache.hadoop.mapreduce.Job.submit(Job.java:432) 
     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447) 
     at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:50) 
     at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:233) 
     at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68) 
     at org.apache.nutch.crawl.Crawler.run(Crawler.java:136) 
     at org.apache.nutch.crawl.Crawler.run(Crawler.java:250) 
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) 
     at org.apache.nutch.crawl.Crawler.main(Crawler.java:257) 

I am using apache-nutch-2.2.1, hadoop-0.20.2-core.jar, hbase-0.90.4.jar and Cygwin setup 2.774.

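I have not installed Hadoop separately. Only the Hadoop library bundled with Nutch is present, so this is not a distributed setup but a local Nutch setup.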

Any ideas? Thanks in advance!

Edit:

When I create the directory manually, I get a different error:

/usr/local/apache-nutch-2.2.1/runtime/local 
$ mkdir crawl 

/usr/local/apache-nutch-2.2.1/runtime/local 
$ chmod 777 crawl 

/usr/local/apache-nutch-2.2.1/runtime/local 
$ bin/nutch crawl urls -dir crawl -depth 3 -topN 5 
cygpath: can't convert empty path 
InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class. 
Exception in thread "main" java.lang.RuntimeException: job failed: name=inject crawl, jobid=null 
     at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54) 
     at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:233) 
     at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68) 
     at org.apache.nutch.crawl.Crawler.run(Crawler.java:136) 
     at org.apache.nutch.crawl.Crawler.run(Crawler.java:250) 
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) 
     at org.apache.nutch.crawl.Crawler.main(Crawler.java:257) 

Have you figured this out? – gsingh2011


The "nutch crawl" command appears to be deprecated in this version. Use the "crawl" script instead, e.g. "bin/crawl url crawl 1". [See here](http://wiki.apache.org/nutch/NutchTutorial#A3.3._Using_the_crawl_script) – Osy

Answer


If you want to use -dir crawl, you first need to create the folder file:/C:/cygwin/usr/local/apache-nutch-2.2.1/runtime/local/crawl


I tried that before and got a different error: java.lang.RuntimeException: job failed: name=inject crawl, jobid=null – Osy


Can you paste the stack trace? – zsxwing


See the edit to the question. – Osy
