I'm just getting started with Nutch, and everything was going fine until I hit an IOException. Crawling with Nutch produces the following IOException:

$ ./nutch crawl urls -dir myCrawl -depth 2 -topN 4 
cygpath: can't convert empty path 
solrUrl is not set, indexing will be skipped... 
crawl started in: myCrawl 
rootUrlDir = urls 
threads = 10 
depth = 2 
solrUrl=null 
topN = 4 
Injector: starting at 2012-06-23 03:37:51 
Injector: crawlDb: myCrawl/crawldb 
Injector: urlDir: urls 
Injector: Converting injected urls to crawl db entries. 
Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Rahul\mapred\staging\Rahul255889423\.staging to 0700 
    at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:682) 
    at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:655) 
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509) 
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344) 
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189) 
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116) 
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856) 
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083) 
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850) 
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824) 
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1261) 
    at org.apache.nutch.crawl.Injector.inject(Injector.java:217) 
    at org.apache.nutch.crawl.Crawl.run(Crawl.java:127) 
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) 
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:55) 
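
A workaround that is often circulated for this "Failed to set permissions of path ... to 0700" error on Windows/Cygwin is to patch org.apache.hadoop.fs.FileUtil so that checkReturnValue only logs a warning instead of throwing, rebuild hadoop-core, and swap the patched jar in for the one Nutch loads. This does not come from the thread, so treat it as an assumption; a minimal sketch of such a patch, assuming the Hadoop 1.0.x-era method shown in the trace above (the exact original body may differ):

    // org/apache/hadoop/fs/FileUtil.java -- patched sketch (Hadoop 1.0.x era).
    // Assumes FileUtil's existing LOG field; only the failure handling changes:
    // warn instead of throwing, so local jobs can proceed even though
    // chmod-style permission calls fail under Cygwin/Windows.
    private static void checkReturnValue(boolean rv, File p,
                                         FsPermission permission)
        throws IOException {
      if (!rv) {
        LOG.warn("Failed to set permissions of path: " + p
            + " to " + String.format("%04o", permission.toShort()));
      }
    }

Relaxing the check silences the symptom rather than fixing Windows permission handling, so it is only reasonable for local, single-user crawls.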

@Jeffrey: I downgraded my Nutch version and ran into a new problem that is beyond my understanding. Please help:

$ ./nutch crawl urls -dir myCrawl -depth 4 -topN 5 
cygpath: can't convert empty path 
solrUrl is not set, indexing will be skipped... 
crawl started in: myCrawl 
rootUrlDir = urls 
threads = 10 
depth = 4 
solrUrl=null 
topN = 5 
Injector: starting at 2012-06-23 22:30:28 
Injector: crawlDb: myCrawl/crawldb 
Injector: urlDir: urls 
Injector: Converting injected urls to crawl db entries. 
Exception in thread "main" java.io.IOException: Job failed! 
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252) 
    at org.apache.nutch.crawl.Injector.inject(Injector.java:217) 
    at org.apache.nutch.crawl.Crawl.run(Crawl.java:127) 
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) 
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:55) 

What is the problem this time?
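
Since this second trace only says "Job failed!" without the underlying cause, the first thing to check is Nutch's logs/hadoop.log, which in local mode usually carries the real task error. To see whether the Windows permission problem is still in play after the downgrade, a small standalone check against the local Hadoop filesystem can also help. This is a diagnostic sketch of my own, not something from the thread: the class name and test path are made up, and it assumes the hadoop-core jar that Nutch ships is on the classpath.

    // PermissionCheck.java -- hypothetical diagnostic; compile and run against
    // the same hadoop-core jar that Nutch uses.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    public class PermissionCheck {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Local filesystem -- the same code path the Injector's staging
        // directory goes through in the first stack trace.
        FileSystem fs = FileSystem.getLocal(conf);
        Path p = new Path("/tmp/permission-check");   // made-up test path
        // These calls go through the local setPermission path; if either one
        // throws the 0700-style IOException here, the Windows permission
        // issue is still present.
        fs.mkdirs(p);
        fs.setPermission(p, new FsPermission((short) 0700));
        System.out.println("setPermission succeeded on " + p);
      }
    }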

Which version of Nutch/Hadoop are you using? – Jeffrey

Nutch-1.5 Solr-3.5 –

No idea about Hadoop. I'm a complete Nutch noob. :( –

Answer

I ran into this problem a few days ago. Newer versions of Hadoop have trouble when it comes to interacting with Windows. You can either switch to a *nix platform (which you probably should, since almost all Nutch support is aimed at *nix users) or downgrade your Nutch version. The most recent version of Nutch I've found to run on Windows Server 2008 is 1.2.

Thanks for the info. If I downgrade to 1.2 under Windows, will it be limited in any way compared to using the latest version of Nutch in a Linux environment? –

@prafulbagai You won't get any of the new features. All of the official tutorials are aimed at 1.5, so there may be some differences. You can still crawl, parse, and index, so depending on the exact nature of your needs it may be fine. – Jeffrey

Thanks Jeffrey! I'll try what you recommended! One last question, and it may sound a bit silly, but if I want to look through Nutch's source code to customize it, where can I get it? –