我想運行一個基本的爬行按照該NutchTutorial:不能獲得Apache Nutch的抓取 - 權限和JAVA_HOME懷疑
bin/nutch crawl urls -dir crawl -depth 3 -topN 5
所以我Nutch的所有安裝和設置使用Solr。我在我的.bashrc
中將我的$ JAVA_HOME設置爲/usr/lib/jvm/java-1.6.0-openjdk-amd64
。
我看不出有什麼問題,當我從Nutch的主目錄下運行bin/nutch
,但是當我嘗試運行爬如上我得到以下錯誤:
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /usr/share/nutch/logs/hadoop.log (Permission denied)
at java.io.FileOutputStream.openAppend(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:207)
at java.io.FileOutputStream.<init>(FileOutputStream.java:131)
at org.apache.log4j.FileAppender.setFile(FileAppender.java:290)
at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:164)
at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:216)
at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:257)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:133)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:97)
at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:689)
at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:647)
at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:544)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:440)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:476)
at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:471)
at org.apache.log4j.LogManager.<clinit>(LogManager.java:125)
at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:73)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:270)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:281)
at org.apache.nutch.crawl.Crawl.<clinit>(Crawl.java:43)
log4j:ERROR Either File or DatePattern options are not set for appender [DRFA].
solrUrl is not set, indexing will be skipped...
crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
solrUrl=null
topN = 5
Injector: starting at 2013-06-28 16:24:53
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: total number of urls rejected by filters: 0
Injector: total number of urls injected after normalization and filtering: 1
Injector: Merging injected urls into crawl db.
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.nutch.crawl.Injector.inject(Injector.java:296)
at org.apache.nutch.crawl.Crawl.run(Crawl.java:132)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
我懷疑它可能有一些做文件的權限,因爲我必須在此服務器上幾乎所有運行sudo
,但如果我運行sudo
一樣爬行的命令,我得到:
Error: JAVA_HOME is not set.
所以我覺得我已經有了一個catch-22這裏正在發生。我是否應該能夠使用sudo
運行此命令,還是有其他需要做的事情,以便我不必使用sudo
來運行它,它會起作用,還是還有其他的事情正在完成呢?
現在應該工作。非常感謝。 – roy
Mehul。謝謝!像魅力一樣工作。 – sunskin