2015-05-29 98 views

I am new to Nutch and Solr integration. Error: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist

I want to crawl new URLs, so I installed Solr 4.6.0 and Nutch 1.6 on Ubuntu. I started with some configuration, but I still get this error:

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/crawl_fetch

Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/crawl_parse

Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/parse_data

Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/parse_text

In the log file I get this error:

2015-05-29 03:05:41,153 ERROR security.UserGroupInformation - PriviledgedActionException as:cloudera cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/crawl_fetch

Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/crawl_parse

Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/parse_data

Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/parse_text

2015-05-29 03:05:41,153 ERROR solr.SolrIndexer - org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/crawl_fetch

Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/crawl_parse

Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/parse_data

Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/parse_text

Please tell me what this means. Can you explain what the problem is and how I can fix it?

I would really appreciate your help.

Answers

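If you are running bin/crawl on Mac OS or any other Unix-based operating system (such as FreeBSD), switch to Ubuntu. I believe this is a bug in the crawl script. I ran into the same problem before and used Ubuntu instead.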
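Before re-running the indexing step, it may help to confirm which of the subdirectories named in the error are actually absent under the segment path. The following is only a minimal sketch: the four directory names are copied from the error message in the question, and the function name `missing_segment_parts` is hypothetical, not part of Nutch.

```python
import os

# Subdirectory names taken from the "Input path does not exist" lines
# in the question; this list is an assumption, not a full segment layout.
EXPECTED_PARTS = ("crawl_fetch", "crawl_parse", "parse_data", "parse_text")


def missing_segment_parts(segment_dir):
    """Return the expected subdirectories that are absent from segment_dir."""
    return [
        part
        for part in EXPECTED_PARTS
        if not os.path.isdir(os.path.join(segment_dir, part))
    ]
```

Calling `missing_segment_parts("/home/cloudera/apache-nutch-1.6/bin/20150529030452")` with the path from the log would list whichever of the four directories are not there. If all four are reported, the path given to the indexer is probably not a fetched and parsed segment at all.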
