2016-05-26 52 views

Error integrating Apache Nutch 2.3 with HBase 0.94.14 and Solr 5.2.1

I am integrating Nutch with HBase and Solr.

After starting the Hadoop and HBase services, I ran the following command from the Nutch home directory:

sudo -E bin/crawl urls/seed.txt TestCrawl http://localhost:8983/solr/ 2

I get the following errors:

Injecting seed URLs 
/usr/local/apache-nutch-2.3.1/runtime/local/bin/nutch inject urls/seed.txt -crawlId TestCrawl 
InjectorJob: starting at 2016-05-26 15:41:14 
InjectorJob: Injecting urlDir: urls/seed.txt 
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration 
    at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:114) 
    at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102) 
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161) 
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135) 
    at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:78) 
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:218) 
    at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252) 
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275) 
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) 
    at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:284) 
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration 
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424) 
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357) 
    ... 10 more 
Error running: 
    /usr/local/apache-nutch-2.3.1/runtime/local/bin/nutch inject urls/seed.txt -crawlId TestCrawl 
Failed with exit value 1. 

Can anyone suggest what is going wrong?

Answer

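This is a bug in Nutch: when the crawl script runs, it cannot find a transitive dependency, so the class org.apache.hadoop.hbase.HBaseConfiguration is missing from the classpath.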
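
One way to confirm which jar, if any, provides the missing class is to scan the Nutch lib directory. The following is a minimal sketch, not part of Nutch: the helper names class_to_entry and jars_providing are hypothetical, the lib path is an assumption derived from the paths in the log above, and it assumes the unzip tool is installed.

```shell
#!/bin/sh
# class_to_entry CLASS: turn a dotted class name into its path inside a jar.
class_to_entry() {
    printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# jars_providing DIR CLASS: print every jar in DIR that contains CLASS.
jars_providing() {
    entry=$(class_to_entry "$2")
    for jar in "$1"/*.jar; do
        # If nothing matches, the glob stays literal, so skip non-files.
        [ -f "$jar" ] || continue
        # unzip -l lists archive entries without extracting them.
        if unzip -l "$jar" 2>/dev/null | grep -qF -- "$entry"; then
            printf '%s\n' "$jar"
        fi
    done
}

# Assumed location, based on the bin/nutch path shown in the log.
NUTCH_LIB=${NUTCH_LIB:-/usr/local/apache-nutch-2.3.1/runtime/local/lib}

jars_providing "$NUTCH_LIB" org.apache.hadoop.hbase.HBaseConfiguration
```

If this prints nothing, no jar in that directory contains the class, which is consistent with the NoClassDefFoundError in the question.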

A better-matched configuration is Nutch 2.3.1 with HBase 0.98.8-hadoop2.

For more background, see the following URL:

https://wiki.apache.org/nutch/Nutch2Tutorial

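The underlying problem is a bug in gora-hbase 0.6.1, which does not pull in hbase-common as a transitive dependency. Add the missing hbase-common-0.98.8-hadoop2.jar yourself by declaring it in Nutch's Ivy dependency file (ivy/ivy.xml):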

<dependency org="org.apache.hbase" name="hbase-common" rev="0.98.8-hadoop2" conf="*->default" /> 

With this change I was able to crawl successfully.
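
Before re-running the crawl, it can help to check that the hbase-common jar actually ended up in the runtime lib directory. This is a rough sketch only: has_jar is a hypothetical helper, and the lib path is an assumption based on the paths in the log above.

```shell
#!/bin/sh
# has_jar DIR PREFIX: succeed if DIR contains at least one PREFIX*.jar file.
has_jar() {
    for jar in "$1"/"$2"*.jar; do
        # If nothing matches, the glob stays literal, so test for a real file.
        if [ -f "$jar" ]; then
            return 0
        fi
    done
    return 1
}

# Assumed location, based on the bin/nutch path shown in the log.
NUTCH_LIB=${NUTCH_LIB:-/usr/local/apache-nutch-2.3.1/runtime/local/lib}

if has_jar "$NUTCH_LIB" hbase-common; then
    echo "hbase-common jar found in $NUTCH_LIB"
else
    echo "hbase-common jar missing from $NUTCH_LIB"
fi
```

If the jar is reported missing, the dependency has probably not been resolved into the runtime yet, and the inject step is likely to fail with the same error.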
