1
我試圖設置Apache Nutch和Apache Solr,因此我們的網站可以進行內部網站搜索。我遵循我的指導,儘管他們非常有用,但如果發生錯誤並且大多數看起來已經過時,他們缺乏做什麼。在Centos上設置Nutch與Solr
我使用JDK 131,Nutch的2.3.1,和Solr 6.5.1
這從沒有root用戶我的行動順序
sudo wget [java url] to /opt
sudo tar xvf java.tar.gz
export JAVA_HOME=/opt/java/
export JAVA_JRE=/opt/java/jre
export PATH=$PATH:/opt/java/bin:/opt/java/jre/bin
cd solr6.5.1/
sudo start runtime -e cloud -noprompt
sudo wget [solr url] to /root
sudo tar xvf solr.tar.gz
sudo wget [nutch url] to /opt
sudo tar xvf nutch.tar.gz
cd /opt/apache-nutch-2.3.1
sudo vi nutch-site.xml
地址:
<configuration>
<property>
<name>http.agent.name</name>
<value>nutch-solr-integration</value>
</property>
<property>
<name>generate.max.per.host</name>
<value>100</value>
</property>
<property>
<name>plugin.includes</name>
<value>protocol-http|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|indexer-solr|urlnormalizer-(pass|regex|basic)</value>
<description> At the very least, I needed to add the parse-html, urlfilter-regex, and the indexer-solr.
</description>
</property>
<property>
<name>storage.data.store.class</name>
<value>org.apache.gora.sql.store.SqlStore</value>
<description>The Gora DataStore class for storing and retrieving data.</description>
</property>
</configuration>
cd /opt/apache-nutch-2.3.1
mkdir urls
cd urls
sudo vi seed.txt
add [our site url]
[ESC]
:w
:q
cd ../conf
sudo vi regex-urlfilter.xml
add:
+^http://([a-zA-Z0-9]*\.)*[domain of our site].com/
[ESC]
:w
:q
cd ..
sudo ant runtime
sudo -E runtime/local/bin/nutch inject urls -crawlId 3
然後我得到這個:
InjectorJob: Injecting urlDir: urls
InjectorJob: java.lang.ClassNotFoundException: org.apache.gora.sql.store.SqlStore
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.nutch.storage.StorageUtils.getDataStoreClass(StorageUtils.java:93)
at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:77)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:218)
at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:284)
我的問題是如何得到這個錯誤,以及如何解決它。我在很多地方看到要修改schema.xml solr目錄,但solr目錄中沒有任何schema.xml文件。
解決了注入url。生成代碼 ** sudo -E運行時/ local/bin/nutch生成-topN 10 ** GeneratorJob:在2017-05-30 11:33:08開始發動機工作:選擇最佳 - 取得抓取的到期網址。 GeneratorJob:開始 GeneratorJob:過濾:真 GeneratorJob:正火:真 GeneratorJob:TOPN:螺紋10 異常 「主要」 java.lang.NoClassDefFoundError:組織/阿帕奇/的Hadoop/HBase的/ HBaseConfiguration –
您使用HBase的作爲數據存儲? –
我相信是的。我安裝了它,並按照這些說明[鏈接](https://anil.io/blog/apache/nutch/apache-nutch-2-3-hbase-0-94-14-and-solr-5-2-1 -tutorial /) –