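java.lang.RuntimeException: org.apache.nutch.plugin.PluginRuntimeException: java.lang.ClassNotFoundException when parsing with Nutch

I am a beginner with Nutch. I tried some of the tutorials from the NutchWiki and crawled some websites. Then I tried to write a custom parsing plugin with the help of this tutorial. After all the configuration and after building with ant, my plugin folder is present in build/plugins and runtime/local/plugin and in the apache-nutch-1.13-SNAPSHOT.job file. When I parse the fetched content, I get the following error.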
Error parsing: http://example.com/: java.lang.RuntimeException: org.apache.nutch.plugin.PluginRuntimeException: java.lang.ClassNotFoundException: org.apache.nutch.parsefilter.TagExtractorParseFilter
at org.apache.nutch.plugin.PluginRepository.getOrderedPlugins(PluginRepository.java:469)
at org.apache.nutch.parse.HtmlParseFilters.<init>(HtmlParseFilters.java:35)
at org.apache.nutch.parse.html.HtmlParser.setConf(HtmlParser.java:340)
at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:163)
at org.apache.nutch.parse.ParserFactory.getParsers(ParserFactory.java:136)
at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:78)
at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:107)
at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:45)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.nutch.plugin.PluginRuntimeException: java.lang.ClassNotFoundException: org.apache.nutch.parsefilter.TagExtractorParseFilter
at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:167)
at org.apache.nutch.plugin.PluginRepository.getOrderedPlugins(PluginRepository.java:441)
... 16 more
Caused by: java.lang.ClassNotFoundException: org.apache.nutch.parsefilter.TagExtractorParseFilter
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.nutch.plugin.PluginRepository.getCachedClass(PluginRepository.java:331)
at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:156)
... 17 more
I cannot figure out what exactly the problem is; I did everything specified in the tutorial. Any help would be appreciated.
Edit:
CLASSPATH="${CLASSPATH}:$NUTCH_HOME/plugins/TagExtractorParseFilter/TagExtractorParseFilter.jar"
# distributed mode
EXEC_CALL=(hadoop jar "$NUTCH_JOB")
if $local; then
EXEC_CALL=("$JAVA" $JAVA_HEAP_MAX "${NUTCH_OPTS[@]}" -classpath "$CLASSPATH")
else
.....................
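
Since the trace ends in a ClassNotFoundException from the plugin class loader, one thing that might be worth checking is whether the built jar really contains the class under the package named in the error. Below is a minimal sketch of such a check, not a definitive diagnosis. It assumes `unzip` is installed, and the helper names `class_to_entry` and `jar_has_class` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: turn a fully qualified class name into the
# entry name it would have inside a jar (a.b.C -> a/b/C.class).
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Hypothetical helper: succeed (exit status 0) if the jar listing
# contains the entry for the given class. Assumes `unzip` is available.
jar_has_class() {
  unzip -l "$1" | grep -qF -- "$(class_to_entry "$2")"
}
```

It could be called with the jar path from the classpath line above, for example `jar_has_class "$NUTCH_HOME/plugins/TagExtractorParseFilter/TagExtractorParseFilter.jar" org.apache.nutch.parsefilter.TagExtractorParseFilter && echo found || echo missing`. If the result is "missing", the package declared in the Java source and the class name given in the plugin's plugin.xml may not agree; if it is "found", the problem more likely lies in how the plugin's jar is registered or located at runtime.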
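Did you add the plugin to the "plugin.includes" property in nutch-site.xml?
Yes, I added it. For now I am just avoiding the error by hard-coding the jar file into the classpath. However, that does not actually solve the problem. – Abhishek
Which version of Nutch are you using? – Anup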