2014-07-10 109 views

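Problem activating the Nutch headings plugin

I am trying to activate the headings plugin in Nutch 1.8, but somehow it does not work. Here is the relevant part of my nutch-site.xml: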

<property> 
    <name>plugin.includes</name> 
    <value>protocol-http|urlfilter-regex|parse-(html|tika|metatags|headings)|index-(basic|anchor|metadata)|scoring-opic|urlnormalizer-(pass|regex|basic)</value> 
    <description>activates metatag parsing </description> 
</property> 

<property> 
    <name>headings</name> 
    <value>h1;h2</value> 
    <description>Comma separated list of headings to retrieve from the document</description> 
</property> 

<property> 
    <name>headings.multivalued</name> 
    <value>false</value> 
    <description>Whether to support multivalued headings.</description> 
</property> 

<property> 
<name>index.parse.md</name> 
<value>metatag.description,metatag.title, metatag.keywords, metatag.author, 
metatag.author, headings.h1, headings.h2</value> 
<description> Comma-separated list of keys to be taken from the parse metadata to generate fields. Can be used e.g. for 'description' or 'keywords' provided that these values are generated by a parser (see parse-metatags plugin) 
</description> 
</property> 

Can anyone help?

Thanks, Chris

Answers


<name>index.parse.md</name> 

Check for metatag.h1 and metatag.h2:

<property> 
    <name>index.parse.md</name> 
    <value>metatag.h1,metatag.h2</value> 
    ... 

BTW, headings is not a parse-... filter, so it does not belong inside the parse-(...) group. You have to use:

<name>plugin.includes</name> 
<value>headings|parse-(html|tika|metatags)|... 

Now it should work...
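
The point about plugin.includes can be checked outside Nutch. The sketch below assumes that Nutch treats plugin.includes as a regular expression and activates a plugin only when its whole id matches; that whole-id matching is mirrored here with `re.fullmatch`. `QUESTION_VALUE` is copied from the question, while `FIXED_VALUE` and `is_included` are hypothetical names made up for this illustration, not part of Nutch.

```python
import re

# plugin.includes from the question: "headings" sits inside the parse-(...) group,
# so the regex describes an id "parse-headings" rather than "headings".
QUESTION_VALUE = (
    "protocol-http|urlfilter-regex|parse-(html|tika|metatags|headings)"
    "|index-(basic|anchor|metadata)|scoring-opic|urlnormalizer-(pass|regex|basic)"
)

# Illustrative variant (an assumption, not taken from the answers verbatim):
# "headings" moved out of the group as its own alternative.
FIXED_VALUE = (
    "protocol-http|urlfilter-regex|headings|parse-(html|tika|metatags)"
    "|index-(basic|anchor|metadata)|scoring-opic|urlnormalizer-(pass|regex|basic)"
)


def is_included(plugin_id: str, includes: str) -> bool:
    """Return True when the whole plugin id matches the includes regex.

    Assumption: this mirrors how Nutch decides whether a plugin is active.
    """
    return re.fullmatch(includes, plugin_id) is not None


if __name__ == "__main__":
    for label, value in (("question", QUESTION_VALUE), ("fixed", FIXED_VALUE)):
        for plugin_id in ("headings", "parse-headings", "index-metadata"):
            print(label, plugin_id, is_included(plugin_id, value))
```

If the plugin id really is `headings`, as the answer states, the question's value never selects it, which would explain why no headings reach the index.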


After struggling with this myself, I found that the following should work (Apache Nutch 1.9):

<property> 
    <name>plugin.includes</name> 
    <value>protocol-http|headings|parse-(html|tika|metatags)|...</value> 
</property> 
<property> 
    <name>index.parse.md</name> 
    <value>h1,h2,h3</value> 
</property> 
<property> 
    <name>headings</name> 
    <value>h1,h2,h3</value> 
</property> 
<property> 
    <name>headings.multivalued</name> 
    <value>true</value> 
</property> 

The following should then be added to your schema.xml file (when using Apache Solr):

<!-- fields for the headings plugin --> 
<field name="h1" type="text" stored="true" indexed="true" multiValued="true"/> 
<field name="h2" type="text" stored="true" indexed="true" multiValued="true"/> 
<field name="h3" type="text" stored="true" indexed="true" multiValued="true"/>
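
The nutch-site.xml settings in this answer can be sanity-checked with a small script. This is only a sketch under one assumption drawn from the configuration above: every name listed in `headings` should also appear, under the same key, in `index.parse.md`, and both values are comma-separated. `SAMPLE`, `read_properties`, `split_list` and `unindexed_headings` are hypothetical names for this illustration.

```python
import xml.etree.ElementTree as ET

# Minimal stand-in for nutch-site.xml, using the values from the answer above.
SAMPLE = """<configuration>
  <property><name>index.parse.md</name><value>h1,h2,h3</value></property>
  <property><name>headings</name><value>h1,h2,h3</value></property>
</configuration>"""


def read_properties(xml_text):
    """Map each <property>'s <name> to its <value>, both stripped of whitespace."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            props[name] = (prop.findtext("value") or "").strip()
    return props


def split_list(value):
    """Split a comma-separated value, dropping blanks and surrounding whitespace."""
    return [item.strip() for item in value.split(",") if item.strip()]


def unindexed_headings(props):
    """Headings that are extracted but not listed in index.parse.md.

    Assumption: the metadata key equals the heading name (e.g. "h1").
    """
    indexed = set(split_list(props.get("index.parse.md", "")))
    return [h for h in split_list(props.get("headings", "")) if h not in indexed]


if __name__ == "__main__":
    print(unindexed_headings(read_properties(SAMPLE)))
```

Run against the question's settings, the same check would flag `h1;h2` as a single unmatched entry, because that value uses a semicolon while its description asks for a comma-separated list.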